HTML Fundamentals for Web Crawling
HTML (HyperText Markup Language) is the markup language that forms the backbone of a web page by structuring its content.
A markup language is a system for describing the structure and content of a document.
Basic Structure of HTML
HTML markup is written with tags enclosed in angle brackets (< >).
Most tags come in pairs: a start tag (<tag>) and an end tag (</tag>), with the content placed between them.
For example, the <h1> tag marks a heading, and its end tag adds a slash: </h1>.
A start tag, its content, and its end tag together form an element, the basic building block of an HTML document.
For instance, <h1>Title</h1> is an element in which the text "Title" sits between the <h1> start tag and the </h1> end tag.
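To see how a program reads an element, here is a minimal sketch using Python's standard-library html.parser module. The class name EventRecorder and the helper record_events are made up for this illustration; the sketch simply records the start tag, the content, and the end tag in the order the parser reports them.

```python
from html.parser import HTMLParser


class EventRecorder(HTMLParser):
    """Hypothetical helper: record each part of an element in parse order."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        # Called for a start tag such as <h1>
        self.events.append(("start tag", tag))

    def handle_data(self, data):
        # Called for the text between the tags
        self.events.append(("content", data))

    def handle_endtag(self, tag):
        # Called for an end tag such as </h1>
        self.events.append(("end tag", tag))


def record_events(html):
    recorder = EventRecorder()
    recorder.feed(html)
    recorder.close()
    return recorder.events


events = record_events("<h1>Title</h1>")
for kind, value in events:
    print(f"{kind}: {value}")
```

The three recorded events correspond to the three parts of the element described above: the start tag, the content, and the end tag.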
The basic structure of an HTML document is as follows:
<!DOCTYPE html>
<html>
  <!-- The section containing the document's metadata -->
  <head>
    <title>Page Title</title>
  </head>
  <!-- The section containing the content of the web page -->
  <body>
    <h1>Heading Element</h1>
    <p>Paragraph Element</p>
  </body>
</html>
- <!DOCTYPE html>: Defines the version of the HTML document, indicating that it is an HTML5 document.
- <html>: The root element of the HTML document, which contains all other HTML elements.
- <head>: Contains the document's metadata (title, description, styles, etc.).
- <title>: Defines the page title displayed on the browser tab.
- <body>: Encloses the content of the web page.
- <h1>, <p>, etc.: Various HTML elements that represent different types of content, such as headings and paragraphs.
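As a rough illustration of why this structure matters for crawling, the following sketch pulls the text out of the <title>, <h1>, and <p> elements of the sample document. It relies only on Python's standard-library html.parser; the names TextByTag and extract_text are hypothetical, and the approach assumes the wanted tags are not nested inside one another.

```python
from html.parser import HTMLParser

# The sample document from this lesson (comments omitted for brevity)
SAMPLE = """<!DOCTYPE html>
<html>
<head>
<title>Page Title</title>
</head>
<body>
<h1>Heading Element</h1>
<p>Paragraph Element</p>
</body>
</html>"""


class TextByTag(HTMLParser):
    """Hypothetical helper: collect the text inside a chosen set of tags."""

    def __init__(self, wanted):
        super().__init__()
        self.wanted = set(wanted)
        self.current = None  # wanted tag we are currently inside, if any
        self.found = []      # (tag, text) pairs in document order

    def handle_starttag(self, tag, attrs):
        if tag in self.wanted:
            self.current = tag

    def handle_data(self, data):
        text = data.strip()
        # Ignore whitespace and any text outside the wanted tags
        if self.current and text:
            self.found.append((self.current, text))

    def handle_endtag(self, tag):
        if tag == self.current:
            self.current = None


def extract_text(html, wanted=("title", "h1", "p")):
    parser = TextByTag(wanted)
    parser.feed(html)
    parser.close()
    return parser.found


found = extract_text(SAMPLE)
for tag, text in found:
    print(f"<{tag}>: {text}")
```

Real pages are messier than this sample, so a production crawler would more likely use a dedicated parsing library; the sketch is only meant to show how the head and body content map to parser events.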
Key HTML Tags
HTML includes a variety of tags, each representing specific types of content:
- <h1> to <h6>: Heading tags, where <h1> is the top-level (and by default largest) heading and <h6> is the lowest-level (and smallest).
- <p>: Represents a paragraph.
- <a>: Creates a hyperlink to another web page; the destination is given in its href attribute.
- <img>: Embeds an image in the document.
- <ul>, <ol>, <li>: Define unordered lists (<ul>) and ordered lists (<ol>), with <li> representing list items.
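Links and images are usually what a crawler cares about most, so here is a small sketch that collects the href of every <a> tag and the src of every <img> tag. The HTML snippet, its URLs, and the class name LinkAndImageCollector are invented for this example; only the standard-library html.parser module is assumed.

```python
from html.parser import HTMLParser

# Made-up snippet combining several of the tags listed above
SNIPPET = """
<h2>Fruit</h2>
<ul>
<li><a href="/apples">Apples</a></li>
<li><a href="https://example.com/pears">Pears</a></li>
</ul>
<img src="/img/fruit.png" alt="A bowl of fruit">
"""


class LinkAndImageCollector(HTMLParser):
    """Hypothetical helper: gather link targets and image sources."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.images = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; a dict is easier to query
        attributes = dict(attrs)
        if tag == "a" and "href" in attributes:
            self.links.append(attributes["href"])
        elif tag == "img" and "src" in attributes:
            self.images.append(attributes["src"])


collector = LinkAndImageCollector()
collector.feed(SNIPPET)
collector.close()

links = collector.links
images = collector.images
print("links:", links)
print("images:", images)
```

Note that the first link is relative; a real crawler would typically resolve it against the page's URL (for example with urllib.parse.urljoin) before following it.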
For more detailed information about HTML, check out the Introduction to HTML Course.