Fundamentals of Web Crawling in HTML
HTML (HyperText Markup Language) forms the backbone of a web page. It is a markup language used to structure the page's content.
A markup language is a system for defining the structure and content of a document.
Basic Structure of HTML
An HTML document is built from tags, which are keywords enclosed in angle brackets (< >).
Most tags come in pairs: a start tag (<tag>) and an end tag (</tag>), with the content placed between them.
For example, the <h1> tag represents a heading, and its end tag adds a forward slash (/) before the tag name, like this: </h1>.
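To see start and end tags from a crawler's point of view, here is a minimal sketch that uses Python's standard-library html.parser (our choice; this section does not name a parser). It checks that each end tag closes the most recently opened start tag, using a stack. The names TagBalanceChecker, check_balance, and VOID_ELEMENTS are hypothetical, the void-element set is a partial list, and the check is stricter than real browsers, which tolerate omitted optional end tags such as </p>.

```python
from html.parser import HTMLParser

# Void elements such as <img> never take an end tag, so they are not tracked.
# Partial list for this sketch; the HTML standard defines a few more.
VOID_ELEMENTS = {"br", "hr", "img", "input", "link", "meta"}


class TagBalanceChecker(HTMLParser):
    """Hypothetical helper: checks that each end tag closes the latest open start tag."""

    def __init__(self):
        super().__init__()
        self.stack = []   # open start tags, innermost last
        self.errors = []  # problems found while parsing

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_ELEMENTS:
            self.stack.append(tag)

    def handle_startendtag(self, tag, attrs):
        # A self-closing tag like <br/> opens and closes in one step.
        pass

    def handle_endtag(self, tag):
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()
        else:
            self.errors.append(f"unexpected </{tag}>")


def check_balance(markup):
    checker = TagBalanceChecker()
    checker.feed(markup)
    checker.close()
    # Any start tag still on the stack was never closed.
    return checker.errors + [f"unclosed <{tag}>" for tag in checker.stack]


print(check_balance("<h1>Title</h1>"))  # []
print(check_balance("<h1>Title"))       # ['unclosed <h1>']
```

Because real pages often omit optional end tags, a crawler normally relies on a lenient parser rather than rejecting pages that fail a strict check like this one.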
A start tag, its end tag, and the content between them together form an element, the basic building block of an HTML document.
For instance, <h1>Title</h1> is an element in which the text "Title" is enclosed by the <h1> start tag and the </h1> end tag.
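A parser sees an element as a sequence of events: the start tag, the text content, and the end tag. The sketch below, again assuming Python's standard-library html.parser, records those events for the <h1>Title</h1> example; the class name EventLogger is hypothetical.

```python
from html.parser import HTMLParser


class EventLogger(HTMLParser):
    """Hypothetical helper: records start tags, text, and end tags in order."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_data(self, data):
        self.events.append(("text", data))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))


logger = EventLogger()
logger.feed("<h1>Title</h1>")
logger.close()
print(logger.events)
# [('start', 'h1'), ('text', 'Title'), ('end', 'h1')]
```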
The basic structure of an HTML document is as follows:
<!DOCTYPE html>
<html>
<!-- The section containing the document's metadata -->
<head>
<title>Page Title</title>
</head>
<!-- The section containing the content of the web page -->
<body>
<h1>Heading Element</h1>
<p>Paragraph Element</p>
</body>
</html>
- <!DOCTYPE html>: Declares the document type; this short form indicates that the page is an HTML5 document.
- <html>: The root element of the HTML document; it contains all other HTML elements.
- <head>: Contains the document's metadata (title, description, styles, etc.).
- <title>: Defines the page title displayed on the browser tab.
- <body>: Encloses the visible content of the web page.
- <h1>, <p>, etc.: HTML elements that represent different types of content, such as headings and paragraphs.
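A crawler often reads a page's <title> from the <head> and its main headings from the <body>. Here is a minimal sketch that does this for the sample document above, assuming Python's standard-library html.parser; the class name TitleAndHeadingExtractor is hypothetical.

```python
from html.parser import HTMLParser

SAMPLE = """<!DOCTYPE html>
<html>
<head>
<title>Page Title</title>
</head>
<body>
<h1>Heading Element</h1>
<p>Paragraph Element</p>
</body>
</html>"""


class TitleAndHeadingExtractor(HTMLParser):
    """Hypothetical helper: collects <title> text and every <h1> text."""

    def __init__(self):
        super().__init__()
        self.current = None  # tag whose text is being captured, if any
        self.title = ""
        self.h1 = []

    def handle_starttag(self, tag, attrs):
        if tag in ("title", "h1"):
            self.current = tag
            if tag == "h1":
                self.h1.append("")  # start a new heading entry

    def handle_endtag(self, tag):
        if tag == self.current:
            self.current = None

    def handle_data(self, data):
        # Text may arrive in several chunks, so append rather than assign.
        if self.current == "title":
            self.title += data
        elif self.current == "h1":
            self.h1[-1] += data


parser = TitleAndHeadingExtractor()
parser.feed(SAMPLE)
parser.close()
print(parser.title)  # Page Title
print(parser.h1)     # ['Heading Element']
```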
Key HTML Tags
HTML includes a variety of tags, each representing specific types of content:
- <h1> to <h6>: Heading tags, where <h1> is the largest (highest-level) heading and <h6> is the smallest.
- <p>: Represents a paragraph.
- <a>: Creates a hyperlink to another web page; the target URL is given in its href attribute.
- <img>: Embeds an image in the document; the image URL is given in its src attribute.
- <ul>, <ol>, <li>: Define unordered lists (<ul>) and ordered lists (<ol>), with <li> representing each list item.
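For a crawler, <a> and <img> matter most, because their href and src attributes point to further pages and resources to fetch. The sketch below collects those URLs and resolves relative ones against the page's own address, assuming Python's standard-library html.parser and urllib.parse.urljoin. The class name LinkImageExtractor, the sample markup, and the example.com / example.org URLs are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkImageExtractor(HTMLParser):
    """Hypothetical helper: collects absolute URLs from <a href> and <img src>."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url  # URL of the page being parsed
        self.links = []
        self.images = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)  # attrs arrives as a list of (name, value) pairs
        if tag == "a" and attrs.get("href"):
            # Resolve relative paths like "/about" against the page URL.
            self.links.append(urljoin(self.base_url, attrs["href"]))
        elif tag == "img" and attrs.get("src"):
            self.images.append(urljoin(self.base_url, attrs["src"]))


page = """
<h1>Example</h1>
<ul>
  <li><a href="/about">About</a></li>
  <li><a href="https://example.org/docs">Docs</a></li>
</ul>
<img src="images/logo.png" alt="Logo">
"""

extractor = LinkImageExtractor("https://example.com/index.html")
extractor.feed(page)
extractor.close()
print(extractor.links)   # ['https://example.com/about', 'https://example.org/docs']
print(extractor.images)  # ['https://example.com/images/logo.png']
```

Resolving URLs at extraction time means the crawler's queue only ever holds absolute addresses, which makes de-duplication simpler.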
For more detailed information about HTML, check out the Introduction to HTML Course.