HTML Basics for Web Crawling
HTML (HyperText Markup Language) is the markup language used to build web pages; it defines the skeleton of a page.
A markup language is a language that describes the structure and content of a document.
Basic HTML Structure
An HTML document consists of tags enclosed in angle brackets (< >).
Most tags come in pairs: a start tag (<tag>) and an end tag (</tag>), with content placed between them.
For example, the <h1> tag represents a heading, and it is closed by an end tag that adds a / before the tag name, as in </h1>.
The unit of an HTML document composed of tags and content is referred to as an element.
For instance, <h1>Title</h1> is an element containing the content "Title" wrapped in an <h1> tag.
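To see how a crawler perceives the start tag, content, and end tag of an element, the following minimal sketch uses Python's standard-library html.parser module. The class name EventLogger and the events list are illustrative names chosen for this example, not part of the course material.

```python
from html.parser import HTMLParser


class EventLogger(HTMLParser):
    """Records each start tag, piece of content, and end tag the parser sees."""

    def __init__(self):
        super().__init__()
        self.events = []  # illustrative name: list of (kind, value) pairs

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_data(self, data):
        self.events.append(("content", data))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))


logger = EventLogger()
logger.feed("<h1>Title</h1>")  # the element used as an example above
logger.close()

for kind, value in logger.events:
    print(kind, value)
```

The three recorded events correspond to the three parts of the element: the <h1> start tag, the content "Title", and the </h1> end tag.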
The basic structure of an HTML document is as follows:
<!DOCTYPE html>
<html>
<!-- The section containing the document's metadata -->
<head>
<title>Page Title</title>
</head>
<!-- The section containing the content of the web page -->
<body>
<h1>Heading Element</h1>
<p>Paragraph Element</p>
</body>
</html>
- <!DOCTYPE html>: Defines the version of the HTML document, indicating that it is an HTML5 document.
- <html>: The root element of the HTML document that includes all HTML elements.
- <head>: Contains the document's metadata (title, description, styles, etc.).
- <title>: Defines the page title displayed on the browser tab.
- <body>: Encloses the content of the web page.
- <h1>, <p>, etc.: Various HTML elements represent content such as headings, paragraphs, etc.
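As a rough illustration of how this structure is used when crawling, the sketch below feeds the sample document to Python's standard-library html.parser and groups the text by the tag that encloses it. The names SAMPLE_HTML, TextByTag, open_tags, and text are assumptions made for this example, and the simple stack logic assumes a well-formed document like the sample.

```python
from html.parser import HTMLParser

# The sample document from this section.
SAMPLE_HTML = """<!DOCTYPE html>
<html>
<head>
<title>Page Title</title>
</head>
<body>
<h1>Heading Element</h1>
<p>Paragraph Element</p>
</body>
</html>"""


class TextByTag(HTMLParser):
    """Maps each tag name to the non-blank text found directly inside it."""

    def __init__(self):
        super().__init__()
        self.open_tags = []  # stack of tags that are currently open
        self.text = {}       # tag name -> list of text snippets

    def handle_starttag(self, tag, attrs):
        self.open_tags.append(tag)

    def handle_endtag(self, tag):
        # Pop only on a matching close; enough for a well-formed document.
        if self.open_tags and self.open_tags[-1] == tag:
            self.open_tags.pop()

    def handle_data(self, data):
        stripped = data.strip()
        if stripped and self.open_tags:
            self.text.setdefault(self.open_tags[-1], []).append(stripped)


parser = TextByTag()
parser.feed(SAMPLE_HTML)
parser.close()

print("title:", parser.text["title"])
print("h1:", parser.text["h1"])
print("p:", parser.text["p"])
```

The page title comes from inside <head>, while the heading and paragraph come from inside <body>, mirroring the split between metadata and content described above.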
Key HTML Tags
HTML includes a variety of tags, each representing specific types of content:
- <h1> to <h6>: Tags for headings, with <h1> representing the highest-level (and, by default, largest) heading.
- <p>: The tag used to denote a paragraph.
- <a>: A tag used to create hyperlinks (links to other web pages).
- <img>: A tag used to insert an image.
- <ul>, <ol>, <li>: Tags used to create unordered (<ul>) or ordered (<ol>) lists, with each list item marked by <li>.
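When crawling, these tags are typically where the data of interest lives. The sketch below, again using the standard-library html.parser, collects link targets, image sources, and list items from a small made-up snippet. Note that the href and src attributes are not covered in this section; they are the standard attributes that hold a link's destination and an image's location. The names SNIPPET and KeyTagCollector, and the example.com URL, are hypothetical.

```python
from html.parser import HTMLParser

# Hypothetical snippet, written only for this illustration.
SNIPPET = """
<h2>Fruits</h2>
<p>See the <a href="https://example.com/fruits">fruit guide</a>.</p>
<img src="apple.png" alt="An apple">
<ul>
<li>Apple</li>
<li>Banana</li>
</ul>
"""


class KeyTagCollector(HTMLParser):
    """Collects link targets, image sources, and list-item text."""

    def __init__(self):
        super().__init__()
        self.links = []    # href values of <a> tags
        self.images = []   # src values of <img> tags
        self.items = []    # text of <li> elements
        self._in_li = False

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)  # attrs arrives as a list of (name, value) pairs
        if tag == "a" and "href" in attr_map:
            self.links.append(attr_map["href"])
        elif tag == "img" and "src" in attr_map:
            self.images.append(attr_map["src"])
        elif tag == "li":
            self._in_li = True

    def handle_endtag(self, tag):
        if tag == "li":
            self._in_li = False

    def handle_data(self, data):
        if self._in_li and data.strip():
            self.items.append(data.strip())


collector = KeyTagCollector()
collector.feed(SNIPPET)
collector.close()

print("links:", collector.links)
print("images:", collector.images)
print("list items:", collector.items)
```

The <img> tag has no end tag, so its information is read entirely from its attributes in the start-tag handler.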
For more detailed information about HTML, refer to the Introduction to HTML Course.
Practice
Follow the sections highlighted in the code to fill in the blanks.