Key Methods and Usage of BeautifulSoup
In this lesson, we will look at the key methods of BeautifulSoup
and how to use them with some simple examples.
Finding a Specific Element with find
To find a specific element on a web page, you can use the find() method.
This method returns the first element that meets the criteria.
from bs4 import BeautifulSoup
html_doc = """
<html><body>
<h1>Hello</h1>
<p>Paragraph 1</p>
<p>Paragraph 2</p>
</body></html>
"""
# Parse the HTML
soup = BeautifulSoup(html_doc, 'html.parser')
# Find the h1 tag
h1_tag = soup.find('h1')
# Output: Hello
print(h1_tag.text)
The example above finds the h1 tag and prints its text.
Because find() returns only the first matching element, calling soup.find('p') on this document would return just the first p tag, even though there are two.
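When nothing matches, find() returns None, so reading .text on the result would raise an AttributeError. The following is a minimal sketch of guarding against that case and of narrowing the search with the class_ keyword argument. The HTML snippet, the intro class, and the variable names are made up for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML used only for this illustration
html_doc = """
<html><body>
<h1>Hello</h1>
<p class="intro">Paragraph 1</p>
<p>Paragraph 2</p>
</body></html>
"""

soup = BeautifulSoup(html_doc, 'html.parser')

# Filter by tag name and CSS class (class_ avoids the reserved word "class")
intro_tag = soup.find('p', class_='intro')
intro_text = intro_tag.text

# find() returns None when no element matches, so check before using .text
h2_tag = soup.find('h2')
h2_text = h2_tag.text if h2_tag is not None else 'No h2 tag found'

print(intro_text)
print(h2_text)
```

Checking for None before reading .text keeps a scraper from crashing when a page is missing an expected element.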
Finding Multiple Elements with find_all
If you want to find all elements that meet the criteria, use the find_all() method.
This method returns the results as a list, allowing you to handle multiple elements at once.
from bs4 import BeautifulSoup
html_doc = """
<html><body>
<p>Paragraph 1</p>
<p>Paragraph 2</p>
<p>Paragraph 3</p>
</body></html>
"""
# Parse the HTML
soup = BeautifulSoup(html_doc, 'html.parser')
# Find all p tags
p_tags = soup.find_all('p')
# Print all p tags
for p in p_tags:
    # Output: Paragraph 1, Paragraph 2, Paragraph 3 (one per line)
    print(p.text)
This code finds and prints all p tags in the string held by the html_doc variable.
The p_tags variable holds the matching tags themselves, not their text, in a list-like result: [<p>Paragraph 1</p>, <p>Paragraph 2</p>, <p>Paragraph 3</p>]. Reading .text on each tag gives 'Paragraph 1', 'Paragraph 2', and 'Paragraph 3'.
Thus, find_all() is useful when you want to find multiple elements at once.
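Since find_all() returns tag objects rather than strings, a common follow-up step is to collect their text into a plain list. The sketch below shows that step, along with the limit argument and the empty result returned when nothing matches. The variable names are illustrative.

```python
from bs4 import BeautifulSoup

html_doc = """
<html><body>
<p>Paragraph 1</p>
<p>Paragraph 2</p>
<p>Paragraph 3</p>
</body></html>
"""

soup = BeautifulSoup(html_doc, 'html.parser')

# find_all() returns Tag objects; read .text to get plain strings
p_texts = [p.text for p in soup.find_all('p')]
print(p_texts)

# limit stops the search after the given number of matches
first_two = soup.find_all('p', limit=2)
print(len(first_two))

# No match produces an empty result rather than None
missing = soup.find_all('h2')
print(len(missing))
```

Unlike find(), an unsuccessful find_all() can be looped over safely, because the loop body simply never runs.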
Finding Elements Using CSS Selectors with select
To select elements using CSS selectors, use the select() method. Like find_all(), it returns every matching element as a list.
from bs4 import BeautifulSoup
html_doc = """
<html><body>
<p>Paragraph 1</p>
<div class="content">
<p>Paragraph 2</p>
<p>Paragraph 3</p>
</div>
</body></html>
"""
# Parse the HTML
soup = BeautifulSoup(html_doc, 'html.parser')
# Find all p tags within .content class
content_p_tags = soup.select('.content p')
for p in content_p_tags:
    # Output: Paragraph 2, Paragraph 3 (one per line)
    print(p.text)
This code selects and prints all p tags inside the element with the content class. The p tag outside that div (Paragraph 1) is not selected.
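select() accepts other common CSS selector forms as well. As a rough illustration, the sketch below uses a child combinator, an id selector, and an attribute selector. The lead id, the link, and the variable names are assumptions added for this example.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML used only for this illustration
html_doc = """
<html><body>
<p id="lead">Paragraph 1</p>
<div class="content">
<p>Paragraph 2</p>
<p>Paragraph 3</p>
<a href="https://example.com">Link</a>
</div>
</body></html>
"""

soup = BeautifulSoup(html_doc, 'html.parser')

# Child combinator: p tags that are direct children of div.content
direct_children = [p.text for p in soup.select('div.content > p')]
print(direct_children)

# id selector: the element whose id is "lead"
lead = [p.text for p in soup.select('#lead')]
print(lead)

# Attribute selector: a tags that have an href attribute
links = [a['href'] for a in soup.select('a[href]')]
print(links)
```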
Selecting the First Element with select_one
The select_one() method is similar to select(), but it returns only the first element that meets the criteria.
from bs4 import BeautifulSoup
html_doc = """
<html><body>
<div class="content">
<p>Paragraph 1</p>
<p>Paragraph 2</p>
</div>
</body></html>
"""
# Parse the HTML
soup = BeautifulSoup(html_doc, 'html.parser')
# Find the first p tag within .content class
first_p_tag = soup.select_one('.content p')
# Output: Paragraph 1
print(first_p_tag.text)
The example above finds and prints the first p tag inside the element with the content class.
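To make the relationship between the two methods concrete, the sketch below compares select_one() with the first item returned by select(), and shows that select_one() returns None when the selector matches nothing. The .sidebar selector and the variable names are hypothetical.

```python
from bs4 import BeautifulSoup

html_doc = """
<html><body>
<div class="content">
<p>Paragraph 1</p>
<p>Paragraph 2</p>
</div>
</body></html>
"""

soup = BeautifulSoup(html_doc, 'html.parser')

# select_one() gives the same element as the first item from select()
first_p = soup.select_one('.content p')
all_p = soup.select('.content p')
same_text = first_p.text == all_p[0].text
print(same_text)

# No element has the class "sidebar", so select_one() returns None
missing = soup.select_one('.sidebar p')
print(missing is None)
```

Indexing select(...)[0] on an empty result raises an IndexError, so select_one() with a None check is often the more convenient choice when a match is not guaranteed.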