Scraping Wikipedia Homepage Information with Python

Wikipedia is an online encyclopedia created by people worldwide. 📘

In this lesson, we will learn how to collect specific information from a Wikipedia page using Python code.

Using the BeautifulSoup and requests libraries, you can extract the title and description from the Wikipedia homepage as shown below.

Step 1: Import Necessary Libraries

Importing requests and BeautifulSoup libraries
import requests
from bs4 import BeautifulSoup

This code performs the following:

Uses the import keyword to load the requests library for HTTP communication
Uses the from ... import form to import only the BeautifulSoup class, which parses HTML, from the bs4 package
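
If either import fails with ModuleNotFoundError, the library is probably not installed in the current environment. Note that the bs4 package is published on PyPI under the name beautifulsoup4. A minimal check, assuming both libraries are installed, is to print their versions:

```python
import requests
import bs4

# Both packages expose a __version__ string; printing it confirms the imports worked
print("requests:", requests.__version__)
print("beautifulsoup4:", bs4.__version__)
```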

Step 2: Retrieve and Store HTML from the URL

Use requests to retrieve the HTML of a webpage, then parse it with BeautifulSoup and store the result in a variable as follows.

Fetching HTML from Wikipedia homepage
# Wikipedia homepage URL
url = "https://www.wikipedia.org"

# Fetch HTML from the URL using the requests library
response = requests.get(url)

# Set the encoding of the fetched HTML to UTF-8
response.encoding = 'utf-8'

# Store the fetched HTML in the soup variable
soup = BeautifulSoup(response.text, 'html.parser')

This code performs the following:

Stores the Wikipedia homepage URL in the url variable
Fetches HTML from the URL using requests.get(url) and sets response.encoding to 'utf-8' so that response.text is decoded as UTF-8
Parses the fetched HTML with BeautifulSoup(response.text, 'html.parser') and stores the parsed result in the soup variable
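
The fetch step above assumes the request always succeeds. As a hedged sketch, the helper below adds a timeout and raises an error on a failed HTTP status before any parsing happens. The name fetch_html is hypothetical (it is not part of requests), and the 10-second timeout is an arbitrary choice.

```python
import requests


def fetch_html(url, timeout=10):
    """Return the HTML of url as text (hypothetical helper for illustration)."""
    # timeout stops the call from waiting indefinitely on a slow server
    response = requests.get(url, timeout=timeout)
    # Raise requests.HTTPError for 4xx/5xx responses instead of returning an error page
    response.raise_for_status()
    # Same assumption as the lesson: decode the body as UTF-8
    response.encoding = "utf-8"
    return response.text
```

With this helper, the parse step would become BeautifulSoup(fetch_html(url), 'html.parser').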

Step 3: Extract Title and Description Information

Extract desired information from the soup variable as shown below.

Extracting title and description from Wikipedia homepage
# Extract the text of the first h1 (heading 1, title) tag on the webpage
h1_title = soup.find('h1').text

# Extract the text of the first p (paragraph) tag on the webpage
p_description = soup.find('p').text

This code performs the following:

Finds the first h1 tag in the soup variable with soup.find('h1'), reads its text through .text, and stores the title in the h1_title variable
Finds the first p tag in the soup variable with soup.find('p'), reads its text through .text, and stores the description in the p_description variable
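
soup.find returns None when no matching tag exists, so calling .text on the result raises an AttributeError on pages that lack an h1 or p tag. The sketch below shows one way to guard against that. The helper first_text is a hypothetical name, and the HTML string is made-up sample data used in place of a live page.

```python
from bs4 import BeautifulSoup

# Made-up sample HTML standing in for a fetched page (note: it has no p tag)
sample_html = "<html><body><h1> Sample Title </h1></body></html>"
soup = BeautifulSoup(sample_html, "html.parser")


def first_text(soup, tag_name, default=None):
    """Return the stripped text of the first tag_name tag, or default if absent."""
    tag = soup.find(tag_name)
    if tag is None:
        return default
    # get_text(strip=True) strips whitespace around each piece of text in the tag
    return tag.get_text(strip=True)


h1_title = first_text(soup, "h1")
p_description = first_text(soup, "p", default="(no description found)")

print(h1_title)
print(p_description)
```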

Finally, use the print function to display the extracted title and description from the URL.
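
The print step is described above but not shown in code. A minimal sketch follows. It parses a made-up HTML string instead of the live page so that it runs without a network connection, and the output labels are arbitrary.

```python
from bs4 import BeautifulSoup

# Made-up sample HTML standing in for response.text
sample_html = """
<html>
  <body>
    <h1>Sample Title</h1>
    <p>Sample description paragraph.</p>
  </body>
</html>
"""
soup = BeautifulSoup(sample_html, "html.parser")

# Same extraction calls as in Step 3
h1_title = soup.find("h1").text
p_description = soup.find("p").text

# Display the extracted values
print("Title:", h1_title)
print("Description:", p_description)
```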

Practice

Press the Run Code button on the right to see the scraping results. The first execution may take some time.

You can also change the value of the url variable (e.g., https://www.codefriends.net) to fetch information from other web pages.
