
Scraping Wikipedia Homepage Information with Python

Wikipedia is an online encyclopedia created by people worldwide. 📘

In this lesson, we will learn how to collect specific information from a Wikipedia page using Python code.

Using the BeautifulSoup and requests libraries, you can extract the title and description from the Wikipedia homepage as shown below.

Step 1: Import Necessary Libraries

Importing requests and BeautifulSoup libraries
import requests
from bs4 import BeautifulSoup

This code performs the following:

  • Uses the import keyword to load the requests library for HTTP communication

  • Uses the from ... import statement to import the BeautifulSoup class, which parses HTML, from the bs4 package

Step 2: Retrieve and Store HTML from the URL

Use requests to retrieve the HTML of a webpage, then parse it with BeautifulSoup and store the result in a variable as follows.

Fetching HTML from Wikipedia homepage
# Wikipedia homepage URL
url = ""

# Fetch HTML from the URL using the requests library
response = requests.get(url)

# Set the encoding of the fetched HTML to UTF-8
response.encoding = 'utf-8'

# Store the fetched HTML in the soup variable
soup = BeautifulSoup(response.text, 'html.parser')

This code performs the following:

  • Stores the Wikipedia homepage URL in the url variable

  • Fetches HTML from the URL using requests.get(url)

  • Sets response.encoding to 'utf-8' so that response.text is decoded as UTF-8

  • Parses the fetched HTML with BeautifulSoup(response.text, 'html.parser') and stores the parsed result in the soup variable
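
The fetch step above does not check whether the request succeeded. A minimal sketch of a more defensive version follows. The helper names fetch_html and make_soup and the 10-second timeout are assumptions for illustration, not part of the lesson's code.

```python
import requests
from bs4 import BeautifulSoup


def fetch_html(url, timeout=10):
    """Hypothetical helper: download a page and return its HTML as text."""
    # timeout (seconds) is an assumed value; without one a request can wait indefinitely
    response = requests.get(url, timeout=timeout)
    # Raise requests.HTTPError for 4xx/5xx responses instead of parsing an error page
    response.raise_for_status()
    # Decode the body as UTF-8, as in the lesson's code
    response.encoding = "utf-8"
    return response.text


def make_soup(html):
    """Hypothetical helper: parse an HTML string with Python's built-in parser."""
    return BeautifulSoup(html, "html.parser")


# Parsing works on any HTML string, so it can be tried without a network call
soup = make_soup("<html><body><h1>Example</h1></body></html>")
print(soup.find("h1").text)
```

Keeping the download and the parsing in separate functions means the parsing can be checked against a fixed HTML string, while fetch_html is only called when a real URL is available.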

Step 3: Extract Title and Description Information

Extract desired information from the soup variable as shown below.

Extracting title and description from Wikipedia homepage
# Extract h1 (heading 1, title) from the webpage
h1_title = soup.find('h1').text

# Extract p (paragraph) tag from the webpage
p_description = soup.find('p').text

This code performs the following:

  • Finds the first h1 tag in the parsed page with soup.find('h1'), reads its text with .text, and stores it in the h1_title variable

  • Finds the first p tag in the parsed page with soup.find('p'), reads its text with .text, and stores it in the p_description variable
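
Note that soup.find returns None when no matching tag exists, so calling .text on the result raises AttributeError on a page without an h1 or p tag. A minimal sketch of a guarded version, tried on a made-up HTML snippet, might look like this. The function name extract_title_and_description and the sample markup are assumptions for illustration.

```python
from bs4 import BeautifulSoup


def extract_title_and_description(soup):
    """Hypothetical helper: return (title, description), using None for a missing tag."""
    h1_tag = soup.find("h1")  # first <h1>, or None if the page has none
    p_tag = soup.find("p")    # first <p>, or None if the page has none
    title = h1_tag.get_text(strip=True) if h1_tag is not None else None
    description = p_tag.get_text(strip=True) if p_tag is not None else None
    return title, description


# Made-up sample markup, not the real Wikipedia homepage
sample_html = (
    "<html><body><h1> Sample Title </h1>"
    "<p>First paragraph.</p><p>Second.</p></body></html>"
)
sample_soup = BeautifulSoup(sample_html, "html.parser")
print(extract_title_and_description(sample_soup))

# A page with neither tag yields None values instead of raising an error
empty_soup = BeautifulSoup("<html><body></body></html>", "html.parser")
print(extract_title_and_description(empty_soup))
```

Here get_text(strip=True) also trims surrounding whitespace, which the plain .text attribute does not do.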

Finally, use the print function to display the extracted title and description from the URL.
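
A minimal sketch of that final step is given here. To keep it runnable without a network request, it parses a made-up HTML snippet rather than the live page; the sample markup and the output labels are assumptions.

```python
from bs4 import BeautifulSoup

# Made-up sample markup standing in for the fetched page
sample_html = "<html><body><h1>Wikipedia</h1><p>The Free Encyclopedia</p></body></html>"
soup = BeautifulSoup(sample_html, "html.parser")

# Same extraction calls as in Step 3
h1_title = soup.find("h1").text
p_description = soup.find("p").text

# Display the extracted values
print("Title:", h1_title)
print("Description:", p_description)
```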


Press the Run Code button on the right to see the scraping results. The first execution may take some time.

You can also change the value of the url variable to fetch information from other web pages.
