Crawling Latest Trending Articles from Wikipedia

Use BeautifulSoup's find_all method to scrape major events from Wikipedia's Current Events portal.

Example Code Explanation

Extracting the First 10 Trending Article Titles
import requests
from bs4 import BeautifulSoup

def crawl_wikipedia_current_events_first_10_titles():
    url = "https://en.wikipedia.org/wiki/Portal:Current_events"

    # Set a timeout so the call cannot hang indefinitely on a stalled connection
    response = requests.get(url, timeout=10)
    if response.status_code != 200:
        print("Response failed", response.status_code)
        return None

    soup = BeautifulSoup(response.content, "html.parser")

    # Locate the div tag containing the contents of the Current Events section
    current_events_section = soup.find("div", {"id": "mw-content-text"})

    # Find all li tags within the div tag
    list_items = current_events_section.find_all("li") if current_events_section else []

    # Extract text inside li tags and store them in a list
    titles = [item.get_text(strip=True) for item in list_items[:10]]

    return titles

Requesting a Web Page: Use requests.get(url) to request the content of a specific URL.
Checking Response Status: Verify whether the request was successful by inspecting response.status_code.
Creating a BeautifulSoup Object and Parsing Data: Use BeautifulSoup(response.content, "html.parser") to parse the HTML content.
Extracting Data from a Specific Section: Locate all li tags within a particular section of the webpage (here, the div with id mw-content-text that holds the Current Events content), and keep the first 10 entries.
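
The parsing and extraction steps above can also be tried without a network request. The following is a minimal sketch, not the lesson's official solution: SAMPLE_HTML and the helper name first_list_items are made up for illustration, and the sample markup only imitates the rough shape of a Wikipedia page. It shows how scoping find_all to one container skips list items elsewhere on the page, and how the limit argument of find_all can replace slicing.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for a downloaded page (not real Wikipedia markup)
SAMPLE_HTML = """
<html><body>
  <ul id="nav"><li>Main page</li><li>Contents</li></ul>
  <div id="mw-content-text">
    <ul>
      <li>Event A</li>
      <li>Event B</li>
      <li>Event C</li>
    </ul>
  </div>
</body></html>
"""

def first_list_items(html, container_id, limit=10):
    """Return the text of up to `limit` li tags inside the div with the given id."""
    soup = BeautifulSoup(html, "html.parser")

    # Restrict the search to one container so unrelated lists are ignored
    container = soup.find("div", {"id": container_id})
    if container is None:
        return []

    # limit stops the search once enough matches have been found
    items = container.find_all("li", limit=limit)
    return [item.get_text(strip=True) for item in items]

print(first_list_items(SAMPLE_HTML, "mw-content-text", limit=2))
# → ['Event A', 'Event B']
```

Because the search starts from the container rather than from the whole document, the navigation entries ("Main page", "Contents") never appear in the result.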

Practice Exercises

Use the above code to extract the latest event titles from Wikipedia's 'Current Events' section.
Experiment with targeting different webpages and sections to practice data extraction techniques.
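
As a starting point for the second exercise, here is a rough sketch of a helper that takes a CSS selector instead of a hard-coded tag and id. The function name extract_texts, the sample HTML, and the selector are all assumptions chosen for illustration; a real page needs a selector found by inspecting its markup in the browser.

```python
from bs4 import BeautifulSoup

def extract_texts(html, css_selector, limit=10):
    """Return the text of the first `limit` elements matching a CSS selector."""
    soup = BeautifulSoup(html, "html.parser")

    # select() returns every element matching the CSS selector, in document order
    matches = soup.select(css_selector)

    # Join nested text fragments with a space and trim surrounding whitespace
    return [m.get_text(" ", strip=True) for m in matches[:limit]]

# Hypothetical markup standing in for some other page
SAMPLE_HTML = """
<div class="news">
  <h2>Headline one</h2>
  <h2>Headline <em>two</em></h2>
</div>
<h2>Unrelated heading</h2>
"""

print(extract_texts(SAMPLE_HTML, "div.news h2"))
# → ['Headline one', 'Headline two']
```

To target a live page, pass response.content from requests.get in place of SAMPLE_HTML and adjust the selector to the section you want.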
