
Crawling Stars and Forks Count from a Repository

In this lesson, we'll walk step by step through code that crawls a GitHub repository page and displays its Stars (likes) and Forks (project copies) counts.

Step 1

Fetch HTML from Web Page
import requests
url = "https://github.com/django/django"  # Django's repository page on GitHub
response = requests.get(url)
html_content = response.text
  • requests.get(url): Sends an HTTP GET request to the given URL and returns a response object. Here, url points to the GitHub repository page of Django.
  • response.text: The body of that response, decoded to a string. For this page, it is the HTML source.
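The fetch step can fail in practice: the server may respond with an error page or not respond at all. The sketch below wraps the request in a small helper that checks for both. The name fetch_html and the 10-second default timeout are assumptions made for illustration; they are not part of the lesson's code.

```python
import requests


def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Return the HTML body of `url`, raising on HTTP or network errors."""
    # timeout keeps the call from waiting indefinitely on a stalled server
    response = requests.get(url, timeout=timeout)
    # raise_for_status() raises requests.HTTPError for 4xx/5xx responses
    response.raise_for_status()
    return response.text
```

With this helper, the two lines of Step 1 could be written as html_content = fetch_html(url).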

Step 2

Parse HTML
from bs4 import BeautifulSoup
soup = BeautifulSoup(html_content, 'html.parser')
  • BeautifulSoup(html_content, 'html.parser'): Parses html_content with Python's built-in html.parser and returns a tree of HTML elements that can be searched and navigated.
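To see what parsing provides without touching the network, here is a minimal sketch that parses a small HTML string. The markup and the id "greeting" are made up for illustration and do not come from GitHub.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, used only for illustration
sample_html = """
<html>
  <head><title>Demo page</title></head>
  <body>
    <span id="greeting">Hello</span>
  </body>
</html>
"""

soup = BeautifulSoup(sample_html, "html.parser")

# Tags can be reached as attributes of the parsed tree...
page_title = soup.title.get_text()
# ...or located by their id attribute
greeting = soup.find(id="greeting").get_text()

print(page_title)  # Demo page
print(greeting)    # Hello
```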

Step 3

Locate Stars and Forks Count
ids_to_find = ['repo-stars-counter-star', 'repo-network-counter']
  • This list holds the id attributes of the HTML elements that display the stars count ('repo-stars-counter-star') and the forks count ('repo-network-counter'). The next step uses these IDs to locate both values in the parsed page.

Step 4

Extract Information
found_contents = {}  # maps each ID to the text found for it
for id_value in ids_to_find:
    element_content = soup.find(id=id_value)
    found_contents[id_value] = element_content.get_text() if element_content else "No content"
  • soup.find(id=id_value): Returns the first HTML element with the specified ID in the parsed HTML content, or None if no such element exists.
  • element_content.get_text(): Extracts the text content from the found element. If the element was not found, "No content" is stored instead.
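The lookup and its fallback can be tried offline by running the same loop against a hand-written HTML string. In the sketch below, the sample markup, the value "1.2k", and the function name extract_by_ids are assumptions for illustration; the real GitHub page is much larger and its markup may differ. The forks element is left out on purpose to trigger the fallback, and get_text(strip=True) is used to trim surrounding whitespace.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for a repository page: only the stars element is present
sample_html = """
<div>
  <span id="repo-stars-counter-star">
    1.2k
  </span>
</div>
"""


def extract_by_ids(html, ids):
    """Map each ID to the text of its element, or "No content" if it is missing."""
    soup = BeautifulSoup(html, "html.parser")
    found = {}
    for id_value in ids:
        element = soup.find(id=id_value)  # None when no element has this ID
        found[id_value] = element.get_text(strip=True) if element else "No content"
    return found


found_contents = extract_by_ids(
    sample_html, ["repo-stars-counter-star", "repo-network-counter"]
)
print(found_contents)
# {'repo-stars-counter-star': '1.2k', 'repo-network-counter': 'No content'}
```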

Step 5

Display Results
for id_value, content in found_contents.items():
    print(f"ID '{id_value}': {content}")
  • found_contents.items(): Returns each ID together with its extracted text. The loop prints one line per ID, so the stars and forks counts appear in the output.

Practical Exercise

  • Execute the code above with a different repository URL on GitHub.

  • Practice extracting other data by searching for different IDs or classes.
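For the second exercise, elements can be located by class as well as by ID. The sketch below shows two ways to do that in BeautifulSoup. The markup and the class name "topic-tag" are hypothetical; inspect the page you are crawling to find the class names it actually uses.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, used only for illustration
sample_html = """
<ul>
  <li class="topic-tag">python</li>
  <li class="topic-tag">web</li>
  <li class="other">ignored</li>
</ul>
"""

soup = BeautifulSoup(sample_html, "html.parser")

# class_ has a trailing underscore because class is a reserved word in Python
tags_via_find_all = [el.get_text(strip=True) for el in soup.find_all(class_="topic-tag")]

# select() takes a CSS selector, here "li elements with class topic-tag"
tags_via_select = [el.get_text(strip=True) for el in soup.select("li.topic-tag")]

print(tags_via_find_all)  # ['python', 'web']
print(tags_via_select)    # ['python', 'web']
```

Unlike an ID, a class is usually shared by several elements, which is why both calls return a list rather than a single element.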
