How to Crawl Static Stock Data
If stock data is provided in a static format, web crawling can be performed using just the requests and BeautifulSoup libraries, as shown in the practice example.
In this lesson, we will learn how to crawl data from a fictional stock data table like the one found at this link.
Company Name | Current Price | Change | Rate of Change |
---|---|---|---|
Company A | 1064 | 26 | 2.44% |
Company B | 1458 | -35 | -2.40% |
Company C | 1991 | 49 | 2.46% |
Company D | 2595 | 22 | 0.85% |
Company E | 3074 | -36 | -1.17% |
Company F | 598 | 2 | 0.33% |
The example table is static: its values are part of the HTML the server returns, so they do not change unless the page is refreshed.
Code Explanation
As in the earlier lesson, you can crawl the page using the requests and BeautifulSoup libraries, but it is essential to set response.encoding = "utf-8" so that non-ASCII characters in the page are decoded correctly.
We will also explore more advanced usage of the find methods, such as find("td", {"class": "company-cell"}) and find_all("tr").
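To make the difference between the two calls concrete, here is a minimal sketch. The HTML fragment is made up for illustration and is not the lesson's actual page. The class_ keyword shown alongside the dictionary form is an equivalent way to filter by class.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML fragment, made up for this illustration.
html = """
<table>
  <tr><td class="company-cell">Company A</td><td class="current-price-cell">1064</td></tr>
  <tr><td class="company-cell">Company B</td><td class="current-price-cell">1458</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# find() returns only the first matching element (or None if nothing matches).
first_cell = soup.find("td", {"class": "company-cell"})

# The class_ keyword is an equivalent way to filter by CSS class.
same_cell = soup.find("td", class_="company-cell")

# find_all() returns every matching element as a list-like result.
rows = soup.find_all("tr")

print(first_cell.text)  # Company A
print(len(rows))        # 2
```

Use find when you expect a single element and find_all when you want to loop over several.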
Step 1
response = requests.get(url)
response.encoding = "utf-8"
html_content = response.text
- requests.get(url): Sends a request to the specified URL and retrieves the webpage data.
- response.encoding = "utf-8": Sets the response encoding to UTF-8 to prevent character issues like broken Unicode text.
- html_content = response.text: Stores the received HTML content in text format.
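Why the encoding line matters can be seen without any network access. requests builds response.text by decoding the raw bytes with response.encoding, and when a server sends a text content type without declaring a charset, requests falls back to ISO-8859-1. The sketch below uses only the standard library to show what that mismatch does to UTF-8 bytes. The sample string is an assumption chosen for illustration.

```python
# Bytes as a server might send them: UTF-8 encoded text with non-ASCII characters.
# The sample string is made up for this illustration.
raw_bytes = "Café Zürich: 25 €".encode("utf-8")

# Decoding with the wrong charset does not fail, but the text comes out garbled.
wrong = raw_bytes.decode("iso-8859-1")

# Decoding as UTF-8 restores the original text.
right = raw_bytes.decode("utf-8")

print(wrong == right)  # False
print(right)           # Café Zürich: 25 €
```

Setting response.encoding = "utf-8" before reading response.text has the same effect as choosing the second decode here.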
Step 2
soup = BeautifulSoup(html_content, "html.parser")
- Creates a BeautifulSoup object that parses the HTML content, enabling easy access to HTML elements.
Step 3
stock_table = soup.find("table", {"id": "stock-table"})
- Uses the soup.find() method to locate the table element containing stock data (<table id="stock-table">) within the HTML.
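Steps 2 and 3 can be tried on an inline HTML string. This is a sketch under assumptions: the markup below is hypothetical and only mimics the id used in the lesson. It also shows that find() returns None when nothing matches, which is worth checking before calling further methods on the result.

```python
from bs4 import BeautifulSoup

# Hypothetical page containing the stock table and one unrelated table.
html_content = """
<table id="other-table"><tbody><tr><td>ignored</td></tr></tbody></table>
<table id="stock-table">
  <tbody>
    <tr><td class="company-cell">Company A</td></tr>
  </tbody>
</table>
"""

soup = BeautifulSoup(html_content, "html.parser")
stock_table = soup.find("table", {"id": "stock-table"})

# find() returns None when no element matches, so check before using the result.
if stock_table is None:
    raise ValueError("No table with id 'stock-table' found in the page")

print(stock_table["id"])  # stock-table

# Looking up an id that does not exist yields None rather than an error.
missing = soup.find("table", {"id": "no-such-table"})
print(missing)  # None
```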
Step 4
for row in stock_table.find("tbody").find_all("tr"):
- stock_table.find("tbody").find_all("tr"): Iterates over every row (<tr>) in the table’s <tbody> section.
Extracts the following data from each row:
- Company name: Text from the <td> element with class="company-cell".
- Current price: Text from the <td> element with class="current-price-cell".
- Price change: Text from the <td> element with class="diff-cell".
- Rate of change: Text from the <td> element with class="fluct-cell".
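The row loop described above might look like the following sketch. The HTML is a hypothetical two-row version of the example table, written to use the same id and class names. Collecting each row into a dictionary is a choice made here for illustration, not necessarily what the practice code does.

```python
from bs4 import BeautifulSoup

# Hypothetical markup that mirrors the structure described in the lesson.
html_content = """
<table id="stock-table">
  <thead>
    <tr><th>Company Name</th><th>Current Price</th><th>Change</th><th>Rate of Change</th></tr>
  </thead>
  <tbody>
    <tr>
      <td class="company-cell">Company A</td>
      <td class="current-price-cell">1064</td>
      <td class="diff-cell">26</td>
      <td class="fluct-cell">2.44%</td>
    </tr>
    <tr>
      <td class="company-cell">Company B</td>
      <td class="current-price-cell">1458</td>
      <td class="diff-cell">-35</td>
      <td class="fluct-cell">-2.40%</td>
    </tr>
  </tbody>
</table>
"""

soup = BeautifulSoup(html_content, "html.parser")
stock_table = soup.find("table", {"id": "stock-table"})

rows = []
# Searching inside <tbody> skips the header row in <thead>.
for row in stock_table.find("tbody").find_all("tr"):
    rows.append({
        "company_name": row.find("td", {"class": "company-cell"}).text.strip(),
        "current_price": row.find("td", {"class": "current-price-cell"}).text.strip(),
        "price_change": row.find("td", {"class": "diff-cell"}).text.strip(),
        "change_percentage": row.find("td", {"class": "fluct-cell"}).text.strip(),
    })

print(len(rows))                # 2
print(rows[0]["company_name"])  # Company A
```

Every extracted value is a string, so convert with int() or float() before doing arithmetic on prices.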
Step 5
print(f"{company_name}: Current Price {current_price}, Change {price_change}, Rate of Change {change_percentage}")
- Formats and prints the extracted data.
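As a small illustration of the formatting step, the sketch below applies the same f-string to hypothetical rows shaped like the values extracted in Step 4. The signed variant at the end is an optional extension, not part of the lesson's code.

```python
# Hypothetical rows, shaped like the string values extracted in Step 4.
rows = [
    ("Company A", "1064", "26", "2.44%"),
    ("Company B", "1458", "-35", "-2.40%"),
]

lines = []
for company_name, current_price, price_change, change_percentage in rows:
    line = f"{company_name}: Current Price {current_price}, Change {price_change}, Rate of Change {change_percentage}"
    lines.append(line)
    print(line)

# Optional: convert the change to int and use the "+" format flag
# so that gains are shown with an explicit sign.
signed = [f"{int(price_change):+d}" for _, _, price_change, _ in rows]
print(signed)  # ['+26', '-35']
```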
Practice
Click the Run Code button on the right side of the screen to see the crawling results or tweak the code as needed!