Web Scraping U.S. Stock Indices with Selenium
In the previous lesson, we introduced the requests and BeautifulSoup libraries, which can be used to extract desired data by fetching the HTML code of a specific web page.
However, if a web page is dynamically generated, meaning its content is loaded or updated by JavaScript after the initial page load (for example, in response to user interactions), simply using requests and BeautifulSoup is not sufficient to extract the desired data.
Modern websites often receive constantly changing data from the server and display it to the user; such web pages are called dynamic web pages.
Because requests only fetches the HTML the server initially returns and BeautifulSoup only parses that HTML, neither can run the JavaScript that loads dynamic data, so we need another method to extract it.
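To make this limitation concrete, here is a minimal sketch that parses a small, made-up page whose price is filled in by JavaScript. It uses Python's built-in html.parser as a stand-in for BeautifulSoup so that it runs without extra installs; the page content, the id "price", and the class name SpanTextParser are all hypothetical.

```python
from html.parser import HTMLParser

# Hypothetical page: the <span> is empty in the HTML the server sends,
# and a script fills it in only when a real browser executes it.
RAW_HTML = """
<html><body>
  <span id="price"></span>
  <script>document.getElementById("price").textContent = "5,000.00";</script>
</body></html>
"""


class SpanTextParser(HTMLParser):
    """Collects the text found inside <span id="price">."""

    def __init__(self):
        super().__init__()
        self.inside_price = False
        self.price_text = ""

    def handle_starttag(self, tag, attrs):
        if tag == "span" and dict(attrs).get("id") == "price":
            self.inside_price = True

    def handle_endtag(self, tag):
        if tag == "span":
            self.inside_price = False

    def handle_data(self, data):
        # Only text between the opening and closing <span> tags is kept
        if self.inside_price:
            self.price_text += data


parser = SpanTextParser()
parser.feed(RAW_HTML)
price_text = parser.price_text
print(repr(price_text))  # prints '' because the script was never executed
```

A static parser sees only the empty element, because nothing in this pipeline runs the script. A browser would run it, which is why a browser-driving tool is needed for such pages.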
In such cases, we use the Selenium library to scrape data from dynamic web pages.
Introduction to the Selenium Library
Selenium is a library used to automate web browsers and test web pages.
Since it can directly control a web browser, it can perform tasks such as scraping dynamic data or clicking on and entering data into specific elements on a web page.
Practical Example: Scraping U.S. Stock Indices with Selenium
In this practical example, we use Selenium to scrape real-time U.S. stock indices from the Yahoo Finance website.
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait

# Launch the Chrome web driver to open a browser window
driver = webdriver.Chrome()
# Navigate to the 'Markets' page on Yahoo Finance
driver.get('https://finance.yahoo.com/markets/')
# Create an explicit wait with a 10-second timeout (the waiting itself happens when wait.until(...) is called)
wait = WebDriverWait(driver, 10)
...(truncated)...
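The truncated portion is not reproduced here. As a separate, hedged sketch of how an explicit wait is commonly used with Selenium, the block below waits for an element to appear and returns its text. The function names fetch_element_text and parse_index_value are hypothetical, and the CSS selector is left as a parameter because the actual structure of the Yahoo Finance page is not shown in this lesson.

```python
def parse_index_value(text):
    """Convert a displayed number such as '5,123.45' to a float."""
    return float(text.replace(",", "").strip())


def fetch_element_text(url, css_selector, timeout=10):
    """Open url in Chrome and return the text of the first element
    matching css_selector, waiting up to timeout seconds for it."""
    # Imported inside the function so parse_index_value can be used
    # even where Selenium or a browser is not available.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()
    try:
        driver.get(url)
        wait = WebDriverWait(driver, timeout)
        # Blocks until the element is in the DOM, or raises TimeoutException
        element = wait.until(
            EC.presence_of_element_located((By.CSS_SELECTOR, css_selector))
        )
        return element.text
    finally:
        # Always close the browser, even if the wait times out
        driver.quit()


# Hypothetical usage (the selector below is a placeholder, not a real one):
# text = fetch_element_text("https://finance.yahoo.com/markets/", "YOUR_SELECTOR")
# print(parse_index_value(text))
```

Keeping the number parsing separate from the browser code makes it easy to check the conversion on its own before running a full scrape.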
More details about Selenium can be found in Chapter 3 of the course Essential Knowledge for Work Automation, which will expedite your workflow.
Click the green ▶︎ Run button in the code editor to check real-time U.S. stock indices!