Selenium web driver
Since Selenium WebDriver was created for browser automation, it can easily be used for scraping data from the web. Selenium can select and navigate components of a website that are not static and need to be clicked or chosen from drop-down menus.
If any content on the page is rendered by JavaScript, Selenium WebDriver waits for the entire page to load before crawling, whereas libraries such as BeautifulSoup, Scrapy, and Requests work only on static pages.
Any browser action can be performed with Selenium WebDriver, including cases where content appears only after a button click, scrolling, or page navigation.
Selenium Firefox driver
Let's now load the main Bing search page and make a query for "feng li". You need to install the selenium module for Python. You also need geckodriver, placed in a directory that $PATH can find. You can download it from https://github.com/mozilla/geckodriver/releases .
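The setup can be sketched as follows (the archive name depends on the release and platform you download from the page above):

```shell
# Install the Python bindings.
pip install selenium

# Unpack the geckodriver release archive and put the binary on $PATH.
tar -xzf geckodriver-*.tar.gz
mv geckodriver ~/bin/   # or any other directory on $PATH
```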
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Firefox()
driver.get("https://www.bing.com/")
# Type the query into the search box and submit it.
driver.find_element(By.ID, "sb_form_q").send_keys("feng li")
driver.find_element(By.ID, "sb_form_go").click()
#driver.close()
#driver.quit()
Using a headless Firefox requires a bit of configuration.
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.firefox.options import Options

options = Options()
options.add_argument("--headless")
driver = webdriver.Firefox(options=options)
print("Firefox Headless Browser Invoked")
driver.get("https://www.baidu.com/")
# Search for "Feng Li, Central University of Finance and Economics".
driver.find_element(By.ID, "kw").send_keys("李丰 中央财经大学")
driver.find_element(By.ID, "su").click()
results = driver.find_elements(By.XPATH, '//div[@srcid="1599"]/h3/a')
for result in results:
    print(result.text)
driver.close()
Use selenium to implement the case we studied with BeautifulSoup in L2.