# Web scraping with Selenium WebDriver

Because Selenium WebDriver was built for browser automation, it can also be used to scrape data from the web. It is especially useful for navigating the dynamic parts of a website, such as elements that must be clicked or options that must be chosen from drop-down menus.

If some of the content on a page is rendered by JavaScript, Selenium WebDriver can wait for the page to load in a real browser before extracting data, whereas libraries such as BeautifulSoup, Scrapy, and Requests work only with the static HTML the server returns.

Selenium WebDriver can perform almost any browser action, so it can also reach content that appears only after a button click, scrolling, or page navigation.
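For pages that load more items as you scroll, one common pattern is to scroll with JavaScript through `driver.execute_script` until the page height stops growing. This is a sketch under assumptions: the helper name `scroll_to_bottom`, the one-second pause, and the round limit are arbitrary and should be tuned per site.

```python
import time


def scroll_to_bottom(driver, pause=1.0, max_rounds=20):
    """Scroll down repeatedly until the page height stops growing.

    `pause` gives the page's JavaScript time to fetch and render the next
    batch of content. Returns the number of scroll rounds performed.
    """
    last_height = driver.execute_script("return document.body.scrollHeight")
    rounds = 0
    while rounds < max_rounds:
        # Jump to the current bottom of the page to trigger lazy loading
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        rounds += 1
        time.sleep(pause)
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:  # nothing new was loaded, stop
            break
        last_height = new_height
    return rounds
```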

#### Pros of using WebDriver

• WebDriver can simulate a real user working with a browser
• WebDriver can scrape a website using a specific browser, such as Firefox or Chrome
• WebDriver can scrape complicated web pages with dynamic content
• WebDriver can take screenshots of the web page
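Taking a screenshot needs only one call, `driver.save_screenshot()`, which writes a PNG of the current viewport and returns `True` on success (`False` if the file could not be written). A minimal sketch with a hypothetical helper name:

```python
from pathlib import Path


def screenshot_page(driver, url, out_path="page.png"):
    """Load `url` and save a PNG screenshot of the visible viewport.

    Hypothetical helper; returns the boolean reported by save_screenshot().
    """
    driver.get(url)
    path = Path(out_path)
    # Make sure the target folder exists before the driver writes the file
    path.parent.mkdir(parents=True, exist_ok=True)
    return driver.save_screenshot(str(path))
```

Firefox's driver also provides full-page variants such as `save_full_page_screenshot()`.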

#### Cons of using WebDriver

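• Programs become considerably larger
• Scraping is slower, because the browser has to load and render every page
• The browser generates more network traffic, since it also fetches images, scripts, and stylesheets
• Scraping can be detected by simple means such as Google Analytics, because the browser runs the site's tracking scripts like a normal visitor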

## Web Scraping Bing with Selenium Firefox driver

Let’s now load the main Bing search page and make a query for “feng li”. You need to install the selenium module for Python (for example with `pip install selenium`). You also need geckodriver, placed in a directory on your $PATH; you can download it from https://github.com/mozilla/geckodriver/releases .
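Before running the cells, it can help to check the setup from Python. The sketch below reports the installed selenium version and whether `geckodriver` is on `$PATH`; it also flags selenium 4.6 or later, which bundles Selenium Manager and can locate or download a matching driver on its own. The function name `check_setup` is illustrative:

```python
import shutil
from importlib.metadata import PackageNotFoundError, version


def check_setup():
    """Report what the Firefox examples need: selenium and geckodriver.

    Returns a dict with the selenium version (or None), the geckodriver
    path found on $PATH (or None), and whether Selenium Manager
    (bundled with selenium >= 4.6) is available as a fallback.
    """
    try:
        sel_version = version("selenium")
    except PackageNotFoundError:
        sel_version = None
    driver_path = shutil.which("geckodriver")  # None if not on $PATH
    has_manager = False
    if sel_version is not None:
        # Compare only major.minor, e.g. "4.27.1" -> (4, 27)
        major, minor = (int(part) for part in sel_version.split(".")[:2])
        has_manager = (major, minor) >= (4, 6)
    return {"selenium": sel_version, "geckodriver": driver_path,
            "selenium_manager": has_manager}
```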

```python
from selenium import webdriver
# Launch a Firefox window controlled through geckodriver
driver = webdriver.Firefox()
```

```python
# Open Bing's main search page
driver.get("https://www.bing.com/")
```

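```python
from selenium.webdriver.common.by import By  # find_element_by_id() was removed in Selenium 4.3
driver.find_element(By.ID, "sb_form_q").send_keys("feng li")  # type into Bing's search box
```

```python
driver.find_element(By.ID, "sb_form_go").click()  # click the search button
```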

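```python
# When you are done: close() closes the current window,
# quit() ends the session and stops geckodriver
#driver.close()
#driver.quit()
```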


## Web Scraping Baidu with headless web driver

Running Firefox headless (without a visible browser window) requires a little extra configuration.

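```python
from selenium import webdriver
from selenium.webdriver.firefox.options import Options

options = Options()
options.add_argument("-headless")  # without this flag Firefox opens a normal window
driver = webdriver.Firefox(options=options)  # Selenium 4 no longer accepts firefox_options=
print("Firefox Headless Browser Invoked")
```

Firefox Headless Browser Invoked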

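```python
from selenium.webdriver.common.by import By

driver.get("https://www.baidu.com/")
driver.find_element(By.ID, "kw").send_keys("李丰 中央财经大学")  # "Li Feng, Central University of Finance and Economics"
```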

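```python
from selenium.webdriver.common.by import By
driver.implicitly_wait(10)  # let find_element(s) poll up to 10 s while the results load
driver.find_element(By.ID, "su").click()
results = driver.find_elements(By.XPATH, '//div[@srcid="1599"]/h3/a')  # result title links
```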

```python
for result in results:
    print(result.text)
```

中央财经大学统计与数学学院导师李丰简介_考研派

...届全国高校经管类实验教学案例大赛中取得佳绩 - 中央财经大学...
COS访谈第22期:李丰老师-来自微信公众号统计之都-wx.abbao.cn
...级博士生李丰羽在《Energy Policy》发表论文_中央财经大学金融...
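Besides the visible text, each result element also carries its link target, available through `get_attribute("href")`. A small hypothetical helper that collects both into a list of dicts:

```python
def collect_links(elements):
    """Turn result <a> WebElements into a list of {"title", "url"} dicts.

    Uses .text for the visible title and get_attribute("href") for the
    link target; elements without an href are skipped.
    """
    links = []
    for element in elements:
        href = element.get_attribute("href")
        if href:
            links.append({"title": element.text.strip(), "url": href})
    return links
```

Calling `collect_links(results)` on the `results` list above would give one dict per title. Note that Baidu's result links may be redirect URLs rather than the final pages.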


```python
driver.quit()  # ends the session and closes all windows; close() would only close the current one
```


# Lab

Use Selenium to implement the case we studied with BeautifulSoup in L2.
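One possible starting point, assuming the L2 case parsed pages with BeautifulSoup: let Selenium render the page, then hand `driver.page_source` to BeautifulSoup so the parsing logic can stay much the same. The function name and the default selector are hypothetical:

```python
from bs4 import BeautifulSoup


def extract_links(page_source, css_selector="a"):
    """Parse HTML (e.g. driver.page_source) and return (text, href) pairs.

    Selenium handles rendering; BeautifulSoup handles parsing.
    Anchors without an href attribute are skipped.
    """
    soup = BeautifulSoup(page_source, "html.parser")
    return [(a.get_text(strip=True), a.get("href"))
            for a in soup.select(css_selector) if a.get("href")]
```

For example, `extract_links(driver.page_source, "h3 a")` would return the title links of a rendered results page.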