This page serves as the primary guide for my students who want to learn Python.
{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# What Is Web Scraping?\n", "\n", "The automated gathering of data from the internet is nearly as old as the internet itself. Although web scraping is not a new term, in years past the practice has been more commonly known as screen scraping, data mining, web harvesting, or similar variations. General consensus today seems to favor web scraping, so that is the term I use throughout the book, although I also refer to programs that specifically traverse multiple pages as web crawlers or refer to the web scraping programs themselves as bots.\n", "\n", "In theory, web scraping is the practice of gathering data through any means other than a program interacting with an API (or, obviously, through a human using a web browser). This is most commonly accomplished by writing an automated program that queries a web server, requests data (usually in the form of HTML and other files that compose web pages), and then parses that data to extract needed information.\n", "\n", "In practice, web scraping encompasses a wide variety of programming techniques and technologies, such as data analysis, natural language parsing, and information security. Because the scope of the field is so broad, this book covers the fundamentals of web scraping and crawling in Part I and delves into advanced topics in Part II. I suggest that all readers carefully study the first part and turn to the more specific topics in the second part as needed." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Your First Web Scraper\n", "\n", "## Let's try a toy example first" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "b'\\n\\n
\\n \\n \\n \\n \\nPersonal Site
\\n\\t\\t\\t\\t\\t\\t\\t\\t\\tThis page servers as the primary guide for my students who want to learn Python.
\\n\\tPersonal Site
\n", "This page servers as the primary guide for my students who want to learn Python.
\n", "