Description
In this course, you will:
- Understand the distinction between a human web surfer and a web scraper, and use the Chrome developer tools to examine network calls.
- Use pip to install Scrapy and write some "Hello, World" code to scrape a simple web page.
- Use the Scrapy LinkExtractor to find internal links on a web page, then configure Scrapy and the ItemPipeline to write data to various file formats.
- Learn how APIs work and how they can be used to retrieve data directly.
- Explore headers and cookies before moving on to browser automation and integrating Selenium with Scrapy.
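
As a rough illustration of what finding internal links on a web page involves, here is a minimal sketch that uses only the Python standard library. It is not Scrapy's LinkExtractor and not taken from the course materials; the class name `InternalLinkParser`, the helper `find_internal_links`, the sample HTML, and the `example.com` URLs are all assumptions made for this example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class InternalLinkParser(HTMLParser):
    """Hypothetical helper: collects links that stay on the same host as base_url."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.base_host = urlparse(base_url).netloc
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Only anchor tags carry the links we care about here.
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        # Resolve relative links against the page URL.
        absolute = urljoin(self.base_url, href)
        parsed = urlparse(absolute)
        is_web_link = parsed.scheme in ("http", "https")
        is_internal = parsed.netloc == self.base_host
        # Keep each internal link once, in order of first appearance.
        if is_web_link and is_internal and absolute not in self.links:
            self.links.append(absolute)


def find_internal_links(html, base_url):
    """Return the internal links found in html, in document order."""
    parser = InternalLinkParser(base_url)
    parser.feed(html)
    return parser.links


# Made-up sample page: two internal links, one external, one mailto, one duplicate.
SAMPLE_HTML = """
<a href="/about">About</a>
<a href="https://example.com/blog/">Blog</a>
<a href="https://other.example.org/">Elsewhere</a>
<a href="mailto:hi@example.com">Mail</a>
<a href="/about">About again</a>
"""

if __name__ == "__main__":
    for link in find_internal_links(SAMPLE_HTML, "https://example.com/"):
        print(link)
```

A crawler would typically feed each discovered internal link back into its request queue; Scrapy's own link extraction handles many more cases (such as filtering by URL pattern) than this sketch does.
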
Syllabus:
1. Basic Web Scraping
- What is web scraping?
- How the internet works: A brief summary
- Hello world with Scrapy
2. Learning to Crawl
- Crawling a website
- Recording data
- Scrapy settings file
- Structuring your scrapers for extensibility/reusability
3. Advanced Techniques
- Submitting a form
- Finding and using hidden APIs
- Sitemaps and robots.txt
4. Acting Human
- Logging in
- Browser automation with Selenium
- Interacting with a page