Description
In this course, you will:
- Discover the fundamentals of web scraping.
- Scrape websites using Scrapy.
- Learn about XPath and CSS selectors.
- Build a complete spider from A to Z.
- Save the extracted data in MongoDB and SQLite3.
- Scrape JavaScript websites using Splash and Selenium.
- Create a CrawlSpider.
- Understand crawling behaviour.
- Create your own middleware.
- Follow best practices for web scraping.
- Avoid getting banned when scraping websites.
- Bypass Cloudflare.
- Scrape APIs.
- Scrape websites with infinite scrolling.
- Work with cookies.
- Deploy spiders locally and in the cloud.
- Run spiders at regular intervals.
- Prevent the storage of duplicated data.
- Create datasets.
- Log in to websites using Scrapy.
- Download images and files using Scrapy.
Syllabus:
- Scrapy Fundamentals
- XPath expressions & CSS Selectors
- Project 1: Spiders from A to Z
- Building Datasets
- Project 2: Dealing with Multiple Pages
- Debugging Spiders
- Let's take a break!
- Project 3: Build Crawlers using Scrapy
- Splash Crash Course
- Project 4: Scraping JavaScript Websites using Splash
- Project 5: Scraping JavaScript Websites using Selenium
- Working with Pipelines
- Scraping APIs (NEW)
- Log in to Websites (NEW)
- Project 6: Bypass Cloudflare