Description
In this course, you will:
- Be able to scrape jobs from a Craigslist page.
- Discover how to use Request.
- Discover how to use NightmareJS.
- Discover how to use Puppeteer.
- Learn how to scrape elements that do not have identifiable classes or ids.
- Discover how to save scraped data as a CSV file.
- Discover how to save scraped data to MongoDB.
- Discover how to scrape Facebook with only Request!
- Learn how to reverse engineer websites and discover hidden APIs!
- Learn about the various scraping technologies and when to use them.
- Discover how to scrape websites with authentication.
- Discover how to scrape HTML tables with Request/Cheerio.
Syllabus:
- What you should ALWAYS check before even writing a web scraper!
- Intro to CSS selectors and tools we use for scraping
- Scraping HTML tables with Request/Cheerio
- Scraping software jobs on Craigslist using Puppeteer
- Web Scraping Craigslist Jobs using Node.js Request
- What to do if you're blocked?
- Building a web scraper the TDD way
- Exporting web scraping results to CSV
- Handling Network Problems
- Robots.txt parsing
- Scraping Sites with Pagination
- Scraping Sites with Authentication
- Scraping a website with Cookie/Session authentication and CSRF tokens
- Scraping Nordstrom.com - how to find a secret API and avoid building a scraper!
- Scraping IMDb using NightmareJS
- Scraping Airbnb using Puppeteer
- Architecture for a web scraper with an API
- Saving scraped data to MongoDB
- Deploying a periodic scraper to production
- Deploying a Puppeteer web scraper to Heroku
- Scraping an infinite scrolling page (Facebook, Instagram, Pinterest, etc.)
- SECRET BACKDOOR to Scraping Facebook without JavaScript enabled!