Description
In this course, you will learn:
- how to build big data pipelines around Apache Spark.
- how to combine multiple big data technologies.
- how to make Apache Spark work with other big data technologies.
- how to integrate Apache Kafka with Spark for real-time streaming.
- how to use the various technologies to construct an end-to-end project that solves a real-world business problem.
Syllabus:
- Welcome
- What you should know
- Exercise files
- Set up the environment
1. Data Engineering Overview
- What is data engineering?
- Stages of data engineering
- Data engineering challenges with big data
- Spark and Kafka for data engineering
- Chapter Quiz
2. Moving Data with Kafka
- Use Kafka connectors
- Code: Read from a file source
- Code: Write to an HDFS sink
- Code: Read from a JDBC source
- Code: Write to a Spark sink
- Chapter Quiz
3. Spark High-Performance Processing
- Data engineering with Spark
- How Spark works
- Optimize for lazy evaluation
- Work with dependencies
- Complex accumulators
- Chapter Quiz
4. Use Case Project
- Problem statement
- Solution overview
- Process US sales data
- Process EU sales data
- Process web hits data
- Process tweet data
- Scale the solution
- Chapter Quiz