Description
In this course, you will:
- Create big data streaming pipelines with Spark using Python
- Run analytics on live tweet data from Twitter
- Integrate Spark Streaming with tools like Apache Kafka, used by Fortune 500 companies
- Work with the new features of Spark 2.3
Syllabus:
1. PySpark Basics
- What are Discretized Streams?
- How to Create Discretized Streams
- Transformations on DStreams
- Transformation Operation
- Window Operations
- Window
- countByWindow
- reduceByKeyAndWindow
- countByValueAndWindow
- Output Operations on DStreams
- foreachRDD
- SQL Operations
- Reviewing the Basics
2. Advanced Spark Concepts
- Join Operations
- Stateful Transformations
- Checkpointing
- Accumulators
- Fault Tolerance
3. PySpark Streaming at Scale
- Performance Tuning
- PySpark Streaming with Apache Kafka
- Integration with Kafka Text Lecture
- PySpark Streaming with Amazon Kinesis
- Integration with Kinesis Text Lecture
4. Structured Streaming
- Introduction to Structured Streaming
- Operations on Streaming DataFrames and Datasets
- Window Operations
- Handling Late Data and Watermarking