Description
In this course, you'll learn:
- Spark from the ground up, starting with its history and then building a Wikipedia analysis application to explore a wide swath of its core API. That core knowledge makes it easier to pick up Spark's other libraries, such as the streaming and SQL APIs.
- Finally, how to avoid a few of Spark's commonly encountered rough edges. You will leave this course with a tool belt for building your own high-performance Spark applications.
Syllabus:
1. Spark Core: Part 1
- Spark Appification
- What Is an RDD?
- Loading Data
- Lambdas
- Transforming Data
- More Transformations
- Actions and the Associative Property
- Acting on Data
- Persistence
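A minimal Scala sketch of the ideas this module names (loading data, lambda-driven transformations, an associative action, and persisting results). The object name, input path, and output path are illustrative placeholders, and local[*] stands in for a real cluster master:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object WikipediaCounts {
  def main(args: Array[String]): Unit = {
    // local[*] is a stand-in for a real cluster master
    val conf = new SparkConf().setAppName("WikipediaCounts").setMaster("local[*]")
    val sc = new SparkContext(conf)

    // Loading data: the dump path is a placeholder
    val lines = sc.textFile("wikipedia-dump.txt")

    // Transformations are lazy; these lambdas run only when an action fires
    val counts = lines
      .flatMap(line => line.split("\\s+"))
      .map(word => (word, 1))
      .reduceByKey(_ + _) // the reducing lambda must be associative

    // Acting on data and persisting it: saveAsTextFile triggers the whole job
    counts.saveAsTextFile("word-counts")

    sc.stop()
  }
}
```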
2. Spark Core: Part 2
- Implicit Conversions
- Key Value Methods
- Caching Data
- Accumulating Data
- Java in Spark
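A sketch of this module's pair-RDD and shared-variable topics, assuming a Spark 2.x SparkContext (sc.longAccumulator; Spark 1.x used sc.accumulator instead). The sample data and accumulator name are made up:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object PairsAndAccumulators {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("PairsAndAccumulators").setMaster("local[*]"))

    val pairs = sc.parallelize(Seq(("a", 1), ("b", -2), ("a", 3)))

    // Implicit conversions: RDDs of tuples silently gain the key-value
    // methods of PairRDDFunctions, such as groupByKey
    val grouped = pairs.groupByKey()

    // Caching: keep a reused RDD in memory rather than recomputing it
    grouped.cache()

    // Accumulating data: a shared counter that tasks write and the driver reads
    val negatives = sc.longAccumulator("negatives")
    pairs.foreach { case (_, v) => if (v < 0) negatives.add(1) }

    println(s"keys: ${grouped.count()}, negative values: ${negatives.value}")
    sc.stop()
  }
}
```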
3. Distribution and Instrumentation
- Spark Submit
- Cluster Management
- Standalone Cluster Scripts
- AWS Setup
- Spark on YARN in EMR
- Spark UI
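A typical spark-submit invocation tying this module together, reusing the class and jar from the earlier sketch. The master URL and jar name are placeholders for your own standalone cluster; on EMR you would pass --master yarn instead:

```
spark-submit \
  --class WikipediaCounts \
  --master spark://master-host:7077 \
  --deploy-mode cluster \
  --executor-memory 2G \
  wikipedia-counts.jar
```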
4. Spark Libraries
- Spark SQL
- Spark SQL Demo - The SQL Side
- Streaming
- Machine Learning
- GraphX
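A minimal Spark SQL sketch in the spirit of "the SQL side" of the demo, assuming Spark 2.x's SparkSession (the older SQLContext works similarly); people.json is a hypothetical input file:

```scala
import org.apache.spark.sql.SparkSession

object SqlSide {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("SqlSide")
      .master("local[*]")
      .getOrCreate()

    // Schema is inferred from the (hypothetical) JSON input
    val people = spark.read.json("people.json")

    // The SQL side: expose the DataFrame as a view, then query it in plain SQL
    people.createOrReplaceTempView("people")
    spark.sql("SELECT name, age FROM people WHERE age > 21").show()

    spark.stop()
  }
}
```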
5. Optimizations and the Future
- Closures
- Broadcasting
- Optimizing Partitioning
- Spark's Future
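A sketch combining the closure, broadcast, and partitioning topics above; the lookup map, data, and partition count are illustrative:

```scala
import org.apache.spark.{HashPartitioner, SparkConf, SparkContext}

object BroadcastAndPartitions {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("BroadcastAndPartitions").setMaster("local[*]"))

    // Broadcasting: ship a read-only lookup table to each executor once,
    // instead of dragging it into every task's serialized closure
    val countryNames = sc.broadcast(Map("US" -> "United States", "DE" -> "Germany"))

    // Optimizing partitioning: hash-partition by key up front so the
    // reduceByKey below reuses the partitioner and avoids an extra shuffle
    val visits = sc.parallelize(Seq(("US", 1), ("DE", 2), ("US", 3)))
      .partitionBy(new HashPartitioner(4))

    visits
      .reduceByKey(_ + _)
      .map { case (code, n) => (countryNames.value.getOrElse(code, code), n) }
      .collect()
      .foreach(println)

    sc.stop()
  }
}
```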