Description
In this course you will learn:
- how to use Apache Spark, a popular data processing engine, to deliver effective and comprehensive insights into your data.
- how to analyze data in Spark using PySpark and Spark SQL, and how to run machine learning algorithms using MLlib.
- how to create a streaming analytics application using Spark Streaming, and more.
Syllabus:
- Welcome
- What you should know before watching this course
- Using the exercise files
1. Introducing Apache Spark
- Understanding Spark
- Origins of Spark
- Overview of Spark components
- Where Spark shines
- Overview of Databricks
- Introduction to notebooks and PySpark
- Chapter Quiz
2. Analyzing Data in Spark
- Understanding data interfaces
- Working with text files
- Loading CSV data into DataFrames
- Exploring data in DataFrames
- Saving your results
- Chapter Quiz
3. Using Spark SQL to Analyze Data
- Creating tables
- Querying data with Spark SQL
- Visualizing data in Databricks notebooks
- Chapter Quiz
4. Running Machine Learning Algorithms Using MLlib
- Introduction to machine learning with Spark
- Preparing data for machine learning
- Building a linear regression model
- Evaluating a linear regression model
- Visualizing a linear regression model
- Chapter Quiz
5. Real-Time Data Analysis with Spark Streaming
- Introduction to streaming analytics
- Streaming context setup
- Querying streaming data
- Chapter Quiz
6. Connecting BI Tools to Spark
- Setting up Spark locally
- Connecting Jupyter notebooks to Spark
- Other connection options