In this course, you will:
- Learn how to use the PySpark package for distributed data management and machine learning in Spark.
- Discover how Spark manages data and how to read and write tables from Python.
- Learn about the pyspark.sql module, which lets you run optimised data queries in your Spark session.
- Use PySpark's machine learning routines and its utilities for building complete machine learning pipelines.
- Apply what you've learned to build a model that predicts which flights will be delayed.

The course is organized into four parts:

- Getting to know PySpark
- Manipulating data
- Getting started with machine learning pipelines
- Model tuning and selection