Certification Course

Apache Spark 3 - Spark Programming in Python for Beginners

Data Engineering using Spark Structured API

31.9K total enrollments

4.6 rating (5.6K total ratings)

Description

In this course, you will learn:

Spark Architecture and the Apache Spark Foundation.
Spark Data Engineering and Processing.
Using Data Sources and Sinks.
Using Data Frames and Spark SQL.
Using the PyCharm IDE for Spark development and debugging.
Unit testing, managing application logs, and cluster deployment.

Syllabus:

1. Understanding Big Data and Data Lake

What is Big Data and How it Started
Hadoop Architecture, History, and Evolution
Introducing Apache Spark and Databricks Cloud

2. Installing and Using Apache Spark

Spark Development Environments
Setup your Databricks Community Cloud Environment
Introduction to Databricks Workspace
Create your First Spark Application in Databricks Cloud
Setup your Local Development IDE
Mac Users - Setup your Local Development IDE
Create your First Spark Application using IDE

3. Spark Execution Model and Architecture

Execution Methods - How to Run Spark Programs?
Spark Distributed Processing Model - How your program runs?
Spark Execution Modes and Cluster Managers
Summarizing Spark Execution Models - When to use What?
Working with PySpark Shell - Demo
Installing Multi-Node Spark Cluster - Demo
Working with Notebooks in Cluster - Demo
Working with Spark Submit - Demo

4. Spark Programming Model and Developer Experience

Creating Spark Project Build Configuration
Configuring Spark Project Application Logs
Check your knowledge
Creating Spark Session
Check your knowledge
Configuring Spark Session
Data Frame Introduction
Data Frame Partitions and Executors
Spark Transformations and Actions
Spark Jobs Stages and Task
Understanding your Execution Plan
Unit Testing Spark Application
Rounding off Summary

5. Spark Structured API Foundation

Introduction to Spark APIs
Introduction to Spark RDD API
Working with Spark SQL
Spark SQL Engine and Catalyst Optimizer

6. Spark Data Sources and Sinks

Spark Data Sources and Sinks
Spark DataFrameReader API
Reading CSV, JSON and Parquet files
Creating Spark DataFrame Schema
Spark DataFrameWriter API
Writing Your Data and Managing Layout
Spark Databases and Tables
Working with Spark SQL Tables

7. Spark Dataframe and Dataset Transformations

Introduction to Data Transformation
Working with Dataframe Rows
DataFrame Rows and Unit Testing
Dataframe Rows and Unstructured data
Working with Dataframe Columns
Creating and Using UDF
Misc Transformations

8. Aggregations in Apache Spark

Aggregating Dataframes
Grouping Aggregations
Windowing Aggregations

9. Spark Dataframe Joins

Dataframe Joins and column name ambiguity
Outer Joins in Dataframe
Internals of Spark Join and shuffle
Optimizing your joins
Implementing Bucket Joins

10. Archived - Apache Spark Introduction

Big Data History and Primer
Understanding the Data Lake Landscape
What is Apache Spark - An Introduction and Overview

11. Archived - Installing and Using Apache Spark

Spark Development Environments
Mac Users - Apache Spark in Local Mode Command Line REPL
Windows Users - Apache Spark in Local Mode Command Line REPL
Mac Users - Apache Spark in the IDE - PyCharm
Windows Users - Apache Spark in the IDE - PyCharm
Apache Spark in Cloud - Databricks Community and Notebooks

Course Features

Certificate on completion
Udemy
English
Beginner
Development, Apache Spark

Enrollment options

Course Material
Certificate on completion
30-day refund
Lifetime Access
Instructor direct message
Instructor Q&A