Certification Course

Mastering Big Data with Apache Spark and Java

This course provides a thorough introduction to the Spark Java API. Experienced Java developers will use object-oriented programming (OOP) principles to put Apache Spark and big data theory into practice.

Description

In this course, you will:

Learn the fundamentals of Apache Spark and an overview of its components.
Learn advanced transformations and how to use Spark SQL, Spark's library for working with structured data.
Get hands-on experience with examples, coding, and recipes.
Use Spark to create a big data batch application grounded in both design patterns and good programming practices.

Syllabus:

1. Spark Introduction and Basics

Spark Fundamentals
Components and Architecture
Spark and Big Data
Spark's Java Main Abstraction: The DataFrame

2. Getting Started with Spark

Running the First Spark Program
Spark Maven Based Projects
Enriching the Basic DataFrame Program
Deep Dive: Transformations and Data Storage

3. DataFrame Basic Operations

Working with DataFrame's Schemas
Dataset: a DataFrame of POJOs
Transformations and Actions
Transformations (I): Map and Filter
Actions (I): Count, Take, and Collect
Deep Dive: Internals of Spark Execution
Transformations (II): FlatMap and Distinct
Actions (II): Reduce and Aggregate Functions: Max, Min, and Mean

4. DataFrame Advanced Operations

Data Partitioning and Shuffling
The groupBy and groupByKey methods
Joins
Sort and OrderBy
Union, UnionByName, and DropDuplicates
Accumulators and Broadcast Variables
UDFs: User-defined Functions

5. Spark SQL and Other Functionalities

Spark SQL Goodness
Schema Manipulation
How to Ingest Files
Ingesting Databases
Exporting Information
Serialization: Working through the Wire

6. Building a Big Data Batch Application

The Application Architecture Ecosystem
Driver Program Design and Project Structure
Driver Program and Job Implementation
Ingestion Job
Batch Pipelines and Other Types of Jobs
Testing and Spark

7. Deployment and Cluster Execution

Local and Cluster-based Execution
Deploying and Running a Spark Application

8. Monitoring and Performance Fundamentals

Interpreting Spark Logs
Cluster Monitoring and SparkUI
Performance Fundamentals and Recipes

Course Features

Certificate on completion
Educative
English
Intermediate
Development, Apache Spark

Enrollment options

Unlimited Plan

7-day free access
Unlimited access to 250+ courses
Certificate on completion
Cancel subscription anytime
$16.66/month - Annual Plan (72% saving)
$59/month - Monthly Plan (after free trial)

One-time Purchase

$39/year
Certificate on completion
1-year access