Description
In this course, you will learn:
- How to combine Apache Hadoop and Apache Spark to build scalable, optimized data analytics pipelines.

Instructor Kumaran Ponnambalam explores ways to optimize data modeling and storage on HDFS, discusses scalable data ingestion and extraction with Spark, and offers tips for optimizing data processing in Spark.
Syllabus:
- Introduction
1. Introduction and Setup
- Apache Hadoop overview
- Apache Spark overview
- Integrating Hadoop and Spark
- Setting up the environment
- Using exercise files
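To give a flavor of the setup covered in this chapter, here is a minimal PySpark sketch of starting a Spark session that can talk to HDFS. The application name, master URL, and namenode address are placeholders for illustration, not values from the course, and the course itself may use a different language or cluster configuration.

```python
# Minimal sketch: a Spark session configured to reach HDFS.
# The app name, master, and namenode host below are placeholders.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("hadoop-spark-analytics")   # hypothetical application name
    .master("local[*]")                  # or a YARN / standalone master in a real cluster
    .config("spark.hadoop.fs.defaultFS", "hdfs://namenode:9000")  # placeholder namenode
    .getOrCreate()
)

print(spark.version)  # quick sanity check that the session is up
```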
2. HDFS Data Modeling for Analytics
- Storage formats
- Compression
- Partitioning
- Bucketing
- Best practices for data storage
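To make the data-modeling topics in this chapter concrete, the PySpark sketch below writes a small DataFrame to HDFS as snappy-compressed Parquet, partitioned by one column. The HDFS path, column names, and sample rows are illustrative assumptions, not the course's exercise data.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("hdfs-data-modeling").getOrCreate()

# Illustrative data; the course's exercise files will differ.
df = spark.createDataFrame(
    [("s1", "math", 85), ("s2", "math", 72), ("s1", "physics", 90)],
    ["student_id", "subject", "score"],
)

# Columnar storage format plus compression: snappy-compressed Parquet.
# partitionBy lays the data out as one HDFS directory per subject value,
# so later reads can skip irrelevant partitions.
(df.write
   .mode("overwrite")
   .option("compression", "snappy")
   .partitionBy("subject")
   .parquet("hdfs://namenode:9000/warehouse/scores"))  # placeholder path
```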
3. Data Ingestion with Spark
- Reading external files into Spark
- Writing to HDFS
- Parallel writes with partitioning
- Parallel writes with bucketing
- Best practices for ingestion
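A hedged sketch of the ingestion flow in this chapter: read an external CSV into Spark, then write it to HDFS in parallel, once with partitioning and once with bucketing. The file paths, bucket count, and column names are assumptions made for illustration.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("spark-ingestion").getOrCreate()

# Read an external file into Spark (path and options are illustrative).
raw = (spark.read
       .option("header", "true")
       .option("inferSchema", "true")
       .csv("file:///data/incoming/scores.csv"))

# Parallel write with partitioning: each subject value becomes its own
# HDFS directory, and tasks write their partitions concurrently.
(raw.write
    .mode("overwrite")
    .partitionBy("subject")
    .parquet("hdfs://namenode:9000/raw/scores"))   # placeholder HDFS path

# Parallel write with bucketing: rows are hash-distributed by student_id
# into a fixed number of buckets; bucketing requires saveAsTable.
(raw.write
    .mode("overwrite")
    .bucketBy(8, "student_id")
    .sortBy("student_id")
    .saveAsTable("scores_bucketed"))
```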
4. Data Extraction with Spark
- How Spark works
- Reading HDFS files with schema
- Reading partitioned data
- Reading bucketed data
- Best practices for data extraction
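To illustrate the extraction topics above, the sketch below reads Parquet data back from HDFS with an explicit schema, prunes partitions by filtering on the partition column, and reads a bucketed table. Paths, column names, and the table name are placeholders carried over from the earlier sketches.

```python
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType, IntegerType

spark = SparkSession.builder.appName("spark-extraction").getOrCreate()

# Read HDFS files with an explicit schema instead of paying for inference.
schema = StructType([
    StructField("student_id", StringType()),
    StructField("score", IntegerType()),
    StructField("subject", StringType()),   # partition column
])
scores = spark.read.schema(schema).parquet("hdfs://namenode:9000/raw/scores")

# Reading partitioned data: filtering on the partition column lets Spark
# skip entire directories (partition pruning).
math_scores = scores.filter(scores.subject == "math")

# Reading bucketed data: query the bucketed table; aggregations and joins
# on student_id can then avoid a full shuffle.
bucketed = spark.table("scores_bucketed")
```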
5. Optimizing Spark Processing
- Pushing down projections
- Pushing down filters
- Managing partitions
- Managing shuffling
- Improving joins
- Storing intermediate results
- Best practices for data processing
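A minimal sketch of the optimization techniques listed in this chapter, using hypothetical score and student DataFrames: select only the needed columns (projection pushdown), filter early (filter pushdown), manage partition counts and shuffle parallelism, broadcast the small side of a join, and cache an intermediate result.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import broadcast, col

spark = SparkSession.builder.appName("spark-optimization").getOrCreate()

scores = spark.read.parquet("hdfs://namenode:9000/raw/scores")      # placeholder
students = spark.read.parquet("hdfs://namenode:9000/raw/students")  # placeholder

# Projection and filter pushdown: request only the columns and rows needed,
# so the Parquet reader skips data at the source.
trimmed = (scores
           .select("student_id", "subject", "score")
           .filter(col("score") >= 50))

# Managing partitions and shuffling: reduce partition count before wide
# operations and cap shuffle parallelism for this small example.
spark.conf.set("spark.sql.shuffle.partitions", "8")
trimmed = trimmed.coalesce(8)

# Improving joins: broadcast the small dimension table to avoid shuffling it.
joined = trimmed.join(broadcast(students), "student_id")

# Storing intermediate results: cache a DataFrame reused by several queries.
joined.cache()
joined.count()   # materialize the cache
```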
6. Use Case Project
- Problem definition
- Data loading
- Total score analytics
- Average score analytics
- Top student analytics
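As a rough idea of what the use-case analytics might look like (the actual project data and requirements are defined in the course), here is a sketch that computes total score per student, average score per subject, and the top student per subject from a hypothetical scores DataFrame.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import sum as sum_, avg, row_number, desc
from pyspark.sql.window import Window

spark = SparkSession.builder.appName("use-case-project").getOrCreate()

scores = spark.read.parquet("hdfs://namenode:9000/raw/scores")  # placeholder

# Total score per student.
total_scores = scores.groupBy("student_id").agg(sum_("score").alias("total_score"))

# Average score per subject.
avg_scores = scores.groupBy("subject").agg(avg("score").alias("avg_score"))

# Top student per subject, ranked by score.
w = Window.partitionBy("subject").orderBy(desc("score"))
top_students = (scores
                .withColumn("rank", row_number().over(w))
                .filter("rank = 1"))

total_scores.show()
avg_scores.show()
top_students.show()
```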