Certification Course

Taming Big Data with Apache Spark and Python - Hands On!

Apache Spark tutorial with 20+ hands-on examples of analyzing large data sets on your desktop or on Hadoop with Python!

62.6K total enrollments
4.5 rating (11K total ratings)

Description

In this course, you will:

Use DataFrames and Structured Streaming in Spark 3
Frame big data analysis problems as Spark problems
Use Amazon's Elastic MapReduce service to run your job on a cluster with Hadoop YARN
Install and run Apache Spark on a desktop computer or on a cluster
Use Spark's Resilient Distributed Datasets to process and analyze large data sets across many CPUs
Implement iterative algorithms such as breadth-first search using Spark
Use the MLlib machine learning library to answer common data mining questions
Understand how Spark SQL lets you work with structured data
Understand how Spark Streaming lets you process continuous streams of data in real time
Tune and troubleshoot large jobs running on a cluster
Share information between nodes on a Spark cluster using broadcast variables and accumulators
Understand how the GraphX library helps with network analysis problems
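The breadth-first search objective can be illustrated without a cluster. The sketch below is plain Python, not the course's Spark implementation (the syllabus pairs the Spark version with accumulators). It computes degrees of separation between two nodes in a small, hypothetical co-appearance graph; the function name and the graph are made up for illustration.

```python
from collections import deque


def degrees_of_separation(graph, start, target):
    """Return the number of hops from start to target, or None if unreachable.

    graph maps each node to a list of its neighbours (hypothetical toy data).
    """
    if start == target:
        return 0
    distance = {start: 0}      # hops from start for every node discovered so far
    queue = deque([start])     # FIFO frontier: nodes are expanded level by level
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, []):
            if neighbour not in distance:
                distance[neighbour] = distance[node] + 1
                if neighbour == target:
                    return distance[neighbour]
                queue.append(neighbour)
    return None


# Hypothetical graph: an edge means two heroes appeared together.
toy_graph = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A"],
    "D": ["B", "E"],
    "E": ["D"],
    "F": [],
}

print(degrees_of_separation(toy_graph, "A", "E"))  # 3
print(degrees_of_separation(toy_graph, "A", "F"))  # None
```

Each pass of the loop expands one node of the current frontier; a distributed version instead processes a whole frontier per iteration over the data set, which is why BFS is listed among the iterative algorithms.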

Syllabus:

1. Spark Basics and the RDD Interface

What's new in Spark 3?
Introduction to Spark
The Resilient Distributed Dataset (RDD)
Ratings Histogram Walkthrough
Key/Value RDDs, and the Average Friends by Age Example
Filtering RDDs, and the Minimum Temperature by Location Example
Check Your Sorted Implementation and Results Against Mine

2. SparkSQL, DataFrames, and DataSets

Introducing SparkSQL
Executing SQL commands and SQL-style functions on a DataFrame
Using DataFrames instead of RDDs
Exercise Solution: Friends by Age, with DataFrames
Exercise Solution: Total Spent by Customer, with DataFrames

3. Advanced Examples of Spark Programs

Find the Most Popular Superhero in a Social Graph
Exercise Solution: Most Obscure Superheroes
Superhero Degrees of Separation: Introducing Breadth-First Search
Superhero Degrees of Separation: Accumulators, and Implementing BFS in Spark
Item-Based Collaborative Filtering in Spark, cache(), and persist()

4. Running Spark on a Cluster

Introducing Elastic MapReduce
Partitioning
Create Similar Movies from One Million Ratings
Troubleshooting Spark on a Cluster
More Troubleshooting, and Managing Dependencies

5. Machine Learning with Spark ML

Introducing MLlib
Analyzing the ALS Recommendations Results

6. Spark Streaming, Structured Streaming, and GraphX

Spark Streaming
Exercise Solution: Using Structured Streaming with Windows
GraphX
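The key/value pattern behind the "Average Friends by Age" lecture can be sketched in plain Python. This is a stand-in that mirrors the RDD steps, not the course's code; the records and function names are hypothetical. In PySpark the same chain would be written with `mapValues` and `reduceByKey` on a key/value RDD, as noted in the comments.

```python
def combine(a, b):
    """Merge two (sum, count) tuples; the function one would pass to reduceByKey."""
    return (a[0] + b[0], a[1] + b[1])


def average_by_key(pairs):
    """Average the values for each key of (key, value) pairs."""
    # Like rdd.mapValues(lambda x: (x, 1)): tag every value with a count of 1.
    tagged = [(key, (value, 1)) for key, value in pairs]

    # Like .reduceByKey(combine): merge the (sum, count) tuples sharing a key.
    totals = {}
    for key, pair in tagged:
        totals[key] = combine(totals[key], pair) if key in totals else pair

    # Like .mapValues(lambda x: x[0] / x[1]): turn each (sum, count) into a mean.
    return {key: total / count for key, (total, count) in totals.items()}


# Hypothetical (age, number_of_friends) records.
records = [(33, 385), (33, 2), (55, 221), (40, 465), (55, 81)]
print(average_by_key(records))  # {33: 193.5, 55: 151.0, 40: 465.0}
```

Carrying a (sum, count) tuple instead of averaging directly keeps the combine step associative and commutative, which is what allows partial results from different partitions to be merged in any order.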

Course Features

30-day return
Certificate on completion
Platform: Udemy
Language: English
Level: Beginner
Pace: Self-paced
Category: Development, Apache Spark

Enrollment options

Course Material
Lifetime Access
Instructor direct message
Instructor Q&A