Certification Course

Apache Spark with Scala - Hands On with Big Data!

Apache Spark tutorial with 20+ hands-on examples of analysing large data sets with Scala on your desktop or Hadoop!

81.6K total enrollments

Description

In this course, you will learn to:

Develop distributed code using the Scala programming language.
Transform structured data using SparkSQL, DataSets, and DataFrames.
Frame big data analysis problems as Apache Spark scripts.
Optimise Spark jobs through partitioning, caching, and other techniques.
Build, deploy, and run Spark scripts on Hadoop clusters.
Process continuous streams of data with Spark Streaming.
Traverse and analyse graph structures using GraphX.
Analyse large data sets with Machine Learning on Spark.

Syllabus:

1. Scala Crash Course

[Activity] Scala Basics
[Exercise] Flow Control in Scala
[Exercise] Functions in Scala
[Exercise] Data Structures in Scala

2. Using Resilient Distributed Datasets (RDDs)

The Resilient Distributed Dataset
Ratings Histogram Example
Spark Internals
Key / Value RDDs, and the Average Friends by Age Example
[Activity] Running the Average Friends by Age Example
Filtering RDDs, and the Minimum Temperature by Location Example
[Activity] Running the Minimum Temperature Example, and Modifying it for Maximum
[Activity] Counting Word Occurrences using flatMap()
[Activity] Improving the Word Count Script with Regular Expressions
[Activity] Sorting the Word Count Results
[Exercise] Find the Total Amount Spent by Customer
[Exercise] Check your Results, and Sort Them by Total Amount Spent
Check Your Results and Implementation Against Mine

3. SparkSQL, DataFrames, and DataSets

[Activity] Using SparkSQL
[Activity] Using DataSets
[Exercise] Implement the "Friends by Age" example using DataSets
Exercise Solution: Friends by Age, with Datasets.
[Activity] Word Count example, using Datasets
[Activity] Revisiting the Minimum Temperature example, with Datasets
[Exercise] Implement the "Total Spent by Customer" problem with Datasets

4. Advanced Examples of Spark Programs

[Activity] Find the Most Popular Movie
[Activity] Use Broadcast Variables to Display Movie Names
[Activity] Find the Most Popular Superhero in a Social Graph
[Exercise] Find the Most Obscure Superheroes
Exercise Solution: Find the Most Obscure Superheroes
Superhero Degrees of Separation: Introducing Breadth-First Search
Superhero Degrees of Separation: Accumulators, and Implementing BFS in Spark
[Activity] Superhero Degrees of Separation: Review the code, and run it!
Item-Based Collaborative Filtering in Spark, cache(), and persist()
[Activity] Running the Similar Movies Script using Spark's Cluster Manager
[Exercise] Improve the Quality of Similar Movies

5. Running Spark on a Cluster

[Activity] Using spark-submit to run Spark driver scripts
[Activity] Packaging driver scripts with SBT
[Exercise] Package a Script with SBT and Run it Locally with spark-submit
Exercise solution: Using SBT and spark-submit
Introducing Amazon Elastic MapReduce
Creating Similar Movies from One Million Ratings on EMR
Partitioning
Best Practices for Running on a Cluster
Troubleshooting, and Managing Dependencies

6. Machine Learning with Spark ML

Introducing MLlib
[Activity] Using MLlib to Produce Movie Recommendations
Linear Regression with MLlib
[Activity] Running a Linear Regression with Spark
[Exercise] Predict Real Estate Values with Decision Trees in Spark

7. Intro to Spark Streaming

The DStream API for Spark Streaming
[Activity] Real-time Monitoring of the Most Popular Hashtags on Twitter
Structured Streaming
[Activity] Using Structured Streaming for real-time log analysis
[Exercise] Windowed Operations with Structured Streaming
Exercise Solution: Top URLs in a 30-second Window

8. Intro to GraphX

GraphX, Pregel, and Breadth-First Search with Pregel
Using the Pregel API with Spark GraphX
[Activity] Superhero Degrees of Separation using GraphX

Course Features

Certificate on completion
Udemy
English
Beginner
Development, Apache Spark

Enrollment options

Course Material
Certificate on completion
30-day refund
Lifetime Access
Instructor direct message
Instructor Q&A