Description
We'll cover Spark's programming model in depth, paying close attention to how and when it differs from familiar programming models, such as shared-memory parallel collections or sequential Scala collections. Through hands-on examples in Spark and Scala, we'll learn when distribution concerns such as latency and network communication matter, and how to address them effectively for improved performance.
Learning Outcomes
By the end of this course, you will be able to:
- read data from persistent storage and load it into Apache Spark (see the short sketch after this list),
- manipulate data with Spark and Scala,
- express algorithms for data analysis in a functional style,
- recognize how to avoid shuffles and recomputation in Spark.
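As a taste of the first two outcomes, here is a minimal sketch of loading text from persistent storage and manipulating it in a functional style. It assumes a spark-shell session (where `sc`, the SparkContext, is predefined) and a hypothetical input file `input.txt`; the course covers these APIs in depth.

```scala
// Load a text file from persistent storage into an RDD.
// "input.txt" is a placeholder path; any local or HDFS path works.
val lines = sc.textFile("input.txt")

// Manipulate the data in a functional style: the classic word count.
val counts = lines
  .flatMap(_.split("\\s+"))      // split each line into words
  .map(word => (word, 1))        // pair each word with a count of 1
  .reduceByKey(_ + _)            // sum the counts per word

counts.take(10).foreach(println) // an action: triggers the computation
```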
Syllabus
1. Getting Started + Spark Basics
- Introduction, Logistics, What You'll Learn
- Data-Parallel to Distributed Data-Parallel
- Latency
- RDDs, Spark's Distributed Collection
- RDDs: Transformations and Actions
- Evaluation in Spark: Unlike Scala Collections! (sketched below)
- Cluster Topology Matters!
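A minimal sketch of the transformation/action distinction from this module, again assuming a spark-shell session with `sc` predefined: unlike Scala collections, transformations on an RDD are lazy, and only an action triggers evaluation.

```scala
val nums = sc.parallelize(1 to 1000000)

// Transformations are lazy: no distributed computation happens here.
val evens = nums.filter(_ % 2 == 0)

// Actions force evaluation of the whole lineage on the cluster.
val howMany = evens.count()   // the job runs now
```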
2. Reduction Operations & Distributed Key-Value Pairs
- Reduction Operations
- Pair RDDs
- Transformations and Actions on Pair RDDs
- Joins (illustrated in the snippet below)
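A short sketch of pair-RDD operations and joins, using made-up order/customer data and assuming a spark-shell session:

```scala
// Pair RDDs are RDDs of key-value tuples; keys unlock extra operations.
val orders    = sc.parallelize(Seq((1, "laptop"), (2, "phone"), (1, "mouse")))
val customers = sc.parallelize(Seq((1, "Alice"), (2, "Bob")))

// A reduction operation on a pair RDD: count orders per customer id.
val orderCounts = orders.mapValues(_ => 1).reduceByKey(_ + _)

// An inner join matches pairs with equal keys across the two RDDs.
val joined = orders.join(customers)  // RDD[(Int, (String, String))]
joined.collect().foreach(println)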
3. Partitioning and Shuffling
- Shuffling: What It Is and Why It's Important
- Partitioning
- Optimizing with Partitioners (see the example below)
- Wide vs Narrow Dependencies
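One way partitioners help avoid shuffles, sketched with hypothetical data in a spark-shell session: pre-partitioning a pair RDD by key and persisting it lets subsequent key-based operations run as narrow dependencies.

```scala
import org.apache.spark.HashPartitioner

val events = sc.parallelize(Seq(("a", 1), ("b", 2), ("a", 3), ("b", 4)))

// Hash-partition by key and persist, so the partitioning is reused.
val partitioned = events.partitionBy(new HashPartitioner(8)).persist()

// Because Spark knows the partitioner, this reduceByKey needs no
// shuffle: all values for a key already live in the same partition.
val sums = partitioned.reduceByKey(_ + _)
sums.collect().foreach(println)
```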
4. Structured Data: SQL, DataFrames, and Datasets
- Structured vs Unstructured Data
- Spark SQL
- DataFrames
- Datasets (a short example follows)
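A final sketch contrasting the structured APIs, assuming a spark-shell session where `spark` (a SparkSession) is predefined and using a made-up Person type:

```scala
import spark.implicits._

case class Person(name: String, age: Int)

// A Dataset is a typed, structured collection; a DataFrame is Dataset[Row].
val people = Seq(Person("Alice", 29), Person("Bob", 41)).toDS()

// The same query in the Dataset/DataFrame API...
people.filter($"age" > 30).show()

// ...and in Spark SQL against a temporary view.
people.createOrReplaceTempView("people")
spark.sql("SELECT name FROM people WHERE age > 30").show()
```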