Description
In this course, you will:
- Discover the parallels and differences between Spark and Hadoop.
- Investigate the problems Spark attempts to solve, which will give you a good sense of why Spark is needed.
- Learn how Spark is faster than Hadoop, and understand the reasons for Spark's performance and efficiency.
- Understand why something like an RDD is needed in the first place, before getting into what an RDD is.
- Gain a solid, in-depth understanding of RDDs, then clear up some common misconceptions about RDDs among new Spark learners.
- Understand the various types of dependencies between RDDs, and why dependencies are important.
- Walk through how the program you write is translated into actual execution behind the scenes on a Spark cluster.
- Gain a thorough understanding of the key concepts underlying Spark's execution engine, and why it is so efficient.
- Learn about fault tolerance by simulating a failure and observing how Spark recovers from it.
- Learn how Spark manages memory and what it keeps in memory.
- Recognize the need for a new programming language such as Scala.
- Examine the differences between object-oriented and functional programming.
- Investigate Scala's features and functions.
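To give a flavor of the Scala and RDD topics listed above, here is a minimal sketch in plain Scala (standard library only, no Spark required) of the functional style the course contrasts with object-oriented programming. The flatMap/groupBy pipeline below has the same shape as a Spark RDD word count; `FunctionalSketch` and `wordCount` are illustrative names, not code from the course.

```scala
// Word count expressed as a chain of transformations on an immutable
// collection -- the same pipeline shape as a Spark RDD program.
object FunctionalSketch {
  def wordCount(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(_.split("\\s+"))   // split each line into words
      .filter(_.nonEmpty)         // drop empty tokens
      .groupBy(identity)          // group identical words together
      .map { case (word, occurrences) => (word, occurrences.size) }

  def main(args: Array[String]): Unit = {
    val counts = wordCount(Seq("spark and hadoop", "spark and scala"))
    println(counts("spark"))      // prints 2
  }
}
```

In Spark, the same idea is written against an RDD instead of a local `Seq`, and the transformations are evaluated lazily across a cluster; the course covers that distinction in the RDD lectures.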
Syllabus:
- Let's get started
- Running Spark on your computer
- Spark vs. Hadoop - who wins?
- Challenges Spark tries to address
- How is Spark faster than Hadoop?
- The need for RDD
- What is an RDD?
- What an RDD is not
- First program in Spark
- What are dependencies and why are they important?
- Program to Execution (Part 1)
- Program to Execution (Part 2)
- Memory management
- Fault tolerance
- Introduction to Scala
- First program in Scala (not Hello World)
- Scala functions