Description
In this course, you will:
- Understand what Big Data is, the challenges it poses, and how Hadoop proposes a solution to the Big Data problem
- Work with and navigate a Hadoop cluster with ease
- Install and configure a Hadoop cluster on cloud services like Amazon Web Services (AWS)
- Understand the different phases of MapReduce in detail
- Write optimized Pig Latin instructions to perform complex data analysis
- Write optimized Hive queries to perform data analysis on simple and nested datasets
- Work with file formats such as SequenceFile and Avro
- Understand Hadoop architecture, Single Points of Failure (SPOF), Secondary/Checkpoint/Backup nodes, HA configuration and YARN
- Tune and optimize slow-running MapReduce jobs, Pig instructions and Hive queries
- Understand how joins work behind the scenes and write optimized join statements
- Encounter, wherever possible, difficult questions that are asked in real Hadoop interviews