Description
In this course, Ben Sullins shows you how to use Hive to process data:
- He begins by demonstrating how to structure and optimise your data.
- He explains how to configure Hue, the Hadoop user interface, to use HiveQL for data analysis.
- He then shows how to load data, create aggregate tables for quick query access, and run advanced analytics using the newly configured Hue setup.
- He also walks you through managing tables and putting functions to use.
This course is designed to help you discover new ways to work with datasets so that you can answer tough data science questions.
Syllabus:
1. Hive Concepts and Setup
- Why use Hive
- How Hive works
- Setting up our demo environment
2. Working with Data in Hive
- Understanding table structures in Hive
- Creating tables in Hive
- Handling CSV files in Hive
- Partitioning tables
3. Retrieving Data from Hive
- Simple SELECT statements
- Retrieving data from complex structures
4. Aggregating Data
- Simple aggregations
- Enhanced aggregations with grouping sets
- Using CUBE and ROLLUP
5. Filtering Results
- Simple filter with the WHERE clause
- Filtering aggregates with the HAVING clause
- Finding similar values with LIKE
6. Joining Tables
- Combining tables with JOIN
- When to use SEMI JOIN
- Joining multiple tables together
7. Manipulating Data
- Types of data manipulation functions
- String functions
- Math functions
- Date functions
- Conditional functions