Description
In this course, you will :
- Understand the basics of SELECT statements.
- Understand how and why to filter results.
- Explore grouping and aggregation to answer analytic questions.
- Work with sorting and limiting results.
Syllabus :
1. Orientation to SQL on Big Data
- Review and Preparation
- Using the Hue Query Editors
- Running SQL Utility Statements
- Running SQL SELECT Statements
- Understanding Different SQL Interfaces
- Overview of Beeline and Impala Shell
- Using Beeline
- Using Impala Shell
2. SQL SELECT Essentials
- SQL SELECT Building Blocks
- Introduction to the SELECT List
- Expressions and Operators
- Data Types
- Column Aliases
- Built-In Functions
- Data Type Conversion
- The DISTINCT Keyword
- Introduction to the FROM Clause
- Identifiers
- Formatting SELECT Statements
- Using Beeline in Non-Interactive Mode
- Using Impala Shell in Non-Interactive Mode
- Formatting the Output of Beeline and Impala Shell
- Saving Hive and Impala Query Results to a File
3. Filtering Data
- About the Datasets
- Introduction to the WHERE Clause
- Using Expressions in the WHERE Clause
- Comparison Operators
- Data Types and Precision
- Logical Operators
- Other Relational Operators
- Understanding Missing Values
- Handling Missing Values
- Conditional Functions
- Using Variables with Beeline and Impala Shell
- Calling Beeline and Impala Shell from Scripts
- Querying Hive and Impala in Scripts and Applications
4. Grouping and Aggregating Data
- Introduction to Aggregation
- Common Aggregate Functions
- Using Aggregate Functions in the SELECT Statement
- Introduction to the GROUP BY Clause
- Choosing an Aggregate Function and Grouping Column
- Grouping Expressions
- Grouping and Aggregation, Together and Separately
- NULL Values in Grouping and Aggregation
- The COUNT Function
- Tips for Applying Grouping and Aggregation
- Filtering on Aggregates
- The HAVING Clause
- Understanding Hive and Impala Version Differences
- Understanding Hue Version Differences
5. Sorting and Limiting Data
- Introduction to the ORDER BY Clause
- Controlling Sort Order
- Ordering Expressions
- Missing Values in Ordered Results
- Using ORDER BY with Hive and Impala
- Introduction to the LIMIT Clause
- When to Use the LIMIT Clause
- Using LIMIT with ORDER BY
- Using LIMIT for Pagination
- Review
- How to Effectively Use the Hive and Impala Documentation
- Tips for Using the Hive Documentation
- Tips for Using the Impala Documentation
6. Combining Data
- Combining Query Results with the UNION Operator
- Using ORDER BY and LIMIT with UNION
- Introduction to Joins
- Join Syntax
- Inner Joins
- Outer Joins
- Conclusion
- Handling NULL Values in Join Key Columns
- Non-Equijoins
- Cross Joins
- Left Semi-Joins