Description
In this course, you will learn:
- How to set up a Development Environment on GCP for building Data Engineering Applications.
- Database essentials for Data Engineering with Postgres: creating tables and indexes, running SQL queries, using important predefined functions, and more.
- Python programming essentials for Data Engineering, such as basic programming constructs, collections, Pandas, and database programming.
- Data Engineering using Spark Dataframe APIs (PySpark), covering the essential APIs such as select, filter, groupBy, and orderBy.
- Data Engineering using Spark SQL (PySpark and Spark SQL), including how to write high-quality Spark SQL queries with SELECT, WHERE, GROUP BY, ORDER BY, and other clauses.
- The importance of the Spark Metastore, as well as the integration of Dataframes and Spark SQL.
- How to create Data Engineering Pipelines using Spark, with Python as the programming language.
- How to work with file formats such as Parquet, JSON, and CSV when building Data Engineering Pipelines.
- How to set up a self-supporting single-node Hadoop and Spark Cluster to gain sufficient experience with HDFS and YARN.
- The entire Spark application development life cycle for building Spark applications with PySpark, and how to examine applications using the Spark UI.
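As a rough illustration of the SELECT, WHERE, GROUP BY, and ORDER BY clauses mentioned above, here is a minimal sketch. It is not taken from the course material: it uses Python's built-in sqlite3 module as a stand-in for Postgres or Spark SQL, and the orders table, its columns, the sample rows, and the revenue_by_status function are all hypothetical.

```python
import sqlite3

# Hypothetical sample data: (order_id, order_status, order_amount).
ORDERS = [
    (1, "COMPLETE", 120.0),
    (2, "PENDING", 80.0),
    (3, "COMPLETE", 45.5),
    (4, "CLOSED", 300.0),
    (5, "PENDING", 20.0),
]


def revenue_by_status(min_amount):
    """Return (status, order_count, revenue) rows for orders >= min_amount."""
    # An in-memory SQLite database stands in for Postgres or Spark SQL here.
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE orders ("
            "order_id INTEGER PRIMARY KEY, "
            "order_status TEXT, "
            "order_amount REAL)"
        )
        conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", ORDERS)
        query = """
            SELECT order_status,
                   COUNT(*) AS order_count,
                   ROUND(SUM(order_amount), 2) AS revenue
            FROM orders
            WHERE order_amount >= ?      -- filter rows before grouping
            GROUP BY order_status        -- one output row per status
            ORDER BY revenue DESC        -- highest revenue first
        """
        return conn.execute(query, (min_amount,)).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in revenue_by_status(40.0):
        print(row)
```

The same query shape (filter, then group, then sort) carries over to Postgres and Spark SQL, although function names and data types may differ between engines.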
Syllabus:
- Getting Started with ITVersity Labs for Data Engineering Essentials on Udemy
- Setup Environment to learn Python, SQL, Hadoop, Spark using Docker on Windows 11
- Setup Environment to learn Python, SQL, Hadoop, Spark using Docker on Windows 10
- Setup Environment to learn Python, SQL, Hadoop and Spark using Docker on Mac
- Setting up Environment to learn Python, SQL as well as Spark using AWS Cloud
- Networking Concepts for Beginners - IP addresses and port numbers
- Database Essentials - Getting Started
- Database Essentials - Database Operations
- Database Essentials - Writing Basic SQL Queries
- Database Essentials - Creating Tables and Indexes
- Database Essentials - Partitioning Tables and Indexes
- Database Essentials - Predefined Functions
- Database Essentials - Writing Advanced SQL Queries
- Programming Essentials using Python - Perform Database Operations
- Programming Essentials using Python - Getting Started with Python
- Programming Essentials using Python - Basic Programming Constructs
- Programming Essentials using Python - Predefined Functions
- Programming Essentials using Python - User Defined Functions
- Programming Essentials using Python - Overview of Collections - list and set
- Programming Essentials using Python - Overview of Collections - dict and tuple
- Programming Essentials using Python - Manipulating Collections using loops
- Programming Essentials using Python - Development of Map Reduce APIs
- Programming Essentials using Python - Understanding Map Reduce Libraries
- Programming Essentials using Python - Basics of File IO using Python
- Programming Essentials using Python - Delimited Files and Collections
- Programming Essentials using Python - Overview of Pandas Libraries
- Programming Essentials using Python - Database Programming - CRUD Operations
- Programming Essentials using Python - Database Programming - Batch Operations
- Programming Essentials using Python - Processing JSON Data
- Programming Essentials using Python - Processing REST Payloads
- Understanding Python Virtual Environments
- Overview of PyCharm for Python Application Development
- Data Copier - Getting Started
- Data Copier - Reading Data using Pandas
- Data Copier - Database Programming using Pandas
- Data Copier - Loading Data from files to tables
- Data Copier - Modularizing the application
- Data Copier - Dockerizing the application
- Data Copier - Using custom Docker Image
- Data Copier - Deploy and Validate Application on Remote Server
- Validate ITVersity Hadoop and Spark Cluster (for ITVersity lab customers)
- Setup Single Node Hadoop and Spark Cluster or Lab using Docker
- Introduction to Hadoop ecosystem - Overview of HDFS
- Data Engineering using Spark SQL - Getting Started
- Data Engineering using Spark SQL - Basic Transformations
- Data Engineering using Spark SQL - Managing Tables - Basic DDL and DML
- Data Engineering using Spark SQL - Managing Tables - DML and Partitioning
- Data Engineering using Spark SQL - Overview of Spark SQL Functions
- Data Engineering using Spark SQL - Windowing Functions
- Apache Spark using Python - Data Processing Overview
- Apache Spark using Python - Processing Column Data
- Apache Spark using Python - Basic Transformations
- Apache Spark using Python - Joining Data Sets
- Apache Spark using Python - Spark Metastore
- Getting Started with Semi Structured Data using Spark
- Process Semi Structured Data using Spark Data Frame APIs
- Apache Spark - Development Life Cycle using Python
- Spark Application Execution Life Cycle and Spark UI
- Setup SSH Proxy to access Spark Application logs
- Deployment Modes of Spark Applications