Description
Learn how to process data in real time using modern data engineering tools like Apache Spark, Kafka, Spark Streaming, and Kafka Streams. You'll begin by learning about the components of data streaming systems. After that, you'll create a real-time analytics application. You'll also collect data, run analytics, and derive insights from reports generated by the streaming console.
Syllabus:
Course 1: Foundations of Data Streaming, and SQL & Data Modeling for the Web
Introduction to Stream Processing
- Describe and explain streaming data stores and stream processing
- Describe and explain real-world usages of stream processing
- Describe and explain append-only logs, events, and how stream processing differs from batch processing
- Utilize Kafka CLI tools and the Confluent Kafka Python library for topic management, production, and consumption
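As a minimal sketch of the last objective above, producing to and consuming from a topic with the Confluent Kafka Python library looks roughly like this (the broker address, topic name, and consumer group are placeholders):

```python
from confluent_kafka import Producer, Consumer

BROKER = "localhost:9092"  # placeholder broker address

# Produce a few events to an append-only log (a Kafka topic)
producer = Producer({"bootstrap.servers": BROKER})
for i in range(3):
    producer.produce("com.example.purchases", value=f"purchase #{i}".encode("utf-8"))
producer.flush()  # block until all buffered messages are delivered

# Consume the same events back
consumer = Consumer({
    "bootstrap.servers": BROKER,
    "group.id": "example-consumer-group",
    "auto.offset.reset": "earliest",  # start from the beginning of the log
})
consumer.subscribe(["com.example.purchases"])
try:
    while True:  # loop until interrupted (Ctrl-C)
        msg = consumer.poll(1.0)
        if msg is None:
            continue
        if msg.error():
            print(f"consumer error: {msg.error()}")
            continue
        print(f"consumed: {msg.value().decode('utf-8')}")
finally:
    consumer.close()
```

The Kafka CLI tools (kafka-topics, kafka-console-producer, kafka-console-consumer) cover the same workflow from the shell.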
Apache Kafka
- Describe and explain Kafka architecture
- Describe and explain Kafka topics and configuration
- Utilize Confluent Kafka Python to create topics and configuration
- Describe and explain Kafka producers, consumers, and configuration
- Utilize Confluent Kafka Python to create producers and configuration
- Utilize Confluent Kafka Python to create consumers and configuration, and manage offsets
- Describe and explain user privacy considerations
- Describe and explain performance monitoring for consumers, producers, and the cluster itself
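A sketch of topic creation with explicit configuration and of manual offset management with Confluent Kafka Python follows; topic names, retention settings, and the single-broker replication factor are placeholders for illustration only.

```python
from confluent_kafka.admin import AdminClient, NewTopic
from confluent_kafka import Consumer

BROKER = "localhost:9092"  # placeholder broker address

# Create a topic with explicit partition, replication, and retention settings
admin = AdminClient({"bootstrap.servers": BROKER})
futures = admin.create_topics([
    NewTopic(
        "com.example.clicks",
        num_partitions=3,
        replication_factor=1,
        config={"cleanup.policy": "delete", "retention.ms": "3600000"},
    )
])
for topic, future in futures.items():
    try:
        future.result()  # raises if topic creation failed
        print(f"created topic {topic}")
    except Exception as e:
        print(f"failed to create {topic}: {e}")

# Consume with manual offset commits instead of auto-commit
consumer = Consumer({
    "bootstrap.servers": BROKER,
    "group.id": "clicks-consumer",
    "enable.auto.commit": False,
    "auto.offset.reset": "earliest",
})
consumer.subscribe(["com.example.clicks"])
msg = consumer.poll(5.0)
if msg is not None and not msg.error():
    print(msg.value())
    consumer.commit(message=msg, asynchronous=False)  # commit this offset explicitly
consumer.close()
```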
Data Schemas and Apache Avro
- Describe and explain what a data schema is and what value it provides
- Describe and explain what Apache Avro is and what value it provides
- Utilize AvroProducer and AvroConsumer in Confluent Kafka Python
- Describe and explain schema evolution and data compatibility types
- Utilize Schema Registry components in Confluent Kafka Python to manage compatibility
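A minimal sketch of Avro production with a registered schema, assuming a local broker and Schema Registry; the schema, topic, and record contents are illustrative:

```python
from confluent_kafka import avro
from confluent_kafka.avro import AvroProducer

# The value schema that the producer registers with the Schema Registry
value_schema = avro.loads("""
{
    "type": "record",
    "name": "purchase",
    "namespace": "com.example",
    "fields": [
        {"name": "username", "type": "string"},
        {"name": "amount", "type": "double"}
    ]
}
""")

producer = AvroProducer(
    {
        "bootstrap.servers": "localhost:9092",            # placeholder broker
        "schema.registry.url": "http://localhost:8081",   # placeholder registry
    },
    default_value_schema=value_schema,
)

# The dict is validated against the schema and serialized as Avro
producer.produce(topic="com.example.purchases", value={"username": "kim", "amount": 19.99})
producer.flush()
```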
Kafka Connect and REST Proxy
- Describe and explain what problem Kafka Connect solves and where it would be more appropriate than a traditional consumer
- Describe and explain common connectors and how they work
- Utilize Kafka Connect FileStream Source and Sink
- Utilize Kafka Connect JDBC Source and Sink
- Describe and explain what problem Kafka REST Proxy solves and where it would be more appropriate than alternatives
- Describe and explain the REST Proxy metadata and administrative APIs
- Utilize the REST Proxy administrative and metadata APIs
- Describe and explain the REST Proxy consumer APIs
- Utilize the REST Proxy consumer, subscription, and offset APIs
- Describe and explain the REST Proxy producer APIs
- Utilize the REST Proxy producer APIs
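For the REST Proxy objectives, a sketch of producing a JSON record and listing topics over HTTP is shown below, assuming a REST Proxy on its default port; the topic name and record fields are placeholders. Kafka Connect connectors (FileStream, JDBC) are configured similarly through Connect's own REST API.

```python
import json
import requests

REST_PROXY = "http://localhost:8082"  # default REST Proxy port; adjust for your cluster

# Produce a JSON-encoded record to a topic through the REST Proxy producer API
payload = {"records": [{"value": {"station": "Clark/Lake", "status": "on_time"}}]}
resp = requests.post(
    f"{REST_PROXY}/topics/com.example.train_status",
    headers={"Content-Type": "application/vnd.kafka.json.v2+json"},
    data=json.dumps(payload),
)
resp.raise_for_status()
print(resp.json())  # contains the partition and offset of each produced record

# The metadata API answers questions like "what topics exist on the cluster?"
topics = requests.get(f"{REST_PROXY}/topics").json()
print(topics)
```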
Stream Processing Fundamentals
- Describe and explain common scenarios for stream processing, and where you would use stream versus batch
- Describe and explain common stream processing strategies
- Describe and explain how time and windowing works in stream processing
- Describe and explain what a stream versus a table is in stream processing, and where you would use one over the other
- Describe and explain how data storage works in stream processing applications and why it is needed
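To make the time-and-windowing objective concrete, here is a tiny, framework-free sketch of tumbling windows: events are bucketed by their event time into fixed, non-overlapping intervals and aggregated per bucket (window size and event names are illustrative).

```python
from collections import defaultdict
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=5)  # tumbling window size

def window_start(ts: datetime) -> datetime:
    # Align an event timestamp to the start of its 5-minute window
    return datetime.min + (ts - datetime.min) // WINDOW * WINDOW

events = [
    ("page_view", datetime(2024, 1, 1, 12, 2)),
    ("page_view", datetime(2024, 1, 1, 12, 4)),
    ("page_view", datetime(2024, 1, 1, 12, 7)),
]

# Count events per (event name, window start) pair
counts = defaultdict(int)
for name, ts in events:
    counts[(name, window_start(ts))] += 1

for (name, start), n in sorted(counts.items()):
    print(name, start.isoformat(), n)  # 12:00 window holds 2 events, 12:05 holds 1
```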
Stream Processing with Faust
- Describe and explain the Faust Stream Processing Python library, and how it fits into the ecosystem relative to solutions like Kafka Streams
- Describe and explain Faust stream-based processing
- Utilize Faust to create a stream-based application
- Describe and explain how Faust table-based processing works
- Utilize Faust to create a table-based application
- Describe and explain Faust processors and function usage
- Utilize Faust processors and functions
- Describe and explain Faust serialization and deserialization
- Utilize Faust serialization and deserialization
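The sketch below combines the stream- and table-based objectives in one small Faust application, assuming a local broker; the app name, topic, and record fields are placeholders. A worker is started from the command line, for example with faust -A purchase_aggregator worker.

```python
import faust


class Purchase(faust.Record):
    username: str
    amount: float


app = faust.App("purchase-aggregator", broker="kafka://localhost:9092")

# Input stream of purchase events, deserialized into Purchase records
purchases_topic = app.topic("com.example.purchases", value_type=Purchase)

# Changelog-backed table holding a running total per user
totals_table = app.Table("purchase-totals", default=float)


@app.agent(purchases_topic)
async def aggregate_purchases(purchases):
    # Repartition by username so each user's total lives on one worker, then aggregate
    async for purchase in purchases.group_by(Purchase.username):
        totals_table[purchase.username] += purchase.amount
        print(purchase.username, totals_table[purchase.username])


if __name__ == "__main__":
    app.main()
```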
KSQL
- Describe and explain how KSQL fits into the Kafka ecosystem, and why you would choose it over a stream processing application built from scratch
- Describe and explain KSQL architecture
- Describe and explain how to create KSQL streams and tables from topics. Understand the importance of KEY and schema transformations.
- Utilize KSQL to create tables and streams
- Describe and explain KSQL selection syntax
- Utilize KSQL syntax to query tables and streams
- Describe and explain KSQL windowing
- Utilize KSQL windowing within the context of table analysis
- Describe and explain KSQL grouping and aggregates
- Utilize KSQL grouping and aggregates within queries
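As a sketch of how these KSQL objectives come together, the statements below declare a stream over a topic and derive a windowed, aggregated table from it; they are submitted from Python to the KSQL server's REST endpoint (default port 8088). The topic, columns, and window size are illustrative.

```python
import json
import requests

KSQL_URL = "http://localhost:8088/ksql"  # default KSQL server REST endpoint

# Declare a stream over a topic, then derive a windowed, aggregated table from it
statements = """
CREATE STREAM turnstile (station_id INT, line VARCHAR)
    WITH (KAFKA_TOPIC='com.example.turnstile', VALUE_FORMAT='JSON');

CREATE TABLE turnstile_summary AS
    SELECT station_id, COUNT(*) AS entries
    FROM turnstile
    WINDOW TUMBLING (SIZE 1 HOUR)
    GROUP BY station_id;
"""

resp = requests.post(
    KSQL_URL,
    headers={"Content-Type": "application/vnd.ksql.v1+json"},
    data=json.dumps({
        "ksql": statements,
        "streamsProperties": {"ksql.streams.auto.offset.reset": "earliest"},
    }),
)
resp.raise_for_status()
print(resp.json())
```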
Project: Optimize Chicago Bus and Train Availability Using Kafka
In your first project, you will stream public transit status using Kafka and the Kafka ecosystem to build a stream processing application that displays train status in real time, allowing you to optimize the availability of buses and trains in Chicago based on streaming data. You will write your own Python code to generate events, use REST Proxy to send events over HTTP, and use Kafka Connect to collect data from a Postgres database, producing streaming data into Kafka from a variety of sources. Then, using KSQL, you will combine related data models into a single topic ready for consumption by downstream Python applications, and you will finish a simple Python application that ingests data from the Kafka topics for analysis. Finally, you will use the Faust Python stream processing library to further transform train station data into a more streamlined representation, using stateful processing to determine whether passenger volume is increasing, decreasing, or remaining constant.
Course 2: Streaming API Development and Documentation
Streaming DataFrames
- Start a Spark Cluster and Deploy a Spark Application
- Create a Spark Streaming DataFrame with a Kafka Source
- Create a Spark View
- Query a Spark View
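A minimal sketch of these objectives with PySpark Structured Streaming follows; the broker, topic, and app name are placeholders, and it assumes the spark-sql-kafka package is on the classpath (for example via spark-submit --packages org.apache.spark:spark-sql-kafka-0-10_2.12:<spark version>).

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("kafka-console").getOrCreate()
spark.sparkContext.setLogLevel("WARN")

# Streaming DataFrame with a Kafka source (topic and broker are placeholders)
raw_df = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "localhost:9092")
    .option("subscribe", "com.example.events")
    .option("startingOffsets", "earliest")
    .load()
)

# Kafka keys and values arrive as binary; cast them before querying
events_df = raw_df.selectExpr("CAST(key AS STRING) AS key", "CAST(value AS STRING) AS value")

# Register a view and query it with Spark SQL
events_df.createOrReplaceTempView("Events")
query_df = spark.sql("SELECT value FROM Events")

# Sink the query results to the console and run until stopped
query_df.writeStream.outputMode("append").format("console").start().awaitTermination()
```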
Joins and JSON
- Parse a JSON Payload Into Separate Fields for Analysis
- Join Two Streaming DataFrames from Different Data Sources
- Write a Streaming DataFrame to Kafka with Aggregated Data
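A sketch of this module's workflow, under assumed schemas and field names: parse JSON payloads from two topics into columns, join the two streams on a shared customer identifier, and write the combined rows back to Kafka. Broker, topics, and the checkpoint path are placeholders.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import from_json, col, to_json, struct
from pyspark.sql.types import StructType, StructField, StringType, FloatType

spark = SparkSession.builder.appName("join-json").getOrCreate()

# Illustrative schemas for two JSON-encoded topics
score_schema = StructType([
    StructField("customer", StringType()),
    StructField("score", FloatType()),
])
email_schema = StructType([
    StructField("customer", StringType()),
    StructField("email", StringType()),
])

def read_topic(topic):
    # Helper: streaming DataFrame over one Kafka topic, value cast to a string
    return (
        spark.readStream.format("kafka")
        .option("kafka.bootstrap.servers", "localhost:9092")
        .option("subscribe", topic)
        .option("startingOffsets", "earliest")
        .load()
        .selectExpr("CAST(value AS STRING) AS value")
    )

# Parse each JSON payload into separate columns
scores_df = read_topic("com.example.scores") \
    .withColumn("value", from_json(col("value"), score_schema)) \
    .select(col("value.customer").alias("customer"), col("value.score").alias("score"))
emails_df = read_topic("com.example.emails") \
    .withColumn("value", from_json(col("value"), email_schema)) \
    .select(col("value.customer").alias("emailCustomer"), col("value.email").alias("email"))

# Join the two streaming DataFrames on the customer identifier
joined_df = scores_df.join(emails_df, scores_df.customer == emails_df.emailCustomer)

# Re-encode the joined rows as JSON and write them to an output topic
(
    joined_df.select(to_json(struct("customer", "score", "email")).alias("value"))
    .writeStream.format("kafka")
    .option("kafka.bootstrap.servers", "localhost:9092")
    .option("topic", "com.example.customer-scores")
    .option("checkpointLocation", "/tmp/kafka-checkpoint")  # placeholder path
    .start()
    .awaitTermination()
)
```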
Redis, Base64, and JSON
- Manually Save to Redis and Read the Same Data from a Kafka Topic
- Parse Base64 Encoded Information
- Sink a Subset of JSON Fields
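In this module, data written manually into Redis (for example from redis-cli) surfaces on a Kafka topic via the Kafka Connect Redis Source connector, with the interesting payload base64-encoded. The sketch below decodes such a payload and sinks a subset of its JSON fields; the outer and inner field names, topics, and checkpoint path are assumptions for illustration.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import from_json, unbase64, col, to_json, struct
from pyspark.sql.types import StructType, StructField, StringType

spark = SparkSession.builder.appName("redis-base64").getOrCreate()

# Assumed shape of the outer message: the interesting payload arrives base64-encoded
outer_schema = StructType([StructField("encodedPayload", StringType())])
inner_schema = StructType([
    StructField("customer", StringType()),
    StructField("riskScore", StringType()),
    StructField("birthDay", StringType()),
])

raw_df = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "localhost:9092")
    .option("subscribe", "com.example.redis-events")   # placeholder topic
    .option("startingOffsets", "earliest")
    .load()
    .selectExpr("CAST(value AS STRING) AS value")
)

decoded_df = (
    raw_df
    # 1. Parse the outer JSON to reach the encoded field
    .withColumn("value", from_json(col("value"), outer_schema))
    # 2. Base64-decode the payload and cast the resulting bytes back to a string
    .withColumn("payload", unbase64(col("value.encodedPayload")).cast("string"))
    # 3. Parse the decoded JSON into separate columns
    .withColumn("payload", from_json(col("payload"), inner_schema))
    # 4. Keep only the fields we want to sink downstream
    .select(col("payload.customer").alias("customer"),
            col("payload.riskScore").alias("riskScore"))
)

# Sink the selected subset of fields to another Kafka topic as JSON
(
    decoded_df.select(to_json(struct("customer", "riskScore")).alias("value"))
    .writeStream.format("kafka")
    .option("kafka.bootstrap.servers", "localhost:9092")
    .option("topic", "com.example.risk-scores")
    .option("checkpointLocation", "/tmp/redis-checkpoint")  # placeholder path
    .start()
    .awaitTermination()
)
```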
Project: Evaluate Human Balance with Spark Streaming
In this project, you will work with a real-world application called the Step Trending Electronic Data Interface (STEDI), a functional application used to assess the risk of seniors falling. When a senior takes a test, they are scored using an index that reflects the likelihood of falling, and possibly injuring themselves, while walking. STEDI stores risk scores and other data in a Redis datastore. At a STEDI clinic, the Data Science team has completed a working graph for population risk; the issue is that it has not yet been populated with data. Using Kafka Connect Redis Source events and Business Events, you will create a Kafka topic containing anonymized risk scores of the seniors in the clinic.