Description
In this course, you will:
- Explore the cloud environments and tools used to build scalable data and model pipelines.
- Learn about the data sets and model types that are heavily used in day-to-day production work.
- Work through exercises and challenges throughout the course to get comfortable with the diverse toolset.
- Investigate streaming model workflows, which are critical for developing real-time data pipelines that move data between different components in a cloud environment.
Syllabus:
1. Introduction
- Applied Data Science
- Python for Scalable Compute
- Cloud Environments
- Coding Environments
- Important Note!
- Introduction to Datasets
- BigQuery to Pandas
- Kaggle to Pandas
- Prototype Models
- Linear Regression
- Logistic Regression
- Keras Regression
- Automated Feature Engineering
2. Models as Web Endpoints
- Web Services
- Echo Service
- Model Persistence
- Model Endpoints
- Deploying a Web Endpoint
- Gunicorn
- Heroku
- Interactive Web Services with Dash
3. Models as Serverless Functions
- Managed Services
- Cloud Functions (GCP)
- Echo Service
- Cloud Storage (GCS)
- Model Function
- Keras Model
- Access Control
- Model Refreshes
- Lambda Functions (AWS)
- Echo Function
- Simple Storage Service (S3)
- Model Function
- API Gateway
4. Containers for Reproducible Models
- Docker
- Orchestration
- AWS Container Registry (ECR)
- AWS Container Service (ECS)
- Load Balancing
- Kubernetes on GCP
5. Workflow Tools for Model Pipelines
- Sklearn Workflow
- Cron
- Cloud Cron
- Workflow Tools
- Apache Airflow
- Managed Airflow
6. PySpark for Batch Pipelines
- Spark Environments
- Spark Clusters
- Databricks Community Edition
- Staging Data
- A PySpark Primer
- Persisting Dataframes
- Converting Dataframes
- Transforming Data
- Pandas UDFs
- Best Practices
- MLlib Batch Pipeline
- Model Application
- Distributed Deep Learning
- Distributed Feature Engineering
- GCP Model Pipeline
- BigQuery Export
- GCP Credentials
- Model Pipeline
- Productizing PySpark
7. Cloud Dataflow for Batch Modeling
- Apache Beam
- Batch Model Pipeline
- Model Training
- BigQuery Publish
- Datastore Publish
8. Streaming Model Workflows
- Spark Streaming
- Apache Kafka
- Sklearn Streaming
- Dataflow Streaming
- PubSub
- Natality Streaming