Description
In this programme, you will plan, design, and implement enterprise data infrastructure solutions and create blueprints for an organization's data management system. You will design a PostgreSQL relational database, an Online Analytical Processing (OLAP) data model to build a cloud-based data warehouse, and a scalable data lake architecture to meet the needs of Big Data. Finally, you'll learn how to apply data governance principles to an organization's data management system.
Syllabus:
Course 1: Data Architecture Foundations
What is Data Architecture?
- Define data architecture characteristics
- Define data governance and its role
- Define scalability and flexibility in database design
Database Framework
- Introduction to ERDs
- Develop a database schema
- Understand normalization and its use cases
- Learn to normalize data to the 3rd Normal Form
Data Design
- Introduction to ERDs
- Build a conceptual ERD
- Build a logical ERD
- Learn about cardinality and Crow’s Foot notation
- Build a physical ERD
Creating a Physical Database
- Learn about factors that affect database performance
- Learn about file and data storage solutions
- Use DDL SQL to create database objects in PostgreSQL
- Learn about data ingestion methods, including ETL, pipelines, APIs, and direct feeds
- Use DML SQL to populate a database with data in PostgreSQL
- Use CRUD SQL commands to demonstrate proper operation of a database
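As a rough illustration of the DDL, DML, and CRUD objectives above, the sketch below runs one statement of each kind. It is only a sketch under stated assumptions: an in-memory SQLite database stands in for PostgreSQL so that it runs without a server, and the employee table, its columns, and the sample rows are hypothetical rather than taken from the course.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for PostgreSQL.
# The table, columns, and rows are hypothetical examples.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# DDL: create a database object.
cur.execute(
    """
    CREATE TABLE employee (
        emp_id   INTEGER PRIMARY KEY,
        emp_name TEXT NOT NULL,
        dept     TEXT NOT NULL
    )
    """
)

# DML, Create: populate the table.
cur.executemany(
    "INSERT INTO employee (emp_id, emp_name, dept) VALUES (?, ?, ?)",
    [(1, "Ada", "HR"), (2, "Grace", "IT")],
)

# Read: fetch every row back.
rows_after_insert = cur.execute(
    "SELECT emp_id, emp_name, dept FROM employee ORDER BY emp_id"
).fetchall()

# Update: move one employee to another department.
cur.execute("UPDATE employee SET dept = ? WHERE emp_id = ?", ("Finance", 2))
dept_after_update = cur.execute(
    "SELECT dept FROM employee WHERE emp_id = 2"
).fetchone()[0]

# Delete: remove one employee and count what is left.
cur.execute("DELETE FROM employee WHERE emp_id = ?", (1,))
remaining = cur.execute("SELECT COUNT(*) FROM employee").fetchone()[0]

conn.commit()
conn.close()
```

The SQL statements themselves are close to what PostgreSQL accepts, although a PostgreSQL driver would typically use a different placeholder style than the `?` used by sqlite3.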
Project: Designing an HR Database
In this project, you will design, build, and populate a database for the Human Resources (HR) Department at the fictitious Tech ABC Corp, a video game company. The project begins with a request from the Human Resources Manager. Using the foundational principles of data architecture, you must then design the database best suited to the department's needs. You will go through the steps of database architecture, database proposal creation, database entity relationship diagram creation, and finally database creation. This project is a scaled-down simulation of the types of real-world assignments that data architects deal with on a daily basis.
Course 2: Designing Data Systems
Enterprise Data Architecture
- Understand the importance of Data Architecture in any organization
- Learn the benefits of executing a Data Architecture
- Learn the business and technical artifacts required
- Understand business and functional requirements
- Learn how OLTP, ODS, and OLAP models are designed
Staging Data
- Build a staging area for data ingestion
- Learn to organize data assets based on schemas
- Design schedules for data processing based on the requirements
- Learn to manage the staging area through metadata
Operational Data Store
- Build an integrated ER model connecting distributed data assets
- Learn to design a Data Dictionary and Master Data
- Apply normalization rules to eliminate redundancies
- Learn when to use ETL vs ELT techniques
- Learn to cleanse data anomalies
Data Warehouse
- Learn two OLAP modeling designs: Star and Snowflake schemas
- Learn various dimensional and fact table types
- Build ELT data processing from ODS to Data warehouse
- Write SQL queries for the purpose of reporting
Project: Design a Data Warehouse for Reporting and OLAP
In this project, you will create an end-to-end data architecture, build data ingestion from Yelp and climatic source systems, design an Operational Data Store and a data warehouse system, and transform data from staging to ODS and then from ODS to the data warehouse system. The Yelp source includes a list of businesses and restaurants, as well as reviews and ratings. The climatic data sources track temperature and precipitation. These two sources are completely separate and unrelated. The project's ultimate goal is to write appropriate SQL to determine the impact of weather on restaurant ratings.
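The kind of reporting query this project aims at might look like the sketch below, which joins reviews to daily weather on the date and compares average ratings on dry and wet days. Everything specific here is an assumption made for illustration: the table names fact_review and dim_weather, their columns, and the sample rows are invented, and in-memory SQLite stands in for the cloud data warehouse.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for the data warehouse.
# Table names, columns, and all rows are invented for illustration only.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript(
    """
    CREATE TABLE dim_weather (
        weather_date  TEXT PRIMARY KEY,
        precipitation REAL
    );
    CREATE TABLE fact_review (
        review_id   INTEGER PRIMARY KEY,
        review_date TEXT,
        stars       INTEGER
    );
    """
)
cur.executemany(
    "INSERT INTO dim_weather VALUES (?, ?)",
    [("2020-01-01", 0.0), ("2020-01-02", 1.2)],
)
cur.executemany(
    "INSERT INTO fact_review VALUES (?, ?, ?)",
    [
        (1, "2020-01-01", 5),
        (2, "2020-01-01", 4),
        (3, "2020-01-02", 3),
        (4, "2020-01-02", 2),
    ],
)

# Join each review to the weather on the same day, then compare
# the average rating on dry days with the average on wet days.
report = cur.execute(
    """
    SELECT CASE WHEN w.precipitation > 0 THEN 'wet' ELSE 'dry' END AS day_type,
           AVG(r.stars) AS avg_stars,
           COUNT(*)     AS n_reviews
    FROM fact_review AS r
    JOIN dim_weather AS w ON w.weather_date = r.review_date
    GROUP BY day_type
    ORDER BY day_type
    """
).fetchall()
conn.close()
```

The date is the only link between the two unrelated sources, which is why the join key is the review date; a real model would likely also carry business and location keys.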
Course 3: Big Data Systems
Characteristics of Big Data
- Explain what big data is
- Articulate the business value of big data
- Describe the characteristics of big data
- Distinguish between horizontal and vertical scaling
- Describe the components of a big data ecosystem
Ingestion, Storage and Processing Frameworks
- Explain how distributed storage works in HDFS
- Explain how distributed processing works
- Explain how resources are managed in a Hadoop cluster
- Distinguish between different distributed processing frameworks
- Apply frameworks to appropriate use cases
NoSQL Databases
- Explain the difference between SQL and NoSQL databases
- Differentiate between the ACID and CAP properties of SQL and NoSQL databases
- Implement create, read, write, and update operations on a NoSQL database with DynamoDB
- Create a simple NoSQL data model
Scalable Data Lake Architecture
- Explain what a data lake is and its business value
- Distinguish between different data formats and their application
- Articulate Data Lake design patterns and challenges
- Explain how to enable transactional capabilities in a Data Lake
Project: Design an Enterprise Data Lake System
In this project, you will work as a Big Data Architect on a real-world use case encountered by a Medical Data Processing Company.
You must analyse the company's current architecture, understand technical and business requirements, and propose a new Data Lake-based solution to both technical and executive audiences. You will create a design document outlining your solution with rationale for technical audiences, and a short presentation pitching your solution for executive audiences. This is a real-world scenario in which you will act as a company's expert data infrastructure consultant and solve the challenges that the company is currently facing. You will also improve your presentation skills and learn how to articulate complex technical concepts as simple, value-driven objectives to company leadership.
Course 4: Data Governance
Introduction to Data Governance
- Understand what Data Governance is and why it is important
- Learn about the different disciplines of Data Governance
- Understand the different stakeholders involved in Data Governance projects
Metadata Management
- Understand the different types of metadata
- Understand the components and capabilities of a Metadata Management System
- Create conceptual and logical Enterprise Data Models
- Create an Enterprise Data Catalog
Data Quality Management
- Perform data profiling using various techniques and data quality dimensions
- Identify remediation options for data quality issues
- Measure data quality using data quality scores and thresholds
- Monitor data quality using dashboards, exception reports, and trend reports
Master Data Management
- Understand the concepts of master data and the golden record
- Understand different types of Master Data Management Architectures
- Create a golden record using various match and merge techniques
- Understand data governance processes for authoring, monitoring and approval of master data
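To make the match and merge idea behind a golden record more concrete, here is a minimal sketch. It assumes one simple matching rule (records match when their normalized email is identical) and one simple merge rule (the most recently updated non-empty value wins). Both rules, the field names, and the sample records are hypothetical and are not the specific techniques taught in the course.

```python
from collections import defaultdict

# Hypothetical customer records from two source systems.
records = [
    {"source": "crm", "email": "Ann@Example.com ", "name": "Ann Lee",
     "phone": "", "updated": "2021-01-10"},
    {"source": "web", "email": "ann@example.com", "name": "A. Lee",
     "phone": "555-0100", "updated": "2021-03-02"},
    {"source": "crm", "email": "bo@example.com", "name": "Bo Chan",
     "phone": "555-0199", "updated": "2020-11-20"},
]


def match_key(rec):
    """Matching rule (assumed): same email after trimming and lowercasing."""
    return rec["email"].strip().lower()


def merge(group):
    """Merge rule (assumed): the newest non-empty value of each field wins."""
    golden = {}
    # Process oldest first so that newer values overwrite older ones.
    for rec in sorted(group, key=lambda r: r["updated"]):
        for field in ("name", "phone"):
            if rec[field]:
                golden[field] = rec[field]
    golden["email"] = match_key(group[0])
    return golden


# Match: group records that share a match key.
groups = defaultdict(list)
for rec in records:
    groups[match_key(rec)].append(rec)

# Merge: collapse each group into a single golden record.
golden_records = {key: merge(group) for key, group in groups.items()}
```

Real match rules are usually fuzzier than exact email equality, and survivorship rules often weigh source trust as well as recency, so this only shows the overall shape of the process.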
Project: Data Governance at SneakerPark
In this project, you will implement data governance solutions for SneakerPark, an online shoe retailer, to help them better manage their data now and in the future. To begin, you will develop an Enterprise Data Model that provides a comprehensive view of all the data in their systems. Following that, you will document the metadata in an Enterprise Data Catalog and profile the data in their systems to identify data quality issues, recommend remediation strategies for each of these issues, and design a data quality dashboard.
Finally, you will sketch out a proposed MDM implementation architecture, define a set of matching rules for the creation of customer and item master data, and define the data governance roles and responsibilities required to oversee this data governance initiative.