Description
In this course, you will:
- learn why clean data is so important, what causes errors, and how to detect, prevent, and fix them in order to keep your data clean
- examine the types of errors that can occur in data, including missing and incorrect values
- see how human errors, machine-introduced errors, and design errors can find their way into your data, and how to detect them
- explore error prevention techniques such as digital signatures, data pipelines and automation, and transactions
- finish with methods for correcting errors, such as renaming fields, changing types, joining and splitting data, and more
Syllabus:
1. Bad Data
- Types of errors
- Missing values
- Bad values
- Duplicates
2. Causes of Errors
- Human errors
- Machine errors
- Design errors
3. Detecting Errors
- Schemas
- Validation
- Finding missing data
- Domain knowledge
- Subgroups
4. Preventing Errors
- Serialization formats
- Digital signatures
- Data pipelines and automation
- Transactions
- Data organization and tidy data
- Process and data quality metrics
5. Fixing Errors
- Renaming fields
- Fixing types
- Joining and splitting data
- Deleting bad data
- Filling missing values
- Reshaping data