In this course, you will learn:
- How to solve some of the most common dirty data issues: converting data types, applying range constraints to remove future data points, and removing duplicated data points to avoid double-counting.
- How to fix inconsistent whitespace and capitalization in category labels, collapse multiple categories into one, and reformat strings for consistency.
- How to verify that values have been added correctly and how to keep missing values from distorting your analyses.
- How to link records by calculating string similarity, and then apply that knowledge to merge two restaurant review datasets into a single, clean master dataset.

The course is organized into four chapters:
- Common data problems
- Text and categorical data problems
- Advanced data problems
- Record linkage
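To make the record-linkage idea concrete, here is a minimal sketch that scores string similarity with the standard library's `difflib.SequenceMatcher` and merges two lists of restaurant names, keeping only one copy of each pair that looks like the same place. This is an illustrative stand-in, not necessarily the tooling the course uses. The helper names (`similarity`, `link_records`, `merge_datasets`), the `0.6` threshold, and the sample names are all hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity ratio after trimming whitespace and lowercasing."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def link_records(left, right, threshold=0.6):
    """Map each name in `left` to its most similar name in `right`,
    keeping the pair only if the score reaches `threshold` (hypothetical cutoff)."""
    links = {}
    for name in left:
        best = max(right, key=lambda candidate: similarity(name, candidate))
        if similarity(name, best) >= threshold:
            links[name] = best
    return links


def merge_datasets(left, right, threshold=0.6):
    """Combine both lists, dropping entries from `right` that were linked
    to an entry in `left` so the same restaurant is not counted twice."""
    matched = set(link_records(left, right, threshold).values())
    return list(left) + [name for name in right if name not in matched]


# Hypothetical sample data standing in for two restaurant review datasets.
LEFT = ["Arnie Morton's of Chicago", "Art's Deli", "Fenix"]
RIGHT = ["arnie morton's of chicago ", "Arts Delicatessen", "Campanile"]

if __name__ == "__main__":
    print(link_records(LEFT, RIGHT))
    print(merge_datasets(LEFT, RIGHT))
```

The threshold is the main design choice: too high and spelling variants such as "Art's Deli" and "Arts Delicatessen" stay unlinked, too low and unrelated restaurants get merged. Real pipelines usually also compare other fields (address, city, cuisine) before accepting a link.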