Description
In this course:
- Instructor Mike Chapple explains data mining and helps you prepare for the second domain of the Data+ exam. He walks you through data acquisition and integration topics: ETL and ELT processes, public databases, APIs, web scraping, surveys and observation, and sampling.
- He covers cleaning and profiling datasets, then moves on to data manipulation techniques such as sorting, filtering, aggregating, and recoding data, as well as data transposition and normalization.
- He also discusses query optimization methods: indexing, record subsets, query execution plans, and parameterization.
Syllabus:
1. Data Acquisition and Integration
- ETL and ELT processes
- Public databases
- Application programming interfaces (APIs)
- Web scraping
- Surveys and observation
- Sampling
2. Cleaning and Profiling Datasets
- Duplicate and redundant data
- Missing data
- Invalid data and outliers
- Handling outliers
3. Data Manipulation Techniques
- Sorting data
- Filtering data
- Aggregating data
- Recoding data
- Data transposition
- Data normalization
- String manipulation
- Working with dates
- Derived values
- Combining datasets
4. Query Optimization
- Indexing
- Record subsets
- Query execution plans
- Parameterization
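As a rough illustration of the four query optimization topics in the last module, the sketch below uses Python's built-in sqlite3 module. It is not taken from the course: the `orders` table, the index name `idx_orders_customer`, and the helper functions are all hypothetical, chosen only to show an index, a record subset (`LIMIT`), a query execution plan, and a parameterized query working together.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical orders table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders ("
        "id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)"
    )
    # 100 made-up rows spread across 5 customers (customer_id 0..4).
    conn.executemany(
        "INSERT INTO orders (customer_id, amount) VALUES (?, ?)",
        [(i % 5, float(i)) for i in range(100)],
    )
    # Indexing: lets the engine look up rows by customer_id
    # without scanning the whole table.
    conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
    return conn


def find_orders(conn, customer_id, limit=10):
    """Parameterized query that returns only a subset of records."""
    sql = (
        "SELECT id, amount FROM orders "
        "WHERE customer_id = ? ORDER BY id LIMIT ?"
    )
    # Values are bound to the ? placeholders, never pasted into the SQL text.
    return conn.execute(sql, (customer_id, limit)).fetchall()


def query_plan(conn, customer_id):
    """Return the human-readable steps of SQLite's query execution plan."""
    rows = conn.execute(
        "EXPLAIN QUERY PLAN "
        "SELECT id, amount FROM orders WHERE customer_id = ?",
        (customer_id,),
    ).fetchall()
    # The last column of each plan row holds the description text.
    return [row[-1] for row in rows]


if __name__ == "__main__":
    conn = build_demo_db()
    print(find_orders(conn, customer_id=2, limit=3))
    for step in query_plan(conn, customer_id=2):
        print(step)
```

Running the script prints the first three orders for one customer, followed by the plan steps, which should mention the index. The exact plan wording varies between SQLite versions, and other database engines expose execution plans through their own commands.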