Description
In this course, you will:
- Discover the principles of tidy data and learn how to create and manipulate tibbles, transforming source data into tidy formats.
- Use the R programming language and the tidyverse packages to practice data wrangling: the data cleaning and transformation tasks that take up a significant amount of analyst time.
- Work through three hands-on case studies that reinforce the data wrangling principles and tactics covered in the course.
Syllabus:
1. Tidy Data
- What is tidy data?
- Variables, observations, and values
- Common data problems
- Using the tidyverse
2. Working with Tibbles
- Building and printing tibbles
- Subsetting tibbles
- Filtering tibbles
3. Importing Data into R
- What are CSV files?
- Importing CSV files into R
- What are TSV files?
- Importing TSV files into R
- Importing delimited files into R
- Importing fixed-width files into R
- Importing Excel files into R
- Reading data from databases and the web
4. Data Transformation
- Wide vs. long datasets
- Making wide datasets long with gather()
- Making long datasets wide with spread()
- Converting data types in R
- Working with dates and times in R
5. Data Cleaning
- Detecting outliers
- Missing and special values in R
- Breaking apart columns with separate()
- Combining columns with unite()
- Manipulating strings in R with stringr
6. Data Wrangling Case Study: Coal Consumption
- Understanding the coal dataset
- Reading in the coal dataset
- Converting the coal dataset from long to wide
- Segmenting the coal dataset
- Visualizing the coal dataset
7. Data Wrangling Case Study: Water Quality
- Understanding the water quality dataset
- Reading in the water quality dataset
- Filtering the water quality dataset
- Water quality data types
- Correcting data entry errors
- Identifying and removing outliers
- Converting temperature from Fahrenheit to Celsius
- Widening the water quality dataset
8. Data Wrangling Case Study: Social Security Disability Claims
- Understanding the Social Security Disability dataset
- Importing the Social Security Disability dataset
- Making the Social Security Disability dataset long
- Formatting dates in the Social Security Disability dataset
- Handling fiscal years in the Social Security Disability dataset
- Widening the Social Security Disability dataset
- Visualizing the Social Security Disability dataset