Description
In this course, you will:
- learn a systematic approach to the data understanding phase of predictive modeling
- use principles, guidelines, and tools such as KNIME and R to properly assess a data set's suitability for machine learning
- learn how to collect data, describe data, explore data using bivariate visualizations, and verify data quality before moving on to the data preparation phase
- work through case studies, best practices, and challenge-and-solution sets for improved knowledge retention
By the end, you should have acquired the knowledge and skills required to give proper attention to this critical phase of all successful data science projects.
Syllabus:
1. What Is Data Assessment?
- Clarifying how data understanding differs from data visualization
- Introducing the critical data understanding phase of CRISP-DM
- Data assessment in CRISP-DM alternatives: The IBM ASUM-DM and Microsoft TDSP
- Navigating the transition from business understanding to data understanding
- How to organize your work with the four data understanding tasks
2. Collect Initial Data
- Considerations in gathering the relevant data
- A strategy for processing data sources
- Getting creative about data sources
- How to envision a proper flat file
- Anticipating data integration
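As a rough illustration of the flat file and data integration ideas above, here is a minimal R sketch (the file names and columns are hypothetical) that reads two sources and merges them into a single flat file with one row per customer:

```r
# Hypothetical sources, each keyed by customer_id
customers <- read.csv("customers.csv")   # e.g. customer_id, age, region
purchases <- read.csv("purchases.csv")   # e.g. customer_id, total_spend

# Merge on the shared key to produce a single flat file for modeling
flat <- merge(customers, purchases, by = "customer_id", all.x = TRUE)

write.csv(flat, "flat_file.csv", row.names = FALSE)
```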
3. First Look at the Data
- Reviewing basic concepts in the level of measurement
- What is dummy coding? (sketched in R after this section)
- Expanding our definition of level of measurement
- Taking an initial look at possible key variables
- Dealing with duplicate IDs and transactional data
- How many potential variables (columns) will I have?
- How to deal with high-order multiple nominals
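As a quick, hypothetical illustration of dummy coding, R's model.matrix() expands a factor into 0/1 indicator columns; a nominal variable with many levels multiplies the column count accordingly, which is the concern behind high-order nominals:

```r
# Hypothetical data frame with one nominal (categorical) variable
df <- data.frame(region = factor(c("North", "South", "East", "North")))

# Dummy coding: each level (minus a reference level) becomes a 0/1 column
dummies <- model.matrix(~ region, data = df)
dummies

# A nominal with k levels yields k - 1 dummy columns, so high-order
# nominals can inflate the number of candidate predictors quickly
nlevels(df$region) - 1
```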
4. Data Loading and Unit of Analysis
- Introducing the KNIME Analytics Platform
- Tips and tricks to consider during data loading
- Unit of analysis decisions
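The unit of analysis decision can be sketched in R with hypothetical data: transactional rows often need to be rolled up so that each row matches the unit the model will score, such as one row per customer:

```r
# Hypothetical transaction-level data: one row per purchase
tx <- data.frame(customer_id = c(1, 1, 2, 3, 3, 3),
                 amount      = c(20, 35, 10, 5, 15, 25))

# Roll up to one row per customer -- the unit of analysis for modeling
customers <- aggregate(amount ~ customer_id, data = tx, FUN = sum)
customers
```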
5. Describe Data
- How to uncover the gross properties of the data
- Researching the dataset
- Tips and tricks using simple aggregation commands
- A simple strategy for organizing your work
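A few of the simple commands this task relies on, sketched in base R (assuming a data frame named df has already been loaded; the gender column is hypothetical):

```r
# Gross properties of the data: size, types, and simple distributions
dim(df)        # number of rows and columns
str(df)        # variable names, types, and example values
summary(df)    # five-number summaries and NA counts per column

# Frequency counts for a nominal variable, including missing values
table(df$gender, useNA = "ifany")
```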
6. Data Description Case Studies
- Describe data demo using the UCI heart dataset
7. Explore Data Basics
- The explore data task
- How to be effective at univariate analysis and data visualization
- Anscombe's quartet (sketched in R after this section)
- The Data Explorer node in KNIME
- How to navigate borderline cases of variable type
- How to be effective at bivariate data visualization
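Anscombe's quartet ships with R and makes the case for bivariate visualization: four x/y pairs with nearly identical summary statistics but very different shapes. A minimal sketch:

```r
data(anscombe)

# The four pairs share nearly identical correlations (~0.816)
sapply(1:4, function(i)
  cor(anscombe[[paste0("x", i)]], anscombe[[paste0("y", i)]]))

# But the scatterplots tell four very different stories
op <- par(mfrow = c(2, 2))
for (i in 1:4)
  plot(anscombe[[paste0("x", i)]], anscombe[[paste0("y", i)]],
       xlab = paste0("x", i), ylab = paste0("y", i))
par(op)
```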
8. Explore Data Tips and Tricks
- How to utilize an SME's time effectively
- Techniques for working with the top predictors
- Advice for weak predictors
- Tips and tricks when searching for quirks in your data
- Learning when to discard rows
- Introducing ggplot2
- Orienting to R's ggplot2 for powerful multivariate data visualizations
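A minimal ggplot2 sketch of the kind of multivariate view this section builds toward, using R's built-in mtcars data purely as a stand-in: two numeric variables plus a third, categorical variable mapped to color:

```r
library(ggplot2)

# Scatterplot of two numeric variables, with a nominal variable
# encoded as color -- a simple multivariate visualization
ggplot(mtcars, aes(x = wt, y = mpg, colour = factor(cyl))) +
  geom_point(size = 2) +
  labs(x = "Weight (1000 lbs)", y = "Miles per gallon", colour = "Cylinders")
```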
9. Verify Data Quality
- Exploring your missing data options
- Why you lose rows to listwise deletion
- Investigating the provenance of the missing data
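A minimal sketch with hypothetical data of why listwise deletion is costly: a single missing value in any column removes the entire row:

```r
df <- data.frame(age    = c(34, NA, 52, 41),
                 income = c(55000, 61000, NA, 72000))

nrow(df)                  # 4 rows before
complete <- na.omit(df)   # listwise deletion: keep only fully observed rows
nrow(complete)            # 2 rows remain -- half the data is gone
```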
10. Missing Data Case Study
- Introducing the KDD Cup 1998 data
- What is the pattern of missing data in your data?
- Is the missing data worth saving?
- Assessing imputation as a potential solution
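As a baseline for the imputation discussion, a minimal R sketch (hypothetical column) of simple mean imputation; the case study weighs this kind of quick fix against discarding the variable or using more principled methods:

```r
df <- data.frame(income = c(55000, NA, 61000, NA, 72000))

sum(is.na(df$income))     # how much is missing?

# Replace missing values with the mean of the observed values
df$income[is.na(df$income)] <- mean(df$income, na.rm = TRUE)
df$income
```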
11. Explore and Verify Case Studies
- Exploring and verifying data quality with the UCI heart dataset
12. Making the Transition to Data Preparation
- Why formal reports are important
- Creating a data prep to-do list
- How to prepare for eventual deployment