Description
Improve your programming skills and your ability to work with messy, complex datasets. You'll learn how to manipulate and prepare data for analysis, as well as how to create visualizations for data exploration. Finally, you'll learn how to use your data skills to tell a story with data.
Course 1: Introduction to Data Analysis
Anaconda
- Learn to use Anaconda to manage packages and environments for use with Python
Jupyter Notebooks
- Learn to use this open-source web application to combine explanatory text, math equations, code, and visualizations in one sharable document
Data Analysis Process
- Learn about the key steps of the data analysis process.
- Investigate multiple datasets using Python and Pandas.
Pandas and NumPy: Case Study 1
- Perform the entire data analysis process on a dataset
- Learn to use NumPy and Pandas to wrangle, explore, analyze, and visualize data
Pandas and NumPy: Case Study 2
- Perform the entire data analysis process on a dataset
- Learn more about NumPy and Pandas to wrangle, explore, analyze, and visualize data
Programming Workflow for Data Analysis
- Learn how to carry out analysis outside of Jupyter Notebook using IPython or the command line interface
Project: Explore Weather Trends
- This project will teach you the basics of SQL and how to download data from a database. You will examine local and global temperature data and compare temperature trends in your area to global temperature trends.
Project: Investigate a Dataset
- In this project, you will investigate one of Udacity's curated datasets using NumPy and pandas. You will complete the entire data analysis process, beginning with a question and ending with a presentation of your findings.
Course 2: Practical Statistics
Simpson’s Paradox
- Examine a case study to learn about Simpson’s Paradox
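
As a rough illustration of the paradox, the sketch below compares two treatments across two subgroups. The counts are illustrative only and are not taken from the course's case study; the names `counts`, `rate`, and `overall_rate` are made up for this example. Treatment A has the higher success rate within each subgroup, yet treatment B has the higher rate once the subgroups are pooled.

```python
# Illustrative (successes, trials) counts per treatment and subgroup.
# These numbers are assumptions for demonstration, not course data.
counts = {
    "A": {"small": (81, 87), "large": (192, 263)},
    "B": {"small": (234, 270), "large": (55, 80)},
}


def rate(successes, trials):
    """Success rate for a single subgroup."""
    return successes / trials


def overall_rate(groups):
    """Success rate after pooling all subgroups of one treatment."""
    successes = sum(s for s, _ in groups.values())
    trials = sum(t for _, t in groups.values())
    return successes / trials


for treatment, groups in counts.items():
    per_group = {name: round(rate(*st), 3) for name, st in groups.items()}
    print(treatment, per_group, "overall:", round(overall_rate(groups), 3))
```

The reversal happens because the subgroup sizes differ sharply between treatments, so the pooled rates weight the subgroups very differently.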
Probability
- Learn the fundamental rules of probability.
Binomial Distribution
- Learn about the binomial distribution, where each observation represents one of two outcomes
- Derive the probability formula for the binomial distribution
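
A minimal sketch of the binomial probability, assuming `n` independent trials that each succeed with probability `p`. The function name `binomial_pmf` is hypothetical and only serves this example.

```python
from math import comb


def binomial_pmf(n, k, p):
    """Probability of exactly k successes in n independent trials."""
    # Number of ways to choose which k trials succeed, times the
    # probability of any one such arrangement.
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Probability of exactly 3 heads in 5 fair coin flips.
print(binomial_pmf(5, 3, 0.5))  # 0.3125
```

Summing `binomial_pmf(n, k, p)` over every `k` from 0 to `n` gives 1, which is a quick sanity check on the formula.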
Conditional Probability
- Learn about conditional probability, i.e., when events are not independent.
Bayes' Rule
- Build on conditional probability principles to understand Bayes' rule
- Derive Bayes' theorem
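
One way to see the rule in action is a diagnostic-test calculation. The sketch below is an illustration under assumed inputs: the prevalence, sensitivity, and false positive rate are hypothetical numbers, and `posterior` is a name chosen for this example.

```python
def posterior(prior, sensitivity, false_positive_rate):
    """P(condition | positive test), computed with Bayes' rule."""
    # Law of total probability: overall chance of a positive result.
    p_positive = sensitivity * prior + false_positive_rate * (1 - prior)
    return sensitivity * prior / p_positive


# Hypothetical inputs: 1% prevalence, 90% sensitivity, 5% false positives.
print(round(posterior(0.01, 0.90, 0.05), 3))  # 0.154
```

Even with a fairly accurate test, the posterior stays low here because the condition is rare to begin with.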
Standardizing
- Convert distributions into the standard normal distribution using the Z-score.
- Compute proportions using standardized distributions.
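
The two steps above can be sketched with Python's standard library, assuming the variable is approximately normal. The exam-score mean and standard deviation are made-up values, and `z_score` is a hypothetical helper.

```python
from statistics import NormalDist


def z_score(x, mean, std_dev):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / std_dev


standard_normal = NormalDist()  # mean 0, standard deviation 1

# Made-up example: exam scores with mean 70 and standard deviation 10.
z = z_score(85, mean=70, std_dev=10)
proportion_below = standard_normal.cdf(z)
print(z, round(proportion_below, 4))  # 1.5 0.9332
```

Here `cdf` plays the role of a Z-table lookup: it returns the proportion of observations expected to fall below the given Z-score.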
Sampling Distributions and Central Limit Theorem
- Use normal distributions to compute probabilities
- Use the Z-table to look up the proportions of observations above, below, or in between values
Confidence Intervals
- Estimate population parameters from sample statistics using confidence intervals
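
A minimal sketch of one common approach, a normal-approximation interval for a population mean. The sample values are made up, `mean_confidence_interval` is a hypothetical name, and for small samples a t-based critical value would usually be more appropriate than the normal one used here.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def mean_confidence_interval(sample, confidence=0.95):
    """Normal-approximation confidence interval for the population mean."""
    n = len(sample)
    center = mean(sample)
    standard_error = stdev(sample) / sqrt(n)
    # Critical value: about 1.96 for 95% confidence.
    z_crit = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z_crit * standard_error
    return center - margin, center + margin


# Made-up measurements for illustration only.
sample = [4.9, 5.1, 5.0, 5.3, 4.8, 5.2, 5.0, 4.7, 5.1, 4.9]
low, high = mean_confidence_interval(sample)
print(round(low, 3), round(high, 3))
```

Raising `confidence` widens the interval, and a larger sample narrows it through the smaller standard error.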
Hypothesis Testing
- Use critical values to make decisions on whether or not a treatment has changed the value of a population parameter.
T-Tests and A/B Tests
- Test the effect of a treatment or compare the difference in means for two groups when we have small sample sizes
Regression
- Build a linear regression model to understand the relationship between independent and dependent variables.
- Use linear regression results to make a prediction.
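
To make the fit-then-predict workflow concrete, here is a sketch of ordinary least squares for a single predictor in plain Python. The course may well use a library for this; the hand-written `fit_line` function and the data are assumptions made for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Slope = covariance of x and y divided by the variance of x.
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return intercept, slope


# Made-up data lying exactly on y = 1 + 2x.
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]
intercept, slope = fit_line(xs, ys)

# Use the fitted line to predict y for a new x.
prediction = intercept + slope * 6
print(intercept, slope, prediction)  # 1.0 2.0 13.0
```

The slope is read as the expected change in the dependent variable for a one-unit increase in the independent variable.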
Multiple Linear Regression
- Use multiple linear regression results to interpret coefficients for several predictors
Logistic Regression
- Use logistic regression results to make a prediction about the relationship between categorical dependent variables and predictors.
Project: Analyze Experiment Results
- In this project, you will be given a dataset containing data from an experiment. You will use statistical techniques to answer questions about the data and present your findings and recommendations in a report.
Course 3: Data Wrangling
Intro to Data Wrangling
- Identify each step of the data wrangling process (gathering, assessing, and cleaning).
- Wrangle a CSV file downloaded from Kaggle using fundamental gathering, assessing, and cleaning code.
Gathering Data
- Gather data from multiple sources, including gathering files, programmatically downloading files, web-scraping data, and accessing data from APIs.
- Import data of various file formats into pandas, including flat files (e.g. TSV), HTML files, TXT files, and JSON files.
- Store gathered data in a PostgreSQL database.
Assessing Data
- Assess data visually and programmatically using pandas
- Distinguish between dirty data (content or “quality” issues) and messy data (structural or “tidiness” issues)
- Identify data quality issues and categorize them using metrics: validity, accuracy, completeness, consistency, and uniformity
Cleaning Data
- Identify each step of the data cleaning process (defining, coding, and testing)
- Clean data using Python and pandas
- Test cleaning code visually and programmatically using Python
Project: Wrangle and Analyze Data
- Real-world data is rarely pristine. Using Python, you will collect data from various sources, assess its quality and tidiness, and then clean it. You'll keep track of your wrangling efforts in a Jupyter Notebook and show them off with Python and SQL analyses and visualizations.
Course 4: Data Visualization with Python
Data Visualization in Data Analysis
- Understand why visualization is important in the practice of data analysis.
- Know what distinguishes exploratory analysis from explanatory analysis, and the role of data visualization in each.
Design of Visualizations
- Interpret features in terms of level of measurement.
- Know different encodings that can be used to depict data in visualizations.
- Understand various pitfalls that can affect the effectiveness and truthfulness of visualizations.
Univariate Exploration of Data
- Use bar charts to depict distributions of categorical variables.
- Use histograms to depict distributions of numeric variables
- Use axis limits and different scales to change how your data is interpreted
Bivariate Exploration of Data
- Use scatterplots to depict relationships between numeric variables.
- Use clustered bar charts to depict relationships between categorical variables
- Use violin and bar charts to depict relationships between categorical and numeric variables
- Use faceting to create plots across different subsets of the data
Multivariate Exploration of Data
- Use encodings like size, shape, and color to encode values of a third variable in a visualization.
- Use plot matrices to explore relationships between multiple variables at the same time.
- Use feature engineering to capture relationships between variables.
Explanatory Visualizations
- Understand what it means to tell a compelling story with data.
- Choose the best plot type, encodings, and annotations to polish your plots.
- Create a slide deck using a Jupyter Notebook to convey your findings.
Visualization Case Study
- Apply your knowledge of data visualization to a dataset involving the characteristics of diamonds and their prices.
Project: Communicate Data Findings
- Real-world data is rarely clean. Using Python, you will collect data from a variety of sources, assess its quality and tidiness, and then clean it. Your wrangling efforts will be documented in a Jupyter Notebook, and they will be showcased through Python and SQL analyses and visualizations.