Description
You will learn the skills required to be a successful Data Scientist. Working on projects designed by industry experts, you'll learn how to run data pipelines, design experiments, build recommendation systems, and deploy solutions to the cloud.
Syllabus:
Course 1: Solving Data Science Problems
The Data Science Process
- Apply the CRISP-DM process to business applications
- Wrangle, explore, and analyze a dataset
- Apply machine learning for prediction
- Apply statistics for descriptive and inferential understanding
- Draw conclusions that motivate others to act on your results
Communicating with Stakeholders
- Implement best practices in sharing your code and written summaries
- Learn what makes a great data science blog
- Learn how to share your ideas with the data science community
Project: Write a Data Science Blog Post
In this project, you will select a dataset, identify three questions, and analyze the data to answer them. To communicate your findings to the appropriate audience, you will create a GitHub repository for your project and write a blog post. This project will help you reinforce and extend your knowledge of machine learning, data visualization, and communication.
Course 2: Software Engineering for Data Scientists
Software Engineering Practices
- Write clean, modular, and well-documented code
- Refactor code for efficiency
- Create unit tests to test programs
- Write useful programs in multiple scripts
- Track actions and results of processes with logging
- Conduct and receive code reviews
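The unit-testing practice above can be sketched with Python's built-in unittest module. The function and test names here are hypothetical illustrations, not course material:

```python
import unittest

def days_to_seconds(days):
    """Convert a number of days to seconds."""
    if days < 0:
        raise ValueError("days must be non-negative")
    return days * 24 * 60 * 60

class TestDaysToSeconds(unittest.TestCase):
    def test_one_day(self):
        self.assertEqual(days_to_seconds(1), 86400)

    def test_negative_input_raises(self):
        with self.assertRaises(ValueError):
            days_to_seconds(-1)

# Run with: python -m unittest <module_name>
```

Keeping each behavior in its own small test method makes failures easy to localize during code review.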
Object-Oriented Programming
- Understand when to use object-oriented programming
- Build and use classes
- Understand magic methods
- Write programs that include multiple classes, and follow good code structure
- Learn how large, modular Python packages, such as pandas and scikit-learn, use object-oriented programming
- Portfolio Exercise: Build your own Python package
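A minimal sketch of the OOP ideas above: a class with attributes, methods, and "magic" (dunder) methods. The Gaussian example is a hypothetical illustration, not the course's actual exercise:

```python
import math

class Gaussian:
    """A Gaussian distribution described by its mean and standard deviation."""

    def __init__(self, mean=0.0, stdev=1.0):
        self.mean = mean
        self.stdev = stdev

    def pdf(self, x):
        """Probability density of the distribution at x."""
        coeff = 1.0 / (self.stdev * math.sqrt(2 * math.pi))
        return coeff * math.exp(-0.5 * ((x - self.mean) / self.stdev) ** 2)

    def __add__(self, other):
        """The sum of two independent Gaussians is itself Gaussian."""
        return Gaussian(self.mean + other.mean,
                        math.sqrt(self.stdev ** 2 + other.stdev ** 2))

    def __repr__(self):
        return f"Gaussian(mean={self.mean}, stdev={self.stdev})"
```

Defining `__add__` and `__repr__` lets instances work naturally with `+` and `print`, the same pattern large packages use to make their objects feel built-in.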
Web Development
- Learn about the components of a web app
- Build a web application that uses Flask, Plotly, and the Bootstrap framework
- Portfolio Exercise: Build a data dashboard using a dataset of your choice and deploy it to a web application
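A minimal sketch of a Flask app like the one described above, assuming Flask is installed; the route and the Plotly payload are hypothetical placeholders for a real dashboard:

```python
import json
from flask import Flask

app = Flask(__name__)

@app.route("/")
def index():
    # In the real dashboard this would render a template embedding
    # Plotly figures; here we return a tiny JSON figure payload.
    figure = {"data": [{"type": "bar", "x": ["a", "b"], "y": [1, 2]}]}
    return json.dumps(figure)

# Run locally with: flask --app <module_name> run
```

Plotly figures serialize to JSON, which is why a Flask route returning a JSON payload is enough for the front end to render a chart.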
Course 3: Data Engineering for Data Scientists
ETL Pipelines
- Understand what ETL pipelines are
- Access and combine data from CSV, JSON, logs, APIs, and databases
- Standardize encodings and columns
- Normalize data and create dummy variables
- Handle outliers, missing values, and duplicated data
- Engineer new features by running calculations
- Build a SQLite database to store cleaned data
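The steps above can be sketched as a small extract-transform-load script, assuming pandas is available; the column names and values are hypothetical:

```python
import sqlite3
import pandas as pd

# Extract: in practice, read from CSV/JSON/APIs; here, an in-memory frame.
df = pd.DataFrame({
    "country": ["US", "US", None, "DE"],
    "gdp": [21.4, 21.4, 3.8, 3.8],
})

# Transform: drop duplicates and missing values, engineer a new
# feature, and create dummy variables.
df = df.drop_duplicates().dropna()
df["gdp_billions"] = df["gdp"] * 1000
df = pd.get_dummies(df, columns=["country"])

# Load: store the cleaned data in a SQLite database.
conn = sqlite3.connect(":memory:")
df.to_sql("economy", conn, index=False, if_exists="replace")
rows = conn.execute("SELECT COUNT(*) FROM economy").fetchone()[0]
```

Writing the cleaned table to SQLite means downstream modeling code can query it without repeating the cleaning steps.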
Natural Language Processing
- Prepare text data for analysis with tokenization, lemmatization, and removing stop words
- Use scikit-learn to transform and vectorize text data
- Build features with bag of words and tf-idf
- Extract features with tools such as named entity recognition and part of speech tagging
- Build an NLP model to perform sentiment analysis
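The bag-of-words and tf-idf steps above can be sketched with scikit-learn; the two-message corpus is a hypothetical example:

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

corpus = [
    "Water needed urgently after the storm",
    "The storm destroyed the water supply",
]

# Bag of words: raw token counts, with English stop words removed.
bow = CountVectorizer(stop_words="english")
counts = bow.fit_transform(corpus)

# Tf-idf: reweight the counts so terms that appear in every
# document contribute less than distinctive terms.
tfidf = TfidfVectorizer(stop_words="english")
weights = tfidf.fit_transform(corpus)
```

Both vectorizers produce sparse document-term matrices, so the same feature-building code scales to large message corpora.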
Machine Learning Pipelines
- Understand the advantages of using machine learning pipelines to streamline the data preparation and modeling process
- Chain data transformations and an estimator with scikit-learn's Pipeline
- Use feature unions to perform steps in parallel and create more complex workflows
- Grid search over the pipeline to optimize parameters for the entire workflow
- Complete a case study to build a full machine learning pipeline that prepares data and creates a model for a dataset
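A compact sketch of the pipeline ideas above: chain a vectorizer and a classifier, then grid search over the whole workflow at once. The toy messages and parameter grid are hypothetical:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

X = ["need water", "need food", "all clear here",
     "everything is fine", "send water now", "we are safe"]
y = [1, 1, 0, 0, 1, 0]  # 1 = message expresses a need

pipeline = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("clf", LogisticRegression()),
])

# Parameters are addressed as "<step-name>__<parameter>", so one grid
# search tunes the transformer and the estimator together.
grid = GridSearchCV(
    pipeline,
    param_grid={"tfidf__ngram_range": [(1, 1), (1, 2)],
                "clf__C": [0.1, 1.0]},
    cv=2,
)
grid.fit(X, y)
```

Because the vectorizer is fit inside each cross-validation fold, the pipeline also prevents the data leakage that occurs when text features are built before splitting.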
Project: Build Disaster Response Pipelines with Figure Eight
Figure Eight (formerly Crowdflower) used crowdsourcing to tag and translate messages in order to apply artificial intelligence to disaster relief. In this project, you will create a data pipeline to prepare message data from major natural disasters worldwide, and a machine learning pipeline to categorize emergency text messages based on the need expressed by the sender.
Course 4: Experiment Design and Recommendations
Experiment Design
- Understand how to set up an experiment, and the ideas associated with experiments vs. observational studies
- Define control and test conditions
- Choose control and testing groups
Statistical Concerns of Experimentation
- Applications of statistics in the real world
- Establishing key metrics
- SMART experiments: Specific, Measurable, Actionable, Realistic, Timely
A/B Testing
- How it works and its limitations
- Sources of Bias: Novelty and Recency Effects
- Multiple Comparison Techniques (FDR, Bonferroni, Tukey)
- Portfolio Exercise: Use a technical screener from Starbucks to analyze the results of an experiment and write up your findings
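The Bonferroni correction listed above can be sketched in a few lines: when running m tests at family-wise error rate alpha, compare each p-value against alpha / m. The p-values below are hypothetical:

```python
def bonferroni(p_values, alpha=0.05):
    """Return which hypotheses remain significant after correction."""
    m = len(p_values)
    threshold = alpha / m
    return [p < threshold for p in p_values]

p_values = [0.004, 0.030, 0.011, 0.200]
significant = bonferroni(p_values)  # only p < 0.05/4 = 0.0125 survive
```

Bonferroni is the most conservative of the techniques listed; FDR methods trade some false-positive control for more statistical power across many metrics.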
Introduction to Recommendation Engines
- Distinguish between common techniques for creating recommendation engines including knowledge based, content based, and collaborative filtering based methods.
- Implement each of these techniques in Python.
- List business goals associated with recommendation engines, and be able to recognize which of these goals are most easily met with existing recommendation techniques.
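The collaborative-filtering idea above can be sketched with numpy: find the user whose rating pattern is most similar and recommend what they liked. The ratings matrix (rows = users, columns = items) is a hypothetical example:

```python
import numpy as np

ratings = np.array([
    [5.0, 4.0, 0.0, 1.0],
    [4.0, 5.0, 0.0, 2.0],
    [1.0, 0.0, 5.0, 4.0],
])

def cosine_similarity(u, v):
    """Cosine of the angle between two rating vectors."""
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

# Similarity of every user to user 0; the most similar other user's
# liked items become candidate recommendations for user 0.
sims = [cosine_similarity(ratings[0], ratings[i]) for i in range(len(ratings))]
most_similar = int(np.argmax(sims[1:])) + 1  # skip user 0 itself
```

Knowledge-based and content-based methods differ only in what the similarity is computed over: item attributes or user-stated constraints instead of rating vectors.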
Matrix Factorization for Recommendations
- Understand the pitfalls of traditional methods, and of measuring the influence of recommendation engines with standard regression and classification techniques.
- Create recommendation engines using matrix factorization and FunkSVD
- Interpret the results of matrix factorization to better understand latent features of customer data
- Identify common pitfalls of recommendation engines, such as the cold start problem and the difficulty of assessing their effectiveness with the usual evaluation techniques, along with potential solutions.
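A minimal FunkSVD-style sketch for the ideas above: factor a ratings matrix into user and item latent-feature matrices, training only on the observed (non-NaN) entries via gradient descent. The matrix sizes and hyperparameters are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(42)
ratings = np.array([
    [5.0, 4.0, np.nan],
    [4.0, np.nan, 1.0],
    [np.nan, 5.0, 2.0],
])
n_users, n_items = ratings.shape
k = 2  # number of latent features

U = rng.normal(scale=0.1, size=(n_users, k))  # user latent features
V = rng.normal(scale=0.1, size=(n_items, k))  # item latent features

lr = 0.01
for _ in range(2000):
    for i in range(n_users):
        for j in range(n_items):
            if np.isnan(ratings[i, j]):
                continue  # skip missing entries: the key FunkSVD idea
            err = ratings[i, j] - U[i] @ V[j]
            u_old = U[i].copy()
            U[i] += lr * err * V[j]
            V[j] += lr * err * u_old

predicted = U @ V.T  # every cell is now filled, including the NaNs
```

The learned rows of U and V are the latent features: inspecting which items load on the same feature is how matrix factorization helps interpret customer data.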
Project: Design a Recommendation Engine with IBM
Members of IBM's online data science community can share tutorials, notebooks, articles, and datasets. In this project, you will create a recommendation engine on IBM Watson Studio's data platform, based on user behavior and social networks, to surface the content most likely to be relevant to a user.
Course 5: Data Science Projects
Elective 1: Dog Breed Classification
- Use convolutional neural networks to classify different dogs according to their breeds
- Deploy your model so others can upload images of their dogs and receive the corresponding breed predictions.
- Complete one of the most popular projects in Udacity history, and show the world how you can use your deep learning skills to entertain an audience!
Elective 2: Starbucks
- Use purchasing habits to design discount offers that acquire and retain customers
- Identify groups of individuals that are most likely to be responsive to rebates.
Elective 3: Arvato Financial Services
- Work through a real-world dataset and challenge provided by Arvato Financial Services, a Bertelsmann company
- Top performers have a chance at an interview with Arvato or another Bertelsmann company!
Elective 4: Spark for Big Data
- Take a course on Apache Spark and complete a project using a massive, distributed dataset to predict customer churn
- Learn to deploy your Spark cluster on either AWS or IBM Cloud
Elective 5: Your Choice
- Use your skills to tackle any other project of your choice
Project: Data Science Capstone Project
In this capstone project, you will use what you've learned throughout the program to create a data science project of your choice. You will define the problem you want to solve, identify and explore the data, then conduct your analyses and draw conclusions. You will present your findings and analysis in a blog post and a GitHub repository.