Description
This course:
- takes you through the skill sets required for data science, demonstrates how to visualise data in Java, and investigates various methods of converting data into information
- introduces some fundamental concepts and examples of data science, then walks you through the process of representing data in Java and some of the difficulties that can arise
- explains data manipulation techniques such as mapping, filtering, collecting, and sorting
- describes how to find, collect, clean, manipulate, and store data in order to begin doing useful things with it
- shows you the fun part: various methods for converting data into information
- covers nearest-neighbor, Bayes, linear regression, decision trees, clustering, and other techniques
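As a rough illustration of the mapping, filtering, collecting, and sorting techniques mentioned above, here is a minimal sketch using the standard Java Streams API. The `Car` record, its fields, and the `efficientModels` helper are hypothetical names chosen for this example and are not taken from the course materials; the course itself may approach these steps differently. The sketch assumes Java 16 or later (for records).

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

class StreamSketch {
    // Hypothetical data class for illustration only.
    record Car(String model, double mpg) {}

    // Keeps cars at or above minMpg, orders them by ascending MPG,
    // and returns just their model names.
    static List<String> efficientModels(List<Car> cars, double minMpg) {
        return cars.stream()
                .filter(c -> c.mpg() >= minMpg)               // filtering
                .sorted(Comparator.comparingDouble(Car::mpg)) // sorting
                .map(Car::model)                              // mapping
                .collect(Collectors.toList());                // collecting
    }

    public static void main(String[] args) {
        List<Car> cars = List.of(
                new Car("A", 18.0),
                new Car("B", 31.5),
                new Car("C", 24.0));
        System.out.println(efficientModels(cars, 20.0)); // prints [C, B]
    }
}
```

Chaining the four operations in one pipeline keeps each step visible; reordering them (for example, mapping before filtering) would change what information is still available to later steps.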
Syllabus:
1. Data Science Basics
- What is data science anyway?
- Data science examples
- Data as a business asset
- CRISP-DM: The data science cycle
- Types of problems in data science
2. Representing Data in Java
- Data formatting in Java
- More data formatting
- Real-life data difficulties
3. Data Manipulation Techniques
- Mapping
- Filtering
- Collecting
- Sorting
4. Loading Data in Java
- Reducing file size
- Loading data from text files
- Creating a person data class
- Converting strings to data objects
- Loading tab-separated files
- Loading CSVs
- Converting CSVs to data objects
5. Data Visualization with JavaFX
- Setting up JavaFX
- Formatting data for a scatterplot
- Displaying a scatterplot
- Multiple datasets on a scatterplot
- Calculating average MPG
- Displaying a bar chart
6. Modeling and Machine Learning
- Building machine learning models
- Supervised vs. unsupervised learning
- Overfitting and how to avoid it
7. K-Nearest Neighbors (KNN)
- K-nearest neighbor basics
- Loading flower data
- Creating a DataItem interface
- Calculating the closest data points
- Implementing the DataItem interface
- Letting your data points vote
- Finishing your KNN classifier
8. Naive Bayes
- Naive Bayes basics
- Calculating the possible labels
- Splitting your dataset by label
- Calculating mean and standard deviation
- Calculating datapoint probabilities
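
To give a flavour of the last two Naive Bayes topics, the sketch below computes a mean, a standard deviation, and a probability density for a single numeric feature. It assumes a Gaussian (normal) likelihood and the sample standard deviation (dividing by n - 1); both are assumptions made for this example, and the course may use a different formulation. The class and method names are hypothetical.

```java
class GaussianSketch {
    // Arithmetic mean of the values.
    static double mean(double[] xs) {
        double sum = 0.0;
        for (double x : xs) {
            sum += x;
        }
        return sum / xs.length;
    }

    // Sample standard deviation (divides by n - 1); assumes at least two values.
    static double stdDev(double[] xs) {
        double m = mean(xs);
        double sumSquares = 0.0;
        for (double x : xs) {
            sumSquares += (x - m) * (x - m);
        }
        return Math.sqrt(sumSquares / (xs.length - 1));
    }

    // Density of a normal distribution with the given mean and standard deviation at x.
    static double gaussianDensity(double x, double mean, double stdDev) {
        double z = (x - mean) / stdDev;
        return Math.exp(-0.5 * z * z) / (stdDev * Math.sqrt(2.0 * Math.PI));
    }

    public static void main(String[] args) {
        double[] feature = {1.0, 2.0, 3.0};
        double m = mean(feature);
        double s = stdDev(feature);
        System.out.println("mean = " + m + ", stdDev = " + s);
        System.out.println("density at 2.0 = " + gaussianDensity(2.0, m, s));
    }
}
```

In a Naive Bayes classifier, a density like this would typically be computed per feature and per label, then multiplied together (or summed as logarithms) to score each candidate label.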