Description
In this course, you will learn:
- How to use leading machine-learning techniques—cluster analysis, anomaly detection, and association rules—to get accurate, meaningful results from big data
Syllabus:
Introduction
- Welcome
- What you should know
- Using the exercise files
- What is unsupervised machine learning?
1. What Is Cluster Analysis?
- Looking at the data with a 2D scatter plot
- Understanding hierarchical cluster analysis
- Running hierarchical cluster analysis
- Interpreting a dendrogram
- Methods for measuring distance
- What are k-nearest neighbors?
2. K-Means
- How does k-means work?
- Which variables should be used with k-means?
- Interpreting a box plot
- Running a k-means cluster analysis
- Interpreting cluster analysis output
- What does silhouette mean?
- Which cases should be used with k-means?
- Finding optimum value for k: k = 3
- Finding optimum value for k: k = 4
- Finding optimum value for k: k = 5
- What is the best solution?
3. Visualizing and Reporting Cluster Solutions
- Summarizing cluster means in a table
- Traffic Light feature in Excel
- Line graphs
4. Cluster Methods for Categorical Variables
- Relating clusters to categories statistically
- Relating clusters to categories visually
- Running a multiple correspondence analysis
- Interpreting a perceptual map
- Using cluster analysis and decision trees together
- A BIRCH/two-step example
- A self-organizing map example
5. Anomaly Detection
- The k = 1 trick
- Anomaly detection algorithms
- Using SOM for anomaly detection
6. Association Rules and Sequence Detection
- Intro to association rules and sequence analysis
- Running association rules
- Some association rules terminology
- Interpreting association rules
- Putting association rules to use
- Comparing clustering and association rules
- Sequence detection