Description
Data mining is the branch of data science that focuses on discovering actionable patterns in large and diverse datasets: clusters of similar customers, long-term trends that can only be identified after disentangling seasonal and random effects, and new methods for predicting important outcomes.
This course:
- focuses on data mining in R, presents a wide range of algorithms, including machine learning methods, and provides critical information on data mining laws and policies
- provides a high-level overview of dimensionality reduction
- introduces clustering, including hierarchical clustering, before moving on to association analysis
- describes time-series mining and decomposition before concluding with text mining, sentiment analysis, and sentiment scoring
Syllabus:
1. Preliminaries
- Tools for data mining
- The CRISP-DM data mining model
- Privacy, copyright, and bias
- Validating results
2. Dimensionality Reduction
- Dataset: Handwritten digits
- PCA
- LDA
- t-SNE
3. Clustering
- Dataset: Penguins
- Hierarchical clustering
- K-means
- DBSCAN
4. Classification
- Dataset: Spambase
- k-NN (k-nearest neighbors)
- Naive Bayes
- Decision trees
5. Association Analysis
- Dataset: Groceries
- Apriori
- Eclat
- CBA
6. Time-Series Mining
- Dataset: AirPassengers
- Time-series decomposition
- ARIMA
- MLP
7. Text Mining
- Dataset: The Iliad
- Sentiment analysis: Binary classification
- Sentiment analysis: Sentiment scoring
- Visualizing word pairs