Description
In this course, you will:
- Learn how to compute basic features such as the number of words and characters, the average word length, and the number of special tokens (such as Twitter hashtags and mentions).
- Learn how to compute readability scores and estimate the level of education needed to understand a piece of text.
- Discover the concepts of tokenization and lemmatization.
- Learn how to use the spaCy library to perform text cleaning, part-of-speech tagging, and named entity recognition.
- Learn about n-gram modelling and how to use it to analyse sentiment in movie reviews.
- Discover how to compute the tf-idf weights and the cosine similarity score between two vectors.
- Learn about word embeddings and compute similarities between various Pink Floyd songs using word vector representations.
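As a rough illustration of the basic features mentioned above, here is a minimal sketch using only the Python standard library. The function name `basic_features`, the whitespace-based word splitting, and the sample sentence are assumptions made for this example and are not taken from the course material.

```python
def basic_features(text):
    """Return simple surface features of a piece of text.

    Words are approximated by splitting on whitespace (an assumption;
    a real tokenizer would handle punctuation more carefully).
    """
    words = text.split()
    num_words = len(words)
    # Average word length over whitespace-separated tokens.
    avg_word_length = sum(len(w) for w in words) / num_words if words else 0.0
    # Hashtags and mentions: tokens starting with '#' or '@' followed by text.
    hashtags = [w for w in words if w.startswith("#") and len(w) > 1]
    mentions = [w for w in words if w.startswith("@") and len(w) > 1]
    return {
        "num_chars": len(text),
        "num_words": num_words,
        "avg_word_length": avg_word_length,
        "hashtags": hashtags,
        "mentions": mentions,
    }


# Hypothetical sample text, for illustration only.
features = basic_features("Loving the new #NLP course by @instructor")
print(features)
```

Counting characters with `len(text)` includes spaces; whether to include them is a design choice that depends on the feature you want.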
Syllabus:
- Basic features and readability scores
- Text preprocessing, POS tagging and NER
- N-Gram models
- TF-IDF and similarity scores
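
To give a flavour of the last topic, the sketch below computes tf-idf weights and a cosine similarity score in plain Python. It assumes one common tf-idf variant (raw term count multiplied by log(N / document frequency)); libraries often use smoothed or normalized variants, so the numbers will differ. The function names and the toy documents are hypothetical and chosen for this example only.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build tf-idf vectors using tf = raw count and idf = log(N / df)."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(tok for toks in tokenized for tok in set(toks))
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        vectors.append([counts[t] * math.log(n_docs / df[t]) for t in vocab])
    return vocab, vectors


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # Convention chosen here for all-zero vectors.
    return dot / (norm_a * norm_b)


# Toy documents, for illustration only.
docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "cats and dogs are pets",
]
vocab, vectors = tfidf_vectors(docs)
sim_01 = cosine_similarity(vectors[0], vectors[1])  # documents share some terms
sim_02 = cosine_similarity(vectors[0], vectors[2])  # documents share no terms
print(sim_01, sim_02)
```

With this idf definition, a term that appears in every document gets weight zero, which is why documents that overlap only on such terms would score as dissimilar.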