Description
In this course, you will learn:
- Why it is critical to understand how to work with unstructured text data.
- Key techniques for extracting, cleansing, and processing text in Python, which help you develop your text mining skill set.
- Fundamental text processing concepts, such as tokenization and stemming, explained by Kumaran.
- Two techniques for converting text into an analytics-ready form: n-grams and TF-IDF.
- How to apply these techniques using Python and the NLTK library.
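As a rough illustration of how raw text becomes analytics-ready tokens, the sketch below tokenizes a sentence, removes stop words, and builds bigrams (n-grams with n = 2) using only the Python standard library. It is not the course's NLTK code. The regular-expression tokenizer, the tiny `STOP_WORDS` set, and the helper names `tokenize`, `remove_stop_words`, and `build_ngrams` are illustrative assumptions. NLTK provides fuller equivalents, such as `nltk.word_tokenize`, the `stopwords` corpus, `nltk.util.ngrams`, and `PorterStemmer` for stemming.

```python
import re

# A deliberately tiny, hypothetical stop-word list for illustration;
# real lists (such as NLTK's stopwords corpus) are much larger.
STOP_WORDS = {"a", "an", "and", "the", "is", "of", "to", "in", "into"}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def remove_stop_words(tokens):
    """Drop tokens that appear in the stop-word list."""
    return [t for t in tokens if t not in STOP_WORDS]


def build_ngrams(tokens, n):
    """Return consecutive n-token tuples (bigrams when n == 2)."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


text = "Text mining turns the raw text of a corpus into analytics-ready data."
tokens = remove_stop_words(tokenize(text))
print(tokens)
# → ['text', 'mining', 'turns', 'raw', 'text', 'corpus', 'analytics', 'ready', 'data']
print(build_ngrams(tokens, 2))
```

Note that this regular expression splits hyphenated words such as "analytics-ready" into two tokens; whether that is desirable depends on the corpus.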
Syllabus:
1. Text Mining
- Text mining today
- Document concepts
- Corpus concepts
- Introduction to the NLTK library
- Setting up the environment
2. Reading Text
- Reading raw files
- Reading files with a corpus reader
- Exploring the corpus
- Analyzing the corpus
3. Text Cleansing and Extraction
- Tokenization
- Cleansing text
- Stop word removal
- Stemming
- Lemmatization
4. Advanced Text Processing
- Building n-grams
- Tagging parts of speech
- Term frequency-inverse document frequency (TF-IDF)
- Building a TF-IDF matrix
5. Best Practices
- Storing text
- Processing text data
- Scalable processing of text data
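The TF-IDF matrix covered in the syllabus can be sketched in plain Python as follows. This is an illustrative assumption, not the course's implementation: it uses term count divided by document length for TF and the unsmoothed log(N / df) for IDF, and the helper name `tfidf_matrix` is hypothetical. Libraries differ in their exact weighting (scikit-learn's `TfidfVectorizer`, for example, smooths IDF and L2-normalizes rows by default), so the numbers will not match them exactly.

```python
import math
from collections import Counter


def tfidf_matrix(docs):
    """Build a TF-IDF matrix for a list of tokenized documents.

    Rows are documents; columns follow the sorted vocabulary.
    TF = term count / document length; IDF = log(N / df), no smoothing.
    """
    vocab = sorted({term for doc in docs for term in doc})
    n_docs = len(docs)
    # Document frequency: the number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    idf = {term: math.log(n_docs / df[term]) for term in vocab}

    matrix = []
    for doc in docs:
        counts = Counter(doc)
        length = len(doc) or 1  # avoid division by zero for empty documents
        matrix.append([counts[term] / length * idf[term] for term in vocab])
    return vocab, matrix


docs = [
    ["text", "mining", "text"],
    ["text", "corpus"],
    ["corpus", "reader"],
]
vocab, matrix = tfidf_matrix(docs)
print(vocab)  # → ['corpus', 'mining', 'reader', 'text']
for row in matrix:
    print([round(value, 3) for value in row])
```

With this unsmoothed IDF, a term that appears in every document gets a weight of zero, which is one reason smoothed variants are common in practice.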