In this course, you will:
- Be introduced to the problem you will tackle throughout this course: how do you accurately classify line items in a school budget based on their intended use? You will investigate the dataset's raw text and numeric values both quantitatively and visually. You will also learn how to assess your success when predicting class labels for each row of the dataset.
- Create a first-pass model trained only on the numeric data. Spoiler alert: discarding all of the text data hurts performance! Along the way, however, you will learn how to format your predictions.
- Get an introduction to natural language processing (NLP) so you can begin working with the large amounts of text in the data.
- Learn how to create pipelines that process various types of data.
- See how the flexibility of the pipeline workflow makes it efficient to test different approaches, even on complex problems like this one!
- Learn the tricks used by the competition winner and use scikit-learn to implement them yourself.

The course is organized into four chapters:

- Exploring the raw data
- Creating a simple first model
- Improving your model
- Learning from the experts
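
To make the pipeline ideas above more concrete, here is a minimal sketch of a scikit-learn pipeline that processes numeric and text columns side by side, then formats class probabilities as one column per label. Everything in it is an assumption for illustration: the toy rows, the column names `FTE`, `Total`, and `Text`, and the three hypothetical labels are invented, and the real dataset has many more columns and several target variables. `ColumnTransformer` is just one way to route columns to different preprocessing steps, and log loss is assumed here as the evaluation metric.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical toy budget rows: two numeric columns (one with missing values) and one text column.
train = pd.DataFrame({
    "FTE": [1.0, 0.5, np.nan, 1.0, 0.2, np.nan],
    "Total": [50000.0, 1200.0, 300.0, 62000.0, 800.0, 150.0],
    "Text": [
        "teacher salary elementary",
        "classroom supplies",
        "bus fuel",
        "teacher salary high school",
        "art supplies",
        "bus repair",
    ],
})
# Hypothetical single target with three classes.
labels = pd.Series(["Salary", "Supplies", "Transportation",
                    "Salary", "Supplies", "Transportation"])

# Numeric columns: fill missing values with the column mean, then standardize.
numeric = Pipeline([
    ("impute", SimpleImputer(strategy="mean")),
    ("scale", StandardScaler()),
])

# Route each column group to its own preprocessing; a scalar column name
# passes a 1-D series of strings, which is what CountVectorizer expects.
preprocess = ColumnTransformer([
    ("numeric", numeric, ["FTE", "Total"]),
    ("text", CountVectorizer(), "Text"),
])

pl = Pipeline([
    ("preprocess", preprocess),
    ("clf", OneVsRestClassifier(LogisticRegression())),
])
pl.fit(train, labels)

# Hypothetical unseen rows with known labels, used only to illustrate scoring.
holdout = pd.DataFrame({
    "FTE": [np.nan, 1.0],
    "Total": [400.0, 55000.0],
    "Text": ["school bus fuel", "teacher salary"],
})
holdout_labels = ["Transportation", "Salary"]

# Format predictions as one probability column per class, indexed like the input rows.
predictions = pd.DataFrame(
    pl.predict_proba(holdout), columns=pl.classes_, index=holdout.index
)
print(predictions.round(3))

# Log loss penalizes confident wrong probabilities; lower is better.
loss = log_loss(holdout_labels, predictions.values, labels=pl.classes_)
print(f"log loss: {loss:.3f}")
```

Because preprocessing and the classifier live in one `Pipeline` object, swapping `CountVectorizer` for another text vectorizer, or `LogisticRegression` for another classifier, changes a single line while the fitting, prediction, and formatting code stays the same.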