
Certification Course

Data Engineering for Data Scientists

Learn how to manage large amounts of data! By the end of this course, you will be able to gather data from a variety of sources, store it in a database, and develop data pipelines (ETL, NLP, machine learning) that power real-world online apps.

In this course, you will learn:

How to use ETL pipelines to process and combine data from CSV files, JSON, logs, APIs, and databases.
How to tokenize, lemmatize, and remove stop words from text data before analysing it, then transform and vectorize it with bag of words and tf-idf to generate features with scikit-learn.
How machine learning pipelines speed up data preparation and modelling, and how to build a complete pipeline that prepares a dataset and trains a model, using feature unions to run steps in parallel and create more complex workflows.
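
To give a feel for the ETL topic listed above, here is a minimal sketch using only the Python standard library. It is not taken from the course material: the sample data, the `results` table, and the function names `extract`, `transform`, `load`, and `run_etl` are all hypothetical, chosen only to show the extract, transform, load steps on a CSV source and a JSON source combined into a database.

```python
import csv
import io
import json
import sqlite3

# Hypothetical sample inputs standing in for real files or API responses.
CSV_TEXT = "id,name\n1,Ada\n2,Grace\n"
JSON_TEXT = '[{"id": 1, "score": 90}, {"id": 2, "score": 85}]'


def extract():
    """Extract: read rows from a CSV source and a JSON source."""
    people = list(csv.DictReader(io.StringIO(CSV_TEXT)))
    scores = json.loads(JSON_TEXT)
    return people, scores


def transform(people, scores):
    """Transform: cast ids to int and join the two sources on id."""
    score_by_id = {row["id"]: row["score"] for row in scores}
    return [
        (int(p["id"]), p["name"], score_by_id.get(int(p["id"])))
        for p in people
    ]


def load(rows, conn):
    """Load: write the combined rows into a SQLite table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS results "
        "(id INTEGER PRIMARY KEY, name TEXT, score INTEGER)"
    )
    conn.executemany("INSERT OR REPLACE INTO results VALUES (?, ?, ?)", rows)
    conn.commit()


def run_etl(conn):
    """Run the three steps in order and return the loaded rows."""
    people, scores = extract()
    rows = transform(people, scores)
    load(rows, conn)
    return rows


# An in-memory database keeps the sketch self-contained.
conn = sqlite3.connect(":memory:")
rows = run_etl(conn)
```

Keeping each step in its own function means a new source (for example a log file or an API call) can be added in `extract` without touching the loading code.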

Syllabus: