Description
Learn cutting-edge computer vision and deep learning techniques, from basic image processing to building and customizing convolutional neural networks. Apply these ideas to vision tasks like automatic image captioning and object tracking, and you'll have a solid portfolio of computer vision projects to show for it.
Syllabus:
Course 1: Introduction to Computer Vision
Introduction to Computer Vision
- Learn where computer vision techniques are used in industry.
- Prepare for the course ahead with a detailed topic overview.
- Start programming your own applications!
Image Representation and Analysis
- See how images are represented numerically.
- Implement image processing techniques like color and geometric transforms.
- Program your own convolutional kernel for object edge-detection.
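As a rough illustration of the edge-detection kernel mentioned above, here is a minimal sketch in plain Python. The helper name `convolve2d`, the tiny test image, and the choice of a Sobel kernel are assumptions made for this example, not the course's own code. Like most vision and deep learning libraries, it slides the kernel without flipping it (strictly speaking, cross-correlation).

```python
# Hypothetical helper: slide a 3x3 kernel over a grayscale image
# stored as a list of rows. No padding, so the output is 2 smaller
# in each dimension. The kernel is not flipped (cross-correlation).
def convolve2d(image, kernel):
    height, width = len(image), len(image[0])
    output = []
    for row in range(1, height - 1):
        out_row = []
        for col in range(1, width - 1):
            total = 0
            for kr in range(3):
                for kc in range(3):
                    total += kernel[kr][kc] * image[row + kr - 1][col + kc - 1]
            out_row.append(total)
        output.append(out_row)
    return output

# Sobel kernel that responds to left-to-right intensity changes
# (vertical edges).
SOBEL_X = [[-1, 0, 1],
           [-2, 0, 2],
           [-1, 0, 1]]

# Assumed toy image: dark on the left, bright on the right.
image = [
    [0, 0, 255, 255],
    [0, 0, 255, 255],
    [0, 0, 255, 255],
    [0, 0, 255, 255],
]

edges = convolve2d(image, SOBEL_X)
print(edges)  # large values mark the dark-to-bright boundary
```

A uniform image produces all zeros with this kernel, because the positive and negative weights cancel when neighboring pixels are equal.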
Convolutional NN Layers
- Learn about the layers of a deep convolutional neural network: convolutional, maxpooling, and fully-connected layers.
- Build a CNN-based image classifier in PyTorch.
- Learn about layer activation and feature visualization techniques.
Features and Object Recognition
- Learn why distinguishing features are important in pattern and object recognition tasks.
- Write code to extract information about an object’s color and shape.
- Use features to identify areas on a face and to recognize the shape of a car or pedestrian on a road.
Image Segmentation
- Implement k-means clustering to break an image up into parts.
- Find the contours and edges of multiple objects in an image.
- Learn about background subtraction for video.
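To give a feel for k-means segmentation, the sketch below clusters grayscale pixel intensities into two groups using plain Python. The function name `kmeans_1d`, the sample pixel values, and the fixed starting centers are all assumptions for illustration; a real image would usually be clustered on color values with a library implementation.

```python
# Hypothetical 1D k-means: alternate between assigning each value to
# its nearest center and moving each center to the mean of its members.
def kmeans_1d(values, centers, iterations=10):
    centers = list(centers)
    labels = []
    for _ in range(iterations):
        # Assignment step: index of the closest center for each value.
        labels = [
            min(range(len(centers)), key=lambda k: abs(v - centers[k]))
            for v in values
        ]
        # Update step: recompute each center, skipping empty clusters.
        for k in range(len(centers)):
            members = [v for v, label in zip(values, labels) if label == k]
            if members:
                centers[k] = sum(members) / len(members)
    return centers, labels

# Assumed toy data: three dark pixels and three bright pixels.
pixels = [10, 12, 14, 200, 205, 210]
centers, labels = kmeans_1d(pixels, centers=[0, 255])
print(centers, labels)
```

The labels act as a segmentation mask: pixels sharing a label belong to the same part of the image.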
Project: Facial Keypoint Detection
Image processing and deep learning techniques can be used to detect faces in images and identify facial keypoints such as the position of the eyes, nose, and mouth on a face.
This project will put your knowledge of image processing and feature extraction techniques to the test, allowing you to represent various facial features programmatically. You'll also apply your deep learning knowledge to train a convolutional neural network to recognize facial keypoints. Facial keypoints are points on any face around the eyes, nose, and mouth that are used in a variety of applications ranging from facial tracking to emotion recognition.
Course 2: Advanced Computer Vision and Deep Learning
Advanced CNN Architecture
- Learn about advances in CNN architectures.
- See how region-based CNNs, like Faster R-CNN, have allowed for fast, localized object recognition in images.
- Work with a YOLO/single-shot object detection system.
Recurrent Neural Networks
- Learn how recurrent neural networks learn from ordered sequences of data.
- Implement an RNN for sequential text generation.
- Explore how memory can be incorporated into a deep learning model.
- Understand where RNNs are used in deep learning applications.
Attention Mechanisms
- Learn how attention allows models to focus on a specific piece of input data.
- Understand where attention is useful in natural language and computer vision applications.
Image Captioning
- Learn how to combine CNNs and RNNs to build a complex captioning model.
- Implement an LSTM for caption generation.
- Train a model to predict captions and understand a visual scene.
Project: Automatic Image Captioning
Combine CNN and RNN expertise to create a deep learning model that generates captions based on an input image.
Image captioning requires a complex deep learning model composed of two components: a CNN that converts an input image into a set of features and an RNN that converts those features into rich, descriptive language. In this project, you will put these cutting-edge deep learning architectures into action.
Course 3: Object Tracking and Localization
Object Motion and Tracking
- Learn how to programmatically track a single point over time.
- Understand motion models that define object movement over time.
- Learn how to analyze videos as sequences of individual image frames.
Optical Flow and Feature Matching
- Implement a method for tracking a set of unique features over time.
- Learn how to match features from one image frame to another.
- Track a moving car using optical flow.
Robot Localization
- Use Bayesian statistics to locate a robot in space.
- Learn how sensor measurements can be used to safely navigate an environment.
- Understand Gaussian uncertainty.
- Implement a histogram filter for robot localization in Python.
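The histogram filter mentioned above can be sketched in a few lines of plain Python. This is a minimal 1D version on a circular world; the function names `sense` and `move`, the colored-cell world, and all probability values are assumptions chosen for illustration.

```python
# Measurement update (Bayes rule): scale each cell's belief by how well
# it explains the measurement, then normalize so the beliefs sum to 1.
def sense(belief, world, measurement, p_hit=0.6, p_miss=0.2):
    weighted = [
        b * (p_hit if cell == measurement else p_miss)
        for b, cell in zip(belief, world)
    ]
    total = sum(weighted)
    return [w / total for w in weighted]

# Motion update: shift the belief by `step` cells on a circular world,
# allowing for a one-cell overshoot or undershoot.
def move(belief, step, p_exact=0.8, p_overshoot=0.1, p_undershoot=0.1):
    n = len(belief)
    moved = []
    for i in range(n):
        prob = p_exact * belief[(i - step) % n]
        prob += p_overshoot * belief[(i - step - 1) % n]
        prob += p_undershoot * belief[(i - step + 1) % n]
        moved.append(prob)
    return moved

# Assumed toy world and a uniform prior over five cells.
world = ["green", "red", "red", "green", "green"]
belief = [0.2] * 5

belief = sense(belief, world, "red")  # belief concentrates on red cells
belief = move(belief, 1)              # belief shifts right and blurs
print(belief)
```

Sensing sharpens the belief and moving blurs it, so a localization loop typically alternates the two steps.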
Graph SLAM
- Identify landmarks and build up a map of an environment.
- Learn how to simultaneously localize an autonomous vehicle and create a map of landmarks.
- Implement move and sense functions for a robotic vehicle.
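One way to picture Graph SLAM is as a set of linear constraints between poses and landmarks that are solved together. The sketch below does this for a 1D world in plain Python. The names `add_constraint` and `solve`, the three poses, the single landmark, and all distances are assumptions made for this example.

```python
# Encode the relative constraint x_j - x_i = offset in the information
# matrix (omega) and information vector (xi).
def add_constraint(omega, xi, i, j, offset, strength=1.0):
    omega[i][i] += strength
    omega[j][j] += strength
    omega[i][j] -= strength
    omega[j][i] -= strength
    xi[i] -= strength * offset
    xi[j] += strength * offset

# Solve omega * mu = xi by Gaussian elimination with partial pivoting.
def solve(omega, xi):
    n = len(xi)
    a = [row[:] + [b] for row, b in zip(omega, xi)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= factor * a[col][c]
    mu = [0.0] * n
    for r in range(n - 1, -1, -1):
        known = sum(a[r][c] * mu[c] for c in range(r + 1, n))
        mu[r] = (a[r][n] - known) / a[r][r]
    return mu

# Unknowns: poses x0, x1, x2 (indices 0-2) and one landmark (index 3).
n = 4
omega = [[0.0] * n for _ in range(n)]
xi = [0.0] * n

omega[0][0] += 1.0                     # anchor the first pose at 0
add_constraint(omega, xi, 0, 1, 5.0)   # motion: x1 is 5 past x0
add_constraint(omega, xi, 1, 2, 3.0)   # motion: x2 is 3 past x1
add_constraint(omega, xi, 0, 3, 9.0)   # sense: landmark is 9 from x0
add_constraint(omega, xi, 2, 3, 1.0)   # sense: landmark is 1 from x2

mu = solve(omega, xi)
print(mu)  # estimated x0, x1, x2, landmark
```

With noisy, inconsistent measurements the same solve returns a least-squares compromise, and the `strength` argument controls how much each constraint is trusted.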
Project: Landmark Detection and Tracking
Use SLAM (simultaneous localization and mapping) with feature detection and keypoint descriptors to create a map of the environment.
Use probability, motion models, and linear algebra to develop a robust method for tracking an object over time. This project will put to the test your knowledge of localization techniques, which are commonly used in autonomous vehicle navigation.