CS231n: Convolutional Neural Networks for Visual Recognition

A distilled compilation of my notes for Stanford's CS231n: Convolutional Neural Networks for Visual Recognition.
Stanford's CS231n is one of the best ways to dive into AI/deep learning, and into computer vision in particular. Even if you plan to specialize in another subfield of AI (say, natural language processing or reinforcement learning), I still recommend starting with CS231n, because it helps build intuition, fundamental understanding, and hands-on skills.
  • Introduction to CNNs for Visual Recognition: computer vision overview; historical context; course logistics
  • Image Classification: the data-driven approach; k-nearest neighbor; linear classification I
  • Loss Functions and Optimization: linear classification II; higher-level representations, image features; optimization, stochastic gradient descent
  • Neural Networks and Backpropagation: backpropagation; multi-layer perceptrons; the neural viewpoint
  • Convolutional Neural Networks: history; convolution and pooling; convnets outside vision
  • Deep Learning Hardware and Software: CPUs, GPUs, TPUs; PyTorch, TensorFlow; dynamic vs. static computation graphs
  • Training Neural Networks I: activation functions; data processing; batch normalization; transfer learning
  • Training Neural Networks II: update rules; hyperparameter tuning; learning rate scheduling; data augmentation
  • CNN Architectures: AlexNet, VGG, GoogLeNet, ResNet, etc.
  • Recurrent Neural Networks: RNN, LSTM; language modeling; image captioning; vision + language; attention
  • Generative Models: PixelRNN/PixelCNN; variational auto-encoders; generative adversarial networks
  • Detection and Segmentation: semantic segmentation; object detection; instance segmentation
  • Visualizing and Understanding: feature visualization and inversion; adversarial examples; DeepDream and style transfer
  • Deep Reinforcement Learning: policy gradients; hard attention; Q-Learning; Actor-Critic
  • Scene Graphs: visual relationships; graph neural networks
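As a taste of the material, here is a minimal sketch of the nearest-neighbor classifier from the Image Classification lecture's data-driven approach (the class name and the toy data are illustrative; L1 distance is one of the distance metrics the course discusses):

```python
import numpy as np

class NearestNeighbor:
    """Minimal 1-nearest-neighbor classifier: the simplest data-driven approach."""

    def train(self, X, y):
        # "Training" simply memorizes all labeled examples.
        self.X_train = X
        self.y_train = y

    def predict(self, X):
        # For each test point, return the label of the closest training point
        # under the L1 (Manhattan) distance.
        preds = np.empty(X.shape[0], dtype=self.y_train.dtype)
        for i, x in enumerate(X):
            dists = np.sum(np.abs(self.X_train - x), axis=1)
            preds[i] = self.y_train[np.argmin(dists)]
        return preds

# Toy usage with two 2-D points standing in for flattened images:
nn = NearestNeighbor()
nn.train(np.array([[0., 0.], [10., 10.]]), np.array([0, 1]))
print(nn.predict(np.array([[1., 1.], [9., 9.]])))  # [0 1]
```

Despite its simplicity, this captures the key shift the course motivates: the classifier's behavior comes from data, not hand-written rules.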
Course Info
Course description:
  • Computer Vision has become ubiquitous in our society, with applications in search, image understanding, apps, mapping, medicine, drones, and self-driving cars. Core to many of these applications are visual recognition tasks such as image classification, localization and detection. Recent developments in neural network (aka “deep learning”) approaches have greatly advanced the performance of these state-of-the-art visual recognition systems.
  • This course is a deep dive into the details of deep learning architectures, with a focus on learning end-to-end models for these tasks, particularly image classification. During the 10-week course, students learn to implement, train, and debug their own neural networks and gain a detailed understanding of cutting-edge research in computer vision. The final assignment involves training a multi-million-parameter convolutional neural network and applying it to the largest image classification dataset (ImageNet).
  • We focus on how to set up the problem of image recognition, the learning algorithms (e.g., backpropagation), and practical engineering tricks for training and fine-tuning networks, and we guide students through hands-on assignments and a final course project. Much of the background and material for this course is drawn from the ImageNet Challenge.
The in-line diagrams are taken from the CS231n lecture slides, unless specified otherwise.
If you found our work useful, please cite it as:
@misc{chadha2020distilled,
  author       = {Aman Chadha},
  title        = {Distilled Notes for Stanford CS231n: Convolutional Neural Networks for Visual Recognition},
  howpublished = {\url{https://www.aman.ai}},
  year         = {2020},
  note         = {Accessed: 2020-07-01}
}

A. Chadha, "Distilled Notes for Stanford CS231n: Convolutional Neural Networks for Visual Recognition," https://www.aman.ai, 2020. Accessed: July 1, 2020.