CS231n: Convolutional Neural Networks for Visual Recognition

A distilled compilation of my notes for Stanford's CS231n: Convolutional Neural Networks for Visual Recognition.
Stanford's CS231n is one of the best ways to dive into AI/Deep Learning, and into Computer Vision in particular. If you plan to excel in another subfield of AI (say, Natural Language Processing or Reinforcement Learning), I still recommend starting with CS231n, because it helps build intuition, fundamental understanding, and hands-on skills.

Notes

Introduction to CNNs for Visual Recognition

computer vision overview; historical context; course logistics

Image Classification

the data-driven approach; k-nearest neighbor; linear classification I

Loss Functions

linear classification II; higher-level representations, image features

Optimization

optimization, stochastic gradient descent

Neural Networks and Backpropagation

backpropagation; multi-layer perceptrons; the neural viewpoint

Convolutional Neural Networks

history; convolution and pooling; convnets outside vision

Deep Learning Hardware and Software

CPUs, GPUs, TPUs; PyTorch, TensorFlow; dynamic vs static computation graphs

Training Neural Networks I

activation functions; data processing; batch normalization; transfer learning

Training Neural Networks II

update rules; hyperparameter tuning; learning rate scheduling; data augmentation

CNN Architectures

AlexNet, VGG, GoogLeNet, ResNet, etc.

Recurrent Neural Networks

RNN, LSTM; language modeling; image captioning; vision + language; attention

Generative Models

PixelRNN/PixelCNN; variational auto-encoders; generative adversarial networks

Detection and Segmentation

semantic segmentation; object detection; instance segmentation

Visualizing and Understanding

feature visualization and inversion; adversarial examples; DeepDream and style transfer

Course Info

Course website

Lectures - Winter 2016

Lectures - Spring 2017

Full syllabus

Course description:

Computer Vision has become ubiquitous in our society, with applications in search, image understanding, apps, mapping, medicine, drones, and self-driving cars. Core to many of these applications are visual recognition tasks such as image classification, localization and detection. Recent developments in neural network (aka “deep learning”) approaches have greatly advanced the performance of these state-of-the-art visual recognition systems.
This course is a deep dive into the details of deep learning architectures, with a focus on learning end-to-end models for these tasks, particularly image classification. During the 10-week course, students will learn to implement, train and debug their own neural networks and gain a detailed understanding of cutting-edge research in computer vision. The final assignment involves training a multi-million parameter convolutional neural network and applying it to the largest image classification dataset (ImageNet).
We will focus on teaching how to set up the problem of image recognition, the learning algorithms (e.g. backpropagation), and practical engineering tricks for training and fine-tuning the networks, and we will guide students through hands-on assignments and a final course project. Much of the background and materials of this course will be drawn from the ImageNet Challenge.

Credits

The in-line diagrams are taken from the CS231n lecture slides, unless specified otherwise. Reproduced with permission.

Citation

If you found this work useful, please cite it as:

@misc{Chadha2020DistilledNotesCS231n,
  author        = {Chadha, Aman},
  title         = {Distilled Notes for Stanford CS231n: Convolutional Neural Networks for Visual Recognition},
  howpublished  = {\url{https://www.aman.ai}},
  year          = {2020},
  note          = {Accessed: 2020-07-01},
  url           = {https://www.aman.ai}
}

A. Chadha, Distilled Notes for Stanford CS231n: Convolutional Neural Networks for Visual Recognition, https://www.aman.ai, 2020. Accessed: July 1, 2020.