Read List
- A curated set of books, newsletters, blogs, GitHub repos, and other web resources I’d recommend to build intuition around AI concepts.
Course Notes
- Notes that accompany the Stanford class CS231n: Convolutional Neural Networks for Visual Recognition.
- Notes that accompany the Stanford class CS224n: Natural Language Processing with Deep Learning.
- Notes that accompany the Stanford class CS230: Deep Learning.
- Notes that accompany the Stanford class CS229: Machine Learning.
- Notes that accompany the Stanford class CS236: Deep Generative Models.
- Notes that accompany the Stanford class CS329s: Machine Learning Systems Design.
- Notes that accompany the Stanford class CS131 Computer Vision: Foundations and Applications. Also, GitHub repository with TeX source.
- Slides that accompany the CMU class 11-777 Multimodal Machine Learning.
- Slides that accompany the CMU class 11-877 Advanced Topics in Multimodal Machine Learning.
- Notes that accompany MIT’s 6.034 Artificial Intelligence.
- Natural Language Processing course taught at the Yandex School of Data Analysis (YSDA) by Lena Voita, a Research Scientist at Facebook AI Research.
- This repository contains the Deep Reinforcement Learning Course by Hugging Face by Thomas Simonini and Omar Sanseviero.
- You’ll learn about Deep Q-Learning, Policy Gradient, Actor-Critic Methods, Proximal Policy Optimization, and more!
- A course by Hugging Face on applying Transformers to various tasks in natural language processing and beyond.
- This repository contains the Diffusion Models course by Hugging Face.
- This course by Hugging Face explores how transformers can be applied to audio data.
- You’ll learn how to use transformers to tackle a range of audio-related tasks. Whether you are interested in speech recognition, audio classification, or generating speech from text (text-to-speech), this course has you covered.
- This course by Hugging Face explores NLP using libraries from the Hugging Face ecosystem — 🤗 Transformers, 🤗 Datasets, 🤗 Tokenizers, and 🤗 Accelerate — as well as the Hugging Face Hub.
Books
- Intended to help students and practitioners enter the field of machine learning in general and deep learning in particular.
- Gain an intuitive understanding of the concepts and tools for building intelligent systems using a range of techniques, starting with simple linear regression and progressing to deep neural networks.
- The first textbook on pattern recognition to present approximate inference algorithms that enable fast approximate answers in situations where exact answers are not feasible.
- One of the most widely used textbooks on reinforcement learning. Covers Markov decision processes and reinforcement learning.
- Book page
- The first of its kind to thoroughly cover language technology – at all levels and with all modern technologies – this book takes an empirical approach to the subject, based on applying statistical and other machine-learning algorithms to large corpora. The newest 3rd edition covers transformers, BERT, fine-tuning, and more.
- For data analysts who lean towards a machine learning engineering role, and machine learning engineers alike who want to bring more structure to their work.
- All you need to know about machine learning in a hundred pages.
- A book on machine learning interviews by Chip Huyen, Stanford Lecturer and Snorkel AI. Blog post.
- This booklet covers four main steps of designing a machine learning system:
- Project setup
- Data pipeline
- Modeling: selecting, training, and debugging
- Serving: testing, deploying, and maintaining
- It comes with links to practical resources that explain each aspect in more detail. It also suggests case studies written by machine learning engineers at major tech companies who have deployed machine learning systems to solve real-world problems.
- At the end, the booklet contains 27 open-ended machine learning systems design questions that might come up in machine learning interviews. The answers for these questions are in the book Machine Learning Interviews.
- A book on deep learning interviews by Shlomo Kashani and Amir Ivry. The book provides hundreds of questions and their solutions covering some core topics of deep learning:
- Logistic regression
- Probabilistic programming and Bayesian inference
- Deep learning math (calculus, algorithmic differentiation)
- Neural network ensembles
- Convolutional Neural Networks (CNNs)
- Job interview questions
- A good resource for students preparing for a job interview in the domain of deep learning or folks who are looking for a concise summary of DL topics.
- The PDF version of the book is available for free online. A hard copy version of the book is available for purchase online.
- Great material for preparing for an NLP job interview by Steve Nouri.
- Interactive deep learning book from folks at Amazon, CMU, ETH Zürich, and Google with code, math, and discussions.
- Learn the mathematical concepts behind ML. This book by Deisenroth, Faisal, and Ong is not intended to cover advanced machine learning techniques but instead aims to provide the necessary mathematical skills to read those other books.
- Another mathematical foundations building book by Mohri, Rostamizadeh, and Talwalkar from NYU.
- A detailed introduction to machine learning (including deep learning) from Kevin Murphy from UBC and Google, through the unifying lens of probabilistic modeling and Bayesian decision theory. This book covers mathematical background (including linear algebra and optimization), basic supervised learning (including linear and logistic regression and deep neural networks), as well as more advanced topics (including transfer learning and unsupervised learning).
- Great walk-through of core ideas relevant to Computational Engineering and Scientific Computing using Python by Hans Fangohr, University of Southampton.
- This report by Jonathan Shewchuk contains lecture notes for UC Berkeley’s introductory class on Machine Learning.
- This text by Michael Evans and Jeffrey Rosenthal from UofT is an introductory text on probability and statistics, targeting students who have studied one year of calculus at the university level and are seeking an introduction to probability and statistics with mathematical content.
- A textbook in preparation for an introductory undergraduate course on theoretical computer science by Boaz Barak, Harvard.
- This text by Christoph Rasche, Polytechnic University of Bucharest, is a dense introduction to the field of computer vision. It covers all three approaches, the classical engineering approach based on contours and regions; the local-features approach; and the Deep Learning approach.
- These notes composed by Hadar Sharvit cover a wide range of Deep Learning topics in a really well-written and accessible way. The linked Github repository also contains course exercises.
- Do you want to learn statistics in a fun and interesting way? The Cartoon Guide to Statistics covers all the central ideas of modern statistics: the summary and display of data, probability in gambling and medicine, random variables, Bernoulli trials, the Central Limit Theorem, hypothesis testing, confidence interval estimation, and much more, all explained in simple, clear, and yes, funny illustrations.
- Bayes’ theorem, named after the 18th-century British mathematician Thomas Bayes, is a mathematical formula for computing the conditional probability of an event. In essence, Bayes’ theorem calculates the probability of an event based on prior knowledge of the conditions that may affect it. Bayes’ theorem forms the basis of a statistical inference approach called Bayesian inference. This intro by Dan Morris is a great way to get a grasp of this incredible piece of math!
- Covers the principles of deep learning, motivation, explanations, state-of-the-art papers for the various tasks and architectures: data pre-processing, weight initialization, activation functions, loss functions, optimization, regularization, convolutional neural networks, object detection, semantic segmentation, generative models, denoising, super resolution, style transfer and style manipulation, inpainting, self supervised learning, vision transformers, OCR, multi-modal.
- Great survey paper on synthetic data generation by Sergey Nikolenko from Synthesis.ai and Steklov Institute of Mathematics at St. Petersburg, Russia.
- Synthetic data is an increasingly popular tool for training deep learning models, especially in computer vision but also in other areas. This work attempts to provide a comprehensive survey of the various directions in the development and application of synthetic data. The discussion centers around synthetic datasets for basic computer vision problems, both low-level (e.g., optical flow estimation) and high-level (e.g., semantic segmentation), synthetic environments and datasets for outdoor and urban scenes (autonomous driving), indoor scenes (indoor navigation), aerial navigation, simulation environments for robotics, and applications of synthetic data outside computer vision (in neural programming, bioinformatics, NLP, etc.). Further, it discusses the synthetic-to-real domain adaptation problem that inevitably arises in applications of synthetic data, including synthetic-to-real refinement with GAN-based models and domain adaptation at the feature/model level without explicit data transformations. Finally, it discusses privacy-related applications of synthetic data, including generating synthetic datasets with differential privacy guarantees.
- This book by Christoph Molnar will enable you to select and correctly apply the interpretation method that is most suitable for your machine learning project.
- This project contains an overview of recent trends in deep learning based natural language processing (NLP). It covers the theoretical descriptions and implementation details behind deep learning models, such as recurrent neural networks (RNNs), convolutional neural networks (CNNs), and reinforcement learning, used to solve various NLP tasks and applications. The overview also contains a summary of state of the art results for NLP tasks such as machine translation, question answering, and dialogue systems.
- Daniel and Michael from Unsloth identified and fixed several bugs in Google Gemma, including issues with token addition, mathematical inaccuracies, and precision problems in machine learning models.
- Answer.AI details an open source system, based on FSDP and QLoRA, that can train a 70B model on two consumer 24GB gaming GPUs.
- OPT-175B logbook covering a series of notes written to summarize the process and communicate some of the challenges Meta faced along the way while training the model.
- Related: Chronicles of OPT development
- The blog post from Yi Tay from Reka AI discusses his experience training large language and multimodal models outside of a major tech company, highlighting the unpredictable quality of computing hardware, challenges in infrastructure, and adaptations in code and model scaling strategies.
- It emphasizes the “hardware lottery” in acquiring compute resources, the need for custom solutions for efficient training, and the pragmatic approach to model development in a startup environment.
- This blog post implements picoGPT and shows off some of the benefits of JAX: (i) NumPy code is trivial to port using `jax.numpy`, (ii) gradients come for free, and (iii) batching is handled by `jax.vmap`. It also runs inference on GPT-2 checkpoints.
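The three JAX features named in the picoGPT entry can be sketched in a few lines. This is a minimal illustration, not code from the blog post: the toy `loss` function, the weight vector `w`, and the sample data are all assumptions made up for the example. Only `jax.numpy`, `jax.grad`, and `jax.vmap` are real JAX APIs.

```python
import jax
import jax.numpy as jnp


def gelu(x):
    # Same formula as a NumPy version would use, with np swapped for jnp.
    return 0.5 * x * (1 + jnp.tanh(jnp.sqrt(2 / jnp.pi) * (x + 0.044715 * x**3)))


def loss(w, x, y):
    # Hypothetical toy model: squared error of a single linear readout.
    pred = jnp.dot(gelu(x), w)
    return (pred - y) ** 2


# (ii) Gradient of the loss with respect to its first argument, w.
grad_loss = jax.grad(loss)

# (iii) Batch over the leading axis of x and y while sharing w.
batched_loss = jax.vmap(loss, in_axes=(None, 0, 0))

w = jnp.array([1.0, -2.0, 0.5])
xs = jnp.ones((4, 3))   # four made-up examples with three features each
ys = jnp.zeros(4)

g = grad_loss(w, xs[0], ys[0])      # one gradient entry per weight
losses = batched_loss(w, xs, ys)    # one loss value per example
print(g.shape, losses.shape)
```

Note that `loss` is written for a single example; `jax.vmap` adds the batch dimension without changing the function body.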
Newsletters
- The Batch is a weekly newsletter from deeplearning.ai which presents the most important AI events and perspectives in a curated, easy-to-read report for engineers and business leaders.
- Every Wednesday, The Batch highlights a mix of the most practical research papers, industry-shaping applications, and high-impact business news.
- The most important artificial intelligence and machine learning links of the week.
- The Gradient is a digital magazine that aims to be a place for discussion about research and trends in artificial intelligence and machine learning.
- Latest updates on NLP readings, research, and more!
- Stay informed on the latest trending ML papers with code, research developments, libraries, methods, and datasets.
- A weekly newsletter about artificial intelligence based on detailed analysis of cutting-edge research by Jack Clark, a co-founder of Anthropic, an AI safety and research company.
- The AiEdge Newsletter by Damien Benveniste, ex-Team Lead at Meta, seeks to explain complex ML topics in simple terms by breaking them down into nuggets and offering practical wisdom from the field. The highlight of this newsletter is that it addresses subjects that tend to be less mainstream and not found in typical textbooks.
Interactive visualizations
- ConvNetJS is a Javascript library for training neural networks entirely in your browser. Open a tab and you’re training. No software requirements, no compilers, no installations, no GPUs, no sweat.
- Notes that supplement the Coursera Deep Learning Specialization. With interactive visualizations, these tutorials will help you build intuition about foundational deep learning concepts.
- OpenAI Microscope is a collection of visualizations of every significant layer and neuron of eight vision “model organisms” which are often studied in interpretability. Microscope makes it easier to analyze the features that form inside these neural networks.
- A visual introduction to probability and statistics. Also includes a textbook called “Seeing Theory”.
- Pandas Tutor lets you write Python pandas code in your browser and see how it transforms your data step-by-step.
- A visual introduction to machine learning by Stephanie Yee and Tony Chu that offers a way to learn machine learning through interactive visualization. The visualizations are stunning, and the explanation is intuitive.
- Machine Learning University (MLU) is an education initiative from Amazon designed to teach machine learning theory and practical application. As part of that goal, MLU-Explain exists to teach core machine learning concepts through visual essays in a fun, informative, and accessible manner.
- An interactive matrix multiplication calculator for educational purposes.
Blogs
Company Blogs
- The official AI Research blog of Meta AI.
- Recommended articles:
- The official AI Research blog of Amazon Science.
- Recommended articles:
- The official AI Research blog of Apple.
- Recommended articles:
- The official AI Research blog of Google.
- Recommended articles:
- The official AI Research blog of Google DeepMind.
- Recommended articles:
- The official AI Research blog of Microsoft.
- The official AI Research blog of NVIDIA.
- The official AI Research blog of Cohere.
- Recommended articles:
- The Facebook Engineering Blog offers a highly technical digest on what’s going on in the software engineering world. From developer tools and popular platforms to infrastructure and artificial intelligence, the Facebook Engineering blog covers a wide range of topics and gives insights into how Facebook solves large-scale technical challenges.
- Recommended article:
- The Instagram Engineering blog is mostly specific to Instagram services, but some posts cover more broadly applicable topics.
- Recommended article:
- Uber Engineering may be a little bit different from the other tech communities on this list, given that its service is felt more “in the real world” than strictly online. It can result in some interesting posts on the team’s blog, describing logistic issues from a tech perspective. They also have an interesting “Profiles in Coding” section, where you can read about the engineers, not just the software.
- Recommended articles:
- Netflix is undoubtedly one of the most popular streaming services. A lot of different technical processes are going on under the hood: great UI and UX experience, a huge and uninterrupted movie database, and – the most exciting – a machine-learning model that powers movie recommendations. From the Netflix Tech blog, you’ll learn about all these topics in-depth. It’s always helpful to know how a seamless client service experience was achieved, and this blog covers that.
- Recommended articles:
- Airbnb is a prominent startup success story, so their engineering blog is definitely worth checking out. It could be more technical, but soft skills are also important in the world of tech. They also have a Fintech section; the insights from the payments team are generally useful.
- Each social media platform has its own technical issues, and Twitter is not an exception. Still, as one of the most active social media platforms, it has to run as smoothly as possible. This blog shares how the Twitter engineering team tackles problems, but some posts are more broadly useful.
- Recommended article:
- Quora is a question-and-answer website – ask any question you’ve ever wondered about, and another user will respond.
- What was exciting to learn was that the Quora team builds Quora from the ground up, from backend infrastructure to ranking algorithms to frontend abstractions. The Engineering at Quora blog is fully devoted to the issues the team faces on both the front- and backend.
- Yelp is a crowd-sourced local business review and social networking resource. Its engineering blog is a great source of knowledge for those who want to learn more about Yelp functionality and some of Yelp’s troubleshooting methods. However, it also raises some rather broad questions about general machine learning, discusses hackathons, cloud, and API issues.
- LinkedIn is a unique professional social network, with a rather diverse blog. It contains a wide range of content, from the expected LinkedIn website-related problem-solving topics to more general concepts, all polished and deeply detailed.
- The Spotify blog offers an engaging and easy writing style along with a wide range of topics. Most of them are related to Spotify and all the magic behind one of the most popular music apps. But still, it covers a lot of general cloud, API, and machine-learning subjects, and videos and podcasts break up the more standard blog posts.
- The Salesforce Developer blog is devoted to backend and testing topics. Many large companies use Salesforce, so it’s vital to stay up to date with all the new products and tools it releases. If you’re not already using Salesforce, remember that learning new technologies broadens your mind personally and professionally.
- As you’re likely aware, GitHub is one of the most popular hosting sites for software development and version control using Git. Their blog has a special section for engineering posts, which cover mostly GitHub workflow topics and related issues. However, since GitHub is used at many tech companies, it can be considered a generally useful blog.
- Pinterest, the first visual discovery engine, is a creative website, and their blog lives up to the theme. You’ll find a lot of articles on architecture and infrastructure, design, and UX, as well as insights into what it’s like to work for Pinterest.
- Mixedbread.ai details their embeddings and reranking models, going over the thought process behind key design decisions.
- Recommended articles:
Personal blogs
- Senior Scientist at Amazon who writes on ideas in machine learning, RecSys, and LLMs.
- Some of Eugene’s famous blog posts:
- OpenAI machine learning researcher who likes to understand things clearly, and explain them well.
- Author of the wildly popular “Understanding LSTM Networks” post.
- Robotics researcher @ OpenAI documenting her learning notes.
- Blog posts that focus on visualizing machine learning one concept at a time.
- Some of Jay’s famous blog posts:
- Aman Arora’s blog on computer vision and NLP related articles.
- Mohit Mayank’s guide book on data science for busy and equally lazy Data Scientists!
- Ravin Kumar’s GenAI guide that offers a guided tour of the fundamentals, the minimal set of resources needed to get a working understanding, and deep-dives into libraries, topics and papers.
- NLP researcher writing on topics that deal with transforming natural language into structured data.
- First-year Computer Science Masters student at Stanford University writes on his experiences with AI/ML.
- UCLA CS ‘19 grad writes on AI/ML.
- A modern medium for presenting research that showcases AI/ML concepts in clear, dynamic and vivid form.
- Join Daniel Tunkelang on his journey from characters to words to phrases, and ultimately to meaning. A perfect companion to decipher the art of query understanding and the role it plays in the search process.
- Nathan Lambert’s blog covering important ideas in AI and how they integrate with society.
Github repositories
- A curated list of deep learning resources for computer vision.
- A curated list of resources dedicated to NLP.
- A list of awesome demos and articles about the OpenAI GPT-3 API.
- A list of references for machine learning operations (MLOps).
- A list of tools for machine learning operations (MLOps).
- A curated list of multimodal LLM resources.
- Curated papers, articles, and blogs on data science & machine learning in production by Eugene Yan.
- A list of 100 important NLP papers that students and researchers working in the field should read.
- PlotNeuralNet generates LaTeX code for drawing neural networks for publications and presentations.
- NN-SVG generates SVGs for neural net architecture schematics.
- Statistics of acceptance rate for the major AI conferences.
- Reading list for the CMU class 11-777 Multimodal Machine Learning.
- Curated set of links to good explanations of how LLMs work.
- System Design tidbits to help you prepare for system design interviews by explaining complex systems using visuals and simple terms.
Web Resources
AI
- The latest AI/ML papers, with code and leaderboards comparing implementations in several Computer Vision and NLP sub-tasks.
- Countdowns to top CV/NLP/ML/Robotics/AI conference deadlines.
- A much lighter-weight arxiv-sanity from-scratch re-write. Periodically polls arxiv API for new papers. Then allows users to tag papers of interest, and recommends new papers for each tag based on SVMs over tfidf features of paper abstracts. Allows one to search, rank, sort, slice and dice these results in a pretty web UI.
- Lastly, arxiv-sanity-lite can send you daily emails with recommendations of new papers based on your tags. Curate your tags, track recent papers in your area, and don’t miss out!
- alphaXiv is a forum for anyone to comment directly on top of arXiv papers.
- ar5iv serves arXiv articles as responsive web pages.
- Replace the “X” of arXiv in any article link with the “5” of ar5iv and be redirected to the corresponding ar5iv page.
- Easily find trending research papers and see which ones are most popular on Twitter.
- A collection of simple PyTorch implementations of popular papers documented with explanations, which the website renders as side-by-side formatted notes.
- Cutting-edge Computer Science and AI Papers.
- Offers two courses:
- Foundations: Learn the foundations of machine learning through intuitive explanations, clean code and visualizations.
- MLOps: Learn how to combine machine learning with software engineering to build production-grade applications.
- Paper Digest is an AI-Powered Research Platform to stay current with the latest tech trends in AI, offering top-level highlights of arXived papers.
- A repository of linked datasets that have been published in the Linked Data format.
- ANN-Benchmarks is a benchmarking repository for approximate nearest neighbor search algorithms (such as FAISS, ScaNN, ANNOY, and PGVector) that spans various commonplace datasets, distance measures, and algorithms.
- Connected Papers is a visual tool that helps researchers find relevant papers by creating graphs based on paper similarity, not just citations.
- It uses Co-citation and Bibliographic Coupling as metrics, arranges papers in a Force Directed Graph for visual clustering, and is linked to the Semantic Scholar Paper Corpus.
- Travis is an AI-powered tutor to learn Data Science and ML for free. Travis’ index spans 150K+ free learning resources, along with interactive roadmaps.
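The ar5iv link rewrite mentioned in the list above is simple enough to script. The sketch below assumes the rule works at the host level, so `arxiv.org` becomes `ar5iv.org`; the helper name `to_ar5iv` is hypothetical, and the paper ID is just an example link.

```python
from urllib.parse import urlsplit, urlunsplit


def to_ar5iv(url: str) -> str:
    """Return the ar5iv counterpart of an arXiv article link (hypothetical helper)."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host not in ("arxiv.org", "www.arxiv.org"):
        return url  # not an arXiv link, so leave it unchanged
    # Swap the "x" for a "5" in the host name and keep the rest of the URL.
    return urlunsplit(parts._replace(netloc="ar5iv.org"))


print(to_ar5iv("https://arxiv.org/abs/1706.03762"))
# prints https://ar5iv.org/abs/1706.03762
```

Parsing the URL instead of doing a plain string replace avoids touching links where "arxiv" only appears in the path or query.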
Misc
- A great Regular Expressions (RegEx) reference.
- SQLBolt offers a series of interactive lessons and exercises designed to help you quickly learn SQL right in your browser.
- Metacademy is built around an interconnected web of concepts, each one annotated with a short description, a set of learning goals, a (very rough) time estimate, and pointers to learning resources.
- The concepts are arranged in a prerequisite graph, which is used to generate a learning plan for a concept.
Articles and Blog Posts
AI
- Karpathy explains deep learning in layman’s terms using a digit classification example.
- A survey of the current state of machine learning tooling by Chip Huyen, Stanford Lecturer and Snorkel AI.
- A synopsis of discussions with ~30 companies in different industries about their challenges with real-time machine learning by Chip Huyen, Stanford Lecturer and Snorkel AI.
- This post outlines the solutions for (1) online prediction and (2) continual learning, with step-by-step use cases, considerations, and technologies required for each level.
- Deploying a model isn’t the end of the process. A model’s performance can degrade over time in production. Once a model has been deployed, we have to continually monitor its performance to detect data distribution shifts as well as deploy updates to fix these issues. This post outlines (i) common causes of machine learning failures in production, (ii) the types of data distribution shifts, and (iii) monitoring and observability metrics and tools.
- A quick read by Analytics Vidhya on dealing with Imbalanced Classes in ML.
- On quantization, pruning, DeepSpeed, and knowledge distillation.
- A self-study guide by Google for aspiring machine learning practitioners. Machine Learning Crash Course features a series of lessons with video lectures, real-world case studies, and hands-on practice exercises.
- Machine learning operations (MLOps) explained.
- Understanding BigBird’s block sparse attention and how it can handle sequence lengths of up to 4096 tokens.
- CenterNet is an anchorless object detection architecture. As such, it has an important advantage: it removes the need for the classical NMS (Non-Maximum Suppression) step during post-processing, which enables faster inference. Read more about the CenterNet architecture in this post.
- ML systems can run to completion without throwing any errors (indicating functional correctness) but can still produce incorrect outputs (indicating behavioral issues). Thus, it is important to test the behavioral aspects of your model to make sure it works as you expected. Read more on Checklist, an NLP testing framework.
- This article from Jeff Dean, head of Google Research, summarizes the areas in ML that are poised to go through exciting advances over the next several years.
- A tool that returns the summary of the latest SOTA research by Chip Huyen, Stanford Lecturer and Snorkel AI.
- A great overview of CNNs!
- A comprehensive system design analysis of microservices architecture at Netflix to power its global video streaming services.
- What happens inside the Microservices Architecture at Netflix when you click the Play button:
- Client sends a play request to the backend running on AWS. The request is handled by AWS elastic load balancer (ELB).
- AWS ELB will forward that request to API Gateway Service running on AWS EC2 instances. That component, named Zuul, is built by the Netflix team to allow dynamic routing, traffic monitoring and security, etc.
- Application API component is the core business logic, in this scenario, the forwarded request from API Gateway Service is handled by Play API.
- Play API will call a microservice or a sequence of microservices to fulfill the request.
- Microservices are mostly small, stateless programs. To control cascading failures and enable resilience, each microservice is isolated from its caller processes by Hystrix.
- Microservices can save data to or get data from a data store during processing.
- Microservices can send events for tracking user activities or other data to the stream processing pipeline for real-time processing of personalized recommendations.
- The data coming out of the stream processing pipeline can be persisted to other data stores such as AWS S3, Hadoop HDFS, Cassandra, etc.
- How Netflix works, in layman’s terms.
- Discourse on deep autoregressive models which are sequence models, yet feed-forward (i.e., not recurrent); generative models, yet supervised. This post offers some observations on how autoregressive models are a compelling alternative to RNNs for sequential data, and GANs for generation tasks.
- Using AI super resolution models as the perfect tool to improve the graphics of classic games.
- A tutorial on setting up a PyTorch Dataset using `torch.utils.data.Dataset` and using PyTorch’s DataLoader class to fetch your data on multiple cores in real time and feed it right away to your deep learning model.
- Using distributed training on Azure ML using Horovod. Training process from Jupyter notebooks to Distributed ML. Concepts of data parallelism and model parallelism, centralized and de-centralized training, synchronous and asynchronous updates.
- This blog post from Lex Fridman provides an overview of deep learning in seven architectural paradigms with links to TensorFlow tutorials for each. It accompanies the Deep Learning Basics lecture as part of MIT course 6.S094.
- “How did AI get here? What’s its growth trajectory like?” by Mohamed El-Geish, Director of AI, Cisco.
- A brief timeline of NLP models from the traditional methods (BOW, TF-IDF, Word2Vec, etc.) to the Transformer family (BERT, GPT, RoBERTa, XLM, Reformer, ELECTRA, T5, etc.) from Fabio Chiusano.
- This article by Alexander Rush from Harvard presents an “annotated” version of the original Transformer paper in the form of a line-by-line implementation.
- In this article, Yann LeCun, Chief AI Scientist at Meta, proposes a modular, configurable architecture for autonomous intelligence that would allow machines to learn world models in a self-supervised fashion and then use those models to predict, reason, and plan.
- Video: A Path Towards Autonomous AI at Baidu by Yann LeCun, Chief AI Scientist at Meta
- Summary:
- Autonomous AI requires predictive world models.
- World models must be able to perform multimodal predictions.
- Solution: Joint Embedding Predictive Architecture (JEPA).
- JEPA makes predictions in representation space, and can choose to ignore irrelevant or hard-to-predict details.
- JEPA can be trained non-contrastively by (1) making the representations of inputs maximally informative, (2) making the representations predictable from each other, (3) regularizing latent variables necessary for prediction.
- JEPAs can be stacked to make long-term/long-range predictions in more abstract representation spaces.
- Hierarchical JEPAs can be used for hierarchical planning.
- Generative Flow Networks (GFlowNets) are a new research direction proposed by Yoshua Bengio. A GFlowNet is a trained stochastic policy or generative model, trained such that it samples objects $x$ through a sequence of constructive steps, with probability proportional to $R(x)$, where $R$ is some given non-negative integrable reward function. They live somewhere at the intersection of reinforcement learning, deep generative models and energy-based probabilistic modelling.
- Bengio’s technical talk at MILA; Discussion about GFlowNets, causality and consciousness providing a window on upcoming developments of GFlowNets aimed at bridging the gap between SOTA AI and human intelligence.
- First paper on GFlowNets published in NeurIPS 2021; blog entry.
- Answers to frequently asked ML questions from Sebastian Raschka, Assistant Professor of Statistics, University of Wisconsin-Madison.
- With the HuggingFace Encoder-Decoder class, you no longer need to stick to pre-built encoder-decoder models like BART or T5, but can instead build your own Encoder-Decoder architecture by doing a mix-and-match with the encoder and decoder model of your choice (similar to stacking legos!), say BERT-GPT2. This is called “warm-starting” encoder-decoder models.
- The algorithms and recommendation models StitchFix uses under-the-hood to carry out their style selection processes.
- A three-part series that gives an overview of the NVIDIA team’s first-place solution for the booking.com challenge. This first post gives an overview of recommender system concepts. The second post will discuss deep learning for recommender systems. The third post will discuss the winning solution, the steps involved, and also what made a difference in the outcome.
- This post goes over the motivation for building word embeddings: low-dimensional vector representations, learned from a corpus of text, that preserve the contextual similarity of words. It also offers a walkthrough of generating word embeddings with word2vec.
- A very neat visual demo by Xin Rong that shows how word embeddings are trained.
- Accompanying talk
- This article from IBM goes over the idea of foundation models, which are models trained on a relatively large amount of unlabeled data and fine-tuned/transfer-learned/adapted to the downstream application, thereby replacing task-specific models. “Foundation models” is a term first popularized by the Stanford Institute for Human-Centered Artificial Intelligence.
- This paradigm has a host of benefits: (i) instead of requiring a large, well-labeled dataset for each specific task, foundation models need a large amount of unlabeled data for pre-training and only a limited set of labeled data for fine-tuning on each downstream task, thereby reducing the labeled data requirements dramatically, (ii) since a foundation model can be shared across downstream tasks, we can save on the resources needed to train task-specific models owing to the knowledge transfer that foundation models bring about (training a relatively large model with billions of parameters has roughly the same carbon footprint as running five cars over their lifetime), and (iii) it democratizes AI research by making it much easier for small businesses to deploy AI in a wider range of mission-critical situations owing to the reduced data labeling requirements.
- Tips from Sayak Paul, ML Engineer at Carted, to help navigate the process of building a foundation in ML to eventually be able to pick it up as a potential career option.
- Some of the top-level tips he mentions are: (i) find solace in uncertainty, (ii) overwhelm and courage to learn, and (iii) learn, apply, (demo), and repeat.
- Emil Wallner is an “internet-taught” ML Researcher working as a resident at Google. This article offers an interview that follows his hero’s journey, self-taught approach to education, experience with AI, and path to Google.
- This article by Jacob Gildenblat offers a fantastic overview of classical active learning, and then goes over several papers that focus on active learning for deep learning.
- This Jupyter notebook by Jay Alammar offers a great intro to using a pre-trained BERT model to carry out sentiment classification using the Stanford Sentiment Treebank (SST2) dataset.
- A preliminary set of best practices applicable to any organization developing or deploying large language models by Cohere, OpenAI, and AI21 Labs.
- A set of monographs on the First Principles of Computer Vision by Shree Nayar, Professor, Computer Science, Columbia University covering Camera and Imaging, Features and Boundaries, 3D Reconstruction I, 3D Reconstruction II, and Visual Perception.
- Accompanying videos: 150 videos of First Principles of Computer Vision and YouTube link.
- These cheatsheets by Afshine Amidi and Shervine Amidi are easy-to-digest study guides highlighting the important points of some popular AI/ML courses at MIT/Stanford.
- This book by Christoph Molnar explores the concepts of interpretability in the context of ML and DL models. The book covers simple, interpretable models such as decision trees, decision rules and linear regression.
- The book also focuses on model-agnostic methods for interpreting black box models such as feature importance and accumulated local effects, and explaining individual predictions with SHapley Additive exPlanations (SHAP) values and Local Interpretable Model-agnostic Explanations (LIME).
- This Wikipedia page includes a list of publicly available datasets for image, text, sound, signal, physical, biological data, etc.
- For each dataset, along with its name, the following metadata is included: a brief description, preprocessing done, number of instances, format, default task, date of creation, paper reference, creator name.
- This post by Scuba Analytics argues that being data-driven can harm organizations in some cases, and that being data-informed is the way to go instead.
- This post by Xavier Amatriain champions being data-driven as a value that needs to permeate through the organization and work regardless of who is in the room making decisions; it has to be part of the culture and easily accessible at everyone’s fingertips.
- This website offers a great summary of trends in prompt-based research.
- This article by Discover Magazine talks about the streetlight effect and offers examples from economics, drug discovery, and randomized controlled trials in medicine.
- This article by Dylan Patel and Afzal Ahmad reports on a leaked internal Google document that claims Open Source AI will outcompete Google and OpenAI.
- This article by Jerry Chen at Greylock Partners explains why Systems of Intelligence are still the next defensible business model.
- This article by Dhruvil Karani discusses practical considerations while building a recommender system right from dataset creation to model training to offline evaluation to A/B testing.
- This article by Thomas Capelle offers an overview of using llama.cpp which uses the GGML library to run LLMs locally.
- Related: LM Studio and GPT4All.
- This article by Justine offers an overview of modifying llama.cpp to load weights using mmap() instead of C++ standard I/O. That enabled loading LLaMA 100x faster using half as much memory.
- This article by Javen Xu discusses how Lyft uses graph learning methods to generate embeddings.
- Their training algorithm extends the PyTorch-BigGraph package. The goal of training is to learn a $d$-dimensional embedding vector for each entity (i.e. node) such that the calculated similarity between any two entities reflects their relationship. For example, two entities that are very close in the embedding space would have a more positive similarity score than two distant entities.
- This article by Peter Gostev talks about injecting reasoning into recommendations using GPT-4.
- RTutor uses OpenAI’s powerful large language model to translate natural language into R code, which is then executed.
- OpenAI, Anthropic, and others are using Jigsaw’s Perspective – designed to moderate toxic human speech – to evaluate their large language models. What could go wrong?
- A neat scaled-down implementation of Llama (from scratch) by Brian Kitano. It’s well explained and broken down into the main components using PyTorch code.
- Notes from Google on maximizing the performance of deep learning models via tricks in the hyperparameter optimization (HPO) process. They also touch on other aspects of deep learning training, such as pipeline implementation and optimization, but their treatment of those aspects is not intended to be complete.
- Notes/lessons by HuggingFace on training IDEFICS, an 80-billion-parameter open reproduction of DeepMind’s Flamingo VLM. They highlight the mistakes they made and the remaining open questions. The notes on using an auxiliary Z-loss, Atlas for data filtering, and BF16 loss values are particularly enlightening.
- Related: Older knowledge memo which focused on lessons learned from stabilizing training at medium scale.
- This article explores Amazon’s integration of artificial intelligence across its divisions, enhancing services like Alexa and Amazon Web Services through machine learning and deep learning advancements.
- Pearls of wisdom from Sam Altman, CEO of OpenAI.
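
The embedding-similarity idea described in the Lyft graph learning entry above can be illustrated with a minimal sketch. It assumes cosine similarity as the similarity function (the article summary only says "calculated similarity"), and the entity names and 3-dimensional vectors below are hypothetical values chosen for illustration, not trained embeddings.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def cosine_similarity(u, v):
    """Cosine of the angle between u and v, in the range [-1, 1]."""
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))


# Hypothetical d = 3 embeddings; a real system would learn these from the graph.
embeddings = {
    "rider_a": [0.9, 0.1, 0.0],
    "rider_b": [0.8, 0.2, 0.1],
    "driver_x": [-0.7, 0.6, 0.2],
}

# Entities that sit close together in the space score higher than distant ones.
close = cosine_similarity(embeddings["rider_a"], embeddings["rider_b"])
far = cosine_similarity(embeddings["rider_a"], embeddings["driver_x"])
print(f"close pair: {close:.3f}, distant pair: {far:.3f}")
```

Other similarity functions, such as a plain dot product, fit the same pattern; the choice is a modeling decision rather than something fixed by the approach.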
Miscellaneous
- Advice from Karpathy on how one can traverse the PhD experience.
- Advice from Karpathy for younger students on how to do well in their undergrad/grad courses.
- Designing creative systems by Bret Victor, UI expert.
- Excel is powerful for data analysis on smaller datasets. This Excel Formulae Bible from eforexcel is a great summary of some useful Excel formulae.
- Solutions to frequently asked backtracking questions on LeetCode.
- Dynamic programming patterns that can be found in different problems from LeetCode.
- Basic strategies to identify dynamic programming patterns in interviews and solve them.
- Related: this Quora post.
- The underlying problem-solving patterns for solving standard LeetCode problems.
- Infographics by Priyanka Vergadia that describe every product in the Google Cloud family in the visual sketchnote format to grasp the capability of the tools quickly and easily.
- In this article, Gergely Orosz from The Pragmatic Engineer talks about the best practices for shipping your code to production in a way that is fast and reliable by adopting the concepts of canary testing, feature flags and staged rollouts.
- Canary testing (also known as canary release or canary deployment) is a powerful way of testing new features and functionality in a live production environment with minimal impact on users.
- This book by Afshine Amidi and Shervine Amidi is a concise and illustrated guide for anyone who wants to brush up on their data structures and algorithm fundamentals in the context of coding interviews or computer science classes.
- It is divided into 4 parts:
- Foundations: main types of algorithms and related mathematical concepts
- Data structures: arrays, strings, queues, stacks, hash tables, linked lists and associated theorems and tricks
- Graphs and trees: graph concepts and graph traversal algorithms along with important types of trees
- Sorting and search: common, efficient sorting and search algorithms
- RegEx is difficult for the average human reader to write and comprehend because of its complex patterns. This website uses GPT-3 to generate regular expressions from plain English.
- A list of new programming terms that have taken off in local circles (i.e. the author has heard others repeat them).
- AI as a pyramid of needs going from data collection at the bottom of the pyramid to ETL, analytics, and metrics in the middle to A/B testing and deep learning at the top.
- Null Hypothesis Statistical Testing (NHST), more commonly known as statistical significance (stat sig.), has been the preferred way to make decisions using experimentation. However, alternative approaches such as Expected Utility (EU) are rapidly gaining adoption. This article compares NHST with EU.
- An approachable presentation by Alex Komoroske, done in an emoji style, showing why even when everyone is competent and collaborative, you can still get hurricane-force coordination headwinds.