Overview

  • The cold start problem occurs when a recommendation, prediction, or other personalized service receives a new user and is unsure what items to recommend, since there is little or no historical data available about them. It likewise arises when a new item is added and the system is unsure which users to recommend that item to.
  • Netflix, for example, has the cold start problem occurring when a new user signs up and there is no information available about their movie preferences or viewing history. Similarly, when a new item is added to Netflix, there may be no data available on how users have interacted with it, making it difficult to provide relevant recommendations.
  • A simple way to mitigate this problem is to ask users for their preferences through a survey at sign-up; monthly subscription box services, for example, often ask for input before curating a box for you. Below we will look at mitigation techniques and other details of the cold start problem.

Item Cold Start

  • When a new product is added to a web store or fresh content is uploaded to a media platform, it starts out with zero interactions or ratings, rendering it practically invisible to the recommendation system even though it may be relevant to some or many users. This is referred to as product or item cold start.
  • Classified sites and news platforms are hit the hardest by this phenomenon. Fresh items on these platforms are typically in high demand, but their value deteriorates rapidly. Breaking news from yesterday becomes stale news today, and vintage bikes put up for sale last week may already be sold.
  • In addition, marketplaces also face item cold start issues when the same product is offered by different sellers under different product IDs. Since the personalization engine views these as distinct items, user interactions and ratings for one product won’t affect how its duplicates are recommended.
  • On e-retail sites, item cold start affects long-tail products: low-hype goods that each sell only a few units a month but whose sheer number still generates significant traffic. Because of low demand, these products take a long time to accumulate enough user interactions to be recognized by the recommendation system, making them cold start items.

User Cold Start

  • When a website encounters new visitors without browsing history or known preferences, it can be difficult to create a personalized experience for them due to the absence of data typically used for generating recommendations. This is known as the user or visitor cold start problem.
  • Not only first-time users of a website, but even returning visitors can confuse recommendation systems if their behavior and preferences change from one session to the next.
  • Classified sites and video sharing platforms are often faced with this issue. For instance, a user may be interested in comparing and searching for hiking boots for a while, but once they make a purchase, they may switch to something completely unrelated, such as art supplies. In such cases, their browsing history won’t be helpful in predicting their next choice due to their session-based behavior.
  • In a broader context, some level of user cold start will always exist as long as online consumers continue to explore new topics and trends, and their lifestyle, circumstances, and needs continue to evolve.

Mitigation Techniques

Item-based collaborative filtering

  • Item-based (sometimes called item-to-item) collaborative filtering is an algorithm that identifies similar items based on historical user-item interactions and recommends items similar to those a user has shown interest in.
  • This algorithm uses similarity metrics such as cosine similarity or Pearson correlation to measure the similarity between items.
  • The Apache Mahout library provides an implementation of this algorithm.
  • Use cases:
    • Amazon uses item-based collaborative filtering to provide personalized recommendations to its users. When a user views a product, Amazon’s algorithm identifies similar products based on the historical user-item interactions and recommends those products to the user.
    • YouTube also uses item-based collaborative filtering to recommend videos to its users. The algorithm analyzes the user’s viewing history and identifies similar videos based on which videos are frequently watched together by the same users.
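As an illustration of the idea (the rating matrix below is entirely made up), item-item cosine similarities can be computed directly from a user-item interaction matrix:

```python
import numpy as np

# Toy rating matrix (rows = users, columns = items); zeros mean "not rated".
ratings = np.array([
    [5, 4, 0, 1],
    [4, 5, 1, 0],
    [0, 1, 5, 4],
    [1, 0, 4, 5],
], dtype=float)

def item_cosine_similarity(R):
    """Cosine similarity between the item column vectors."""
    norms = np.linalg.norm(R, axis=0)
    norms[norms == 0] = 1.0  # avoid division by zero for unrated items
    return (R.T @ R) / np.outer(norms, norms)

sim = item_cosine_similarity(ratings)

# Items most similar to item 0 (excluding item 0 itself), best first.
recommended = [i for i in np.argsort(sim[0])[::-1] if i != 0]
```

Here users 0 and 1 like the first two items, so items 0 and 1 end up highly similar; a user who interacted with item 0 would be recommended item 1 first.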

Content-based filtering

  • Content-based filtering is an algorithm that recommends items based on their features such as text, metadata, and tags.
  • This algorithm uses techniques such as term frequency-inverse document frequency (TF-IDF) to represent item features and cosine similarity to measure the similarity between items.
  • The scikit-learn library provides an implementation of this algorithm.
  • Use cases:
    • Spotify uses content-based filtering to recommend songs to its users. The algorithm analyzes the features of a user’s favorite songs, such as tempo, genre, and mood, and recommends other songs with similar features.
    • Netflix uses content-based filtering to recommend TV shows and movies to its users. The algorithm analyzes the metadata of each title, such as genre, cast, and plot summary, and recommends titles with similar metadata to the user.
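scikit-learn’s TfidfVectorizer and cosine_similarity cover this in practice; as a from-scratch sketch of the same idea (the item descriptions below are made up), it looks like:

```python
import math
from collections import Counter

# Hypothetical item descriptions (metadata text for each title).
items = [
    "space opera science fiction adventure",
    "science fiction thriller set in space",
    "romantic comedy set in paris",
]

docs = [doc.split() for doc in items]
vocab = sorted({w for doc in docs for w in doc})
n_docs = len(docs)

def tfidf_vector(doc):
    """TF-IDF weights for one document over the shared vocabulary."""
    counts = Counter(doc)
    vec = []
    for w in vocab:
        tf = counts[w] / len(doc)
        df = sum(1 for d in docs if w in d)
        idf = math.log(n_docs / df) + 1  # smoothed so common terms keep weight > 0
        vec.append(tf * idf)
    return vec

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

vecs = [tfidf_vector(d) for d in docs]
# Items 0 and 1 share "space science fiction", so they score higher together.
```

Because the similarity comes from item features alone, a brand-new title with a description can be recommended immediately, with no interaction history required.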

Matrix factorization

  • Matrix factorization can help mitigate the cold start problem, where new items or users have little to no interaction history, by providing recommendations based on their similarities to existing items or users.
  • Matrix factorization typically involves factorizing the user-item interaction matrix into two lower-dimensional matrices: a user matrix and an item matrix. The user matrix represents the latent features of each user, and the item matrix represents the latent features of each item. The dot product of these two matrices gives an estimate of the user-item interaction matrix.
  • To mitigate the cold start problem for new items, matrix factorization can use the item features to learn their latent factors. Item features can include things like text, metadata, or tags, and can be represented as a vector. The item matrix can be initialized with the latent factors of the item features, allowing the model to make recommendations for new items based on their similarities to existing items with similar features.
  • To mitigate the cold start problem for new users, matrix factorization can use the user features to learn their latent factors. User features can include demographic information or explicit preferences. The user matrix can be initialized with the latent factors of the user features, allowing the model to make recommendations for new users based on their similarities to existing users with similar features.
  • Use cases:
    • Airbnb uses matrix factorization to recommend accommodations to its users. The algorithm factors the user-accommodation interaction matrix into two lower-dimensional matrices: a user matrix and an accommodation matrix. The user matrix represents the latent features of each user, such as preferred location and price range, and the accommodation matrix represents the latent features of each accommodation, such as location, amenities, and price. The dot product of these two matrices gives an estimate of the user-accommodation interaction matrix, which is used to make recommendations.
    • Pinterest uses matrix factorization to recommend pins to its users. The algorithm factors the user-pin interaction matrix into two lower-dimensional matrices: a user matrix and a pin matrix. The user matrix represents the latent features of each user, such as interests and preferences, and the pin matrix represents the latent features of each pin, such as category, description, and tags. The dot product of these two matrices gives an estimate of the user-pin interaction matrix, which is used to make recommendations.
  • The image below (source) shows matrix factorization and embeddings created for users and items.
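A minimal sketch of plain matrix factorization trained with stochastic gradient descent (toy ratings; the hyperparameters are made up). The feature-based initialization described above would replace the random initialization of P and Q:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy rating matrix (0 = unobserved); rows = users, columns = items.
R = np.array([
    [5, 3, 0, 1],
    [4, 0, 0, 1],
    [1, 1, 0, 5],
    [0, 1, 5, 4],
], dtype=float)

n_users, n_items, k = R.shape[0], R.shape[1], 2
P = rng.normal(scale=0.1, size=(n_users, k))  # user latent factors
Q = rng.normal(scale=0.1, size=(n_items, k))  # item latent factors
lr, reg = 0.05, 0.02  # learning rate and L2 regularization (made-up values)

observed = [(u, i) for u in range(n_users) for i in range(n_items) if R[u, i] > 0]

# Stochastic gradient descent over the observed entries only.
for epoch in range(200):
    for u, i in observed:
        err = R[u, i] - P[u] @ Q[i]
        pu = P[u].copy()
        P[u] += lr * (err * Q[i] - reg * P[u])
        Q[i] += lr * (err * pu - reg * Q[i])

pred = P @ Q.T  # estimates for every cell, including the unobserved ones
rmse = np.sqrt(np.mean([(R[u, i] - pred[u, i]) ** 2 for u, i in observed]))
```

The dot product P @ Q.T fills in every cell of the matrix, which is exactly what lets the model score user-item pairs it has never observed.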

Hybrid recommender system

  • Hybrid recommender systems combine two or more recommendation techniques to provide more accurate recommendations.
  • For instance, a hybrid system can combine content-based filtering and collaborative filtering: content-based filtering is used when there is little or no user data, and collaborative filtering takes over once there is sufficient user data.
  • The Apache Mahout library provides components that can be combined into such a system.
  • Use cases:
    • Goodreads uses a hybrid recommender system to recommend books to its users. The system combines content-based filtering and collaborative filtering. When a user signs up, the system asks for their favorite genres and authors, which are used as content-based features. The system also analyzes the user’s reading history and identifies similar users based on their historical reading patterns, which are used as collaborative features. The system combines these two sets of features to make recommendations to the user.
    • Yelp uses a hybrid recommender system to recommend businesses to its users. The system combines content-based filtering, collaborative filtering, and contextual recommendations. When a user searches for a restaurant, the system considers the user’s location, search history, and preferred cuisine as contextual features. The system also analyzes the user’s historical check-ins and reviews, as well as the check-ins and reviews of other users with similar preferences, as collaborative features. The system also analyzes the restaurant’s metadata, such as price, ratings, and cuisine, as content-based features. The system combines these three sets of features to make recommendations to the user.
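A toy sketch of one common hybrid strategy, weighted blending (the threshold and scores below are invented): lean on content-based scores for cold users and shift toward collaborative scores as interaction history accumulates:

```python
import numpy as np

def hybrid_scores(content_scores, collab_scores, n_interactions, threshold=20):
    """Blend content-based and collaborative scores for one user.

    With few interactions (cold start) the blend leans on content-based
    scores; past `threshold` interactions it is purely collaborative.
    """
    w = min(n_interactions / threshold, 1.0)  # collaborative weight in [0, 1]
    return w * np.asarray(collab_scores) + (1 - w) * np.asarray(content_scores)

content = [0.9, 0.2, 0.5]  # made-up content-based scores for three items
collab = [0.1, 0.8, 0.4]   # made-up collaborative scores for the same items

cold = hybrid_scores(content, collab, n_interactions=0)   # pure content-based
warm = hybrid_scores(content, collab, n_interactions=40)  # pure collaborative
```

Other hybridization schemes exist (switching, cascading, feature combination); this linear blend is just the simplest one to illustrate.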

Knowledge-based systems

  • Knowledge-based systems use domain knowledge to recommend items or content to users.
  • This approach is useful when there is little or no historical user data. For instance, in a music recommendation system, domain knowledge about genres, artists, and music features can be used to make recommendations.
  • Expert systems and rule-based systems are examples of knowledge-based systems.
  • Use cases:
    • IBM’s Watson, an AI system that uses natural language processing and machine learning to reveal insights from large amounts of unstructured data.
    • Wolfram Alpha, a computational knowledge engine that provides answers to factual queries by computing answers from structured data.
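A minimal rule-based sketch (the rules and fallback are invented for illustration): domain knowledge maps stated preferences directly to recommendations, with no historical data needed:

```python
# Hypothetical domain-knowledge rules mapping stated preferences to genres.
RULES = {
    ("energetic", "workout"): ["electronic", "rock"],
    ("calm", "study"): ["classical", "ambient"],
    ("calm", "evening"): ["jazz", "acoustic"],
}

def recommend_genres(mood, activity):
    """Rule-based recommendation: works with zero historical user data."""
    return RULES.get((mood, activity), ["pop"])  # fallback default

picks = recommend_genres("calm", "study")
fallback = recommend_genres("energetic", "cooking")  # unmatched -> default
```

Real knowledge-based systems encode far richer rules and constraints, but the essential property is the same: recommendations come from the domain model, not from interaction logs.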

Active learning

  • Active learning is a technique that involves selecting a set of representative items for users and asking them to provide feedback.
  • This feedback is then used to improve the accuracy of the recommendations.
  • This technique is suitable for cold start problems when there is insufficient historical data. Active learning can be implemented using various machine learning algorithms such as decision trees or neural networks.
  • Use cases:
    • Google’s Cloud AutoML, a suite of machine learning products that automate the process of training and deploying custom models for various use cases, such as image recognition, natural language processing, and translation.
    • Microsoft’s Azure Machine Learning, a cloud-based service that enables data scientists to create and deploy machine learning models.
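A toy sketch of the item-selection step (the ratings are made up): ask a new user about the items whose existing ratings disagree the most, since their answers are the most informative:

```python
import numpy as np

# Toy rating matrix (0 = unrated); rows = existing users, columns = items.
R = np.array([
    [5, 3, 1, 0],
    [4, 0, 5, 0],
    [5, 2, 1, 0],
], dtype=float)

def items_to_probe(R, k=2):
    """Pick the k items whose existing ratings disagree the most.

    High-variance ("controversial") items are the most informative to
    ask a brand-new user about; items everyone rates the same teach
    the system little about individual taste.
    """
    variances = []
    for j in range(R.shape[1]):
        rated = R[:, j][R[:, j] > 0]
        variances.append(rated.var() if rated.size > 1 else 0.0)
    return [int(i) for i in np.argsort(variances)[::-1][:k]]

probe = items_to_probe(R)  # ask the new user about these items first
```

Variance is only one possible informativeness criterion; production systems may also weigh popularity, so the user is likely to know the item being asked about.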

Context-aware systems

  • Context-aware systems consider contextual factors such as time, location, and user behavior to make recommendations.
  • This method can be useful when user data is scarce or the recommendation problem is complex.
  • For instance, in a restaurant recommendation system, contextual factors such as cuisine, price, and location can be considered to make personalized recommendations. Context-aware systems can be implemented using machine learning algorithms such as decision trees or rule-based systems.
  • Use cases:
    • Google Maps, a mapping service that uses contextual data such as traffic conditions and real-time updates to provide personalized directions to users.
    • Spotify’s Discover Weekly, a music recommendation system that analyzes a user’s listening history, preferences, and current context to create a personalized playlist.
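A minimal rule-based sketch (the catalog and thresholds are invented): context such as time of day and distance filters and ranks candidates before any personal history is needed:

```python
# Hypothetical restaurant catalog with context-relevant attributes.
restaurants = [
    {"name": "Cafe Aurora", "serves": {"breakfast", "lunch"}, "distance_km": 0.5},
    {"name": "Trattoria Sole", "serves": {"lunch", "dinner"}, "distance_km": 2.0},
    {"name": "Night Bites", "serves": {"dinner"}, "distance_km": 6.0},
]

def contextual_recommend(hour, max_distance_km):
    """Filter and rank candidates using context (time of day, location)."""
    meal = "breakfast" if hour < 11 else "lunch" if hour < 16 else "dinner"
    candidates = [r for r in restaurants
                  if meal in r["serves"] and r["distance_km"] <= max_distance_km]
    return sorted(candidates, key=lambda r: r["distance_km"])

picks = contextual_recommend(hour=12, max_distance_km=3)
```

Because the signal here is the current context rather than the user's past, even a first-time visitor gets sensible, situation-appropriate results.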

Feature Hashing Trick

  • This technique is inspired by Damien Benveniste’s post (source) and the paper Feature Hashing for Large Scale Multitask Learning; it helps with both memory conservation and the cold start problem. The hashing trick can be used to encode user and item features for matrix factorization.
  • First, let’s see how it helps with memory conservation:
    • Say you were to design a recommender model to display ads to users. A naive implementation could pair a recommender engine with user and ad embeddings, assigning a vector to each category seen during training and a single unknown vector to every category not seen during training.
      • While this can work for NLP, in recommender systems that gain hundreds of new users daily, the number of unknown categories grows without bound.
    • The hashing trick handles this problem by assigning multiple users (or categories of sparse variables) to the same latent representation.
    • This is done with a hash function: a hash-size hyperparameter controls the dimension of the embedding matrix and, with it, the resulting rate of hashing collisions.
  • The hashing trick is a method used in machine learning to reduce the dimensionality of feature vectors. It works by mapping each feature to a fixed-length vector of integers using a hash function.
  • The hashing trick can be used to address the cold start problem by allowing the system to handle new or unknown features without requiring retraining of the model.
  • Let’s take a deeper look at how it can help with the cold start problem via collaborative filtering:
    • As stated earlier, the hashing trick is a technique used in machine learning for dimensionality reduction and feature engineering, where high-dimensional input vectors are mapped to a lower-dimensional space using a hash function.
    • In collaborative filtering, the input data typically consists of a large number of high-dimensional sparse feature vectors that represent the user-item interactions. These feature vectors can be very large and computationally expensive to process, especially when dealing with large-scale datasets.
    • To address this issue, the hashing trick can be applied to map the high-dimensional feature vectors to a lower-dimensional space with a fixed number of dimensions. The resulting lower-dimensional feature vectors can then be used as inputs to the collaborative filtering algorithm, which can be more computationally efficient to process.
    • For example, in a user-item rating matrix, the high-dimensional feature vector for each user can include data such as their past ratings, demographic information, and behavioral data. By applying the hashing trick to these feature vectors, they can be mapped to a lower-dimensional space, reducing the computational complexity of the collaborative filtering algorithm.
    • One advantage of using the hashing trick in collaborative filtering is that it can preserve the sparsity of the input data, which is important for many collaborative filtering algorithms. Additionally, the resulting lower-dimensional feature vectors can be more efficiently processed and stored, improving the overall performance of the recommendation system.
  • The hashing trick is not limited to these two use cases, however; it also helps with personalization, and more generally whenever the input data is high-dimensional and sparse and computational efficiency or storage is a concern.
  • “But wait, are we not going to decrease predictive performance by conflating different user behaviors? In practice the effect is marginal. Keep in mind that a typical recommender engine will be able to ingest hundreds of sparse and dense variables, so the hashing collision happening in one variable will be different from another one, and the content-based information will allow for high levels of personalization. But there are ways to improve on this trick. For example, at Meta they suggested a method to learn hashing functions to group users with similar behaviors. They also proposed a way to use multiple embedding matrices to efficiently map users to unique vector representations (https://lnkd.in/diH4RSgH). This last one is somewhat reminiscent of the way a pair (token, index) is uniquely encoded in a Transformer by using the position embedding trick. The hashing trick is heavily used in typical recommender system settings but not widely known outside that community!” (source)
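A minimal sketch of the hashing trick (the bucket count, dimension, and feature names below are made up). Note that Python’s built-in hash() is salted per process, so a stable hash is used instead:

```python
import hashlib
import numpy as np

def stable_bucket(key: str, n_buckets: int) -> int:
    # Python's built-in hash() is salted per process; md5 gives a stable bucket.
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % n_buckets

def hash_features(features, dim=16):
    """Hashing trick: fold arbitrarily many sparse categorical features
    into a fixed-length vector, so unseen categories need no retraining
    and the embedding/weight matrix stays a fixed size."""
    vec = np.zeros(dim)
    for f in features:
        vec[stable_bucket(f, dim)] += 1.0  # collisions share a slot by design
    return vec

# Works for any user, including one whose categories were never seen in training.
v = hash_features(["country=NZ", "device=phone", "brand_new_tag"])
```

The hash-size parameter (dim here) is the knob mentioned above: a larger value means fewer collisions but a bigger embedding matrix, and a smaller value trades some conflation of users for memory savings.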

References