• Graph Neural Networks (GNNs) are a type of neural network architecture designed for processing graph-structured data. Graphs are a versatile data structure that can represent complex relationships, and GNNs extend traditional neural networks to operate directly on graphs.
  • GNNs can aggregate information from neighboring nodes, capture local and global graph structures, and make predictions based on graph-structured data.
  • They have been successfully applied to various applications, such as node classification, link prediction, and recommendation systems.
  • Popular GNN architectures include Graph Convolutional Networks (GCNs), Graph Attention Networks (GATs), GraphSAGE, and GGNNs.
  • GNNs are used in several industry applications such as Maps for Google/Uber,and Social Network for LinkedIn, Instagram, and Facebook.
  • GNNs are able to take advantage of both content information, such as user and item features, as well as the underlying graph structure, which may include user-item interactions or relationships with other entities, like sports.
  • In contrast, traditional models often rely on just one of these sources of information.
  • GNN implementation can easily be done by Pytorch Geometric or Deep Graph Library.
  • The image below (source) shows the different types of graphs but note, recommenders are usually bipartite graph.


  • Remember a graph is made up of sets of nodes or vertices that are connected by edges. GNN’s at a high level can be though of as an encoder-decoder architecture where the user and items are the nodes in the graph encoder and the edges represent the relationships or interactions between them.
  • In the encoder phase, the graph is fed into the GNN, which computes a representation or embedding for each node based on its local neighborhood and the overall graph structure. This embedding can capture both the content or features of the node as well as its context within the graph.
  • In the decoder phase, the GNN is used to make predictions or recommendations based on the learned node embeddings. This can involve computing similarity or distance measures between pairs of nodes, or using the embeddings as inputs to a downstream classifier or regression model.
  • One advantage of using GNNs for recommender systems is that they can incorporate both explicit and implicit feedback from users, as well as contextual information such as time or location. Additionally, GNNs can be used to model complex relationships between users and items, such as multi-modal interactions or hierarchical structures.
  • The image below (source) shows a high level overview of how GNN’s work for Recommendation.
  • In GNN-based recommender systems, generating embeddings refers to the process of representing the user and item nodes in the graph as low-dimensional feature vectors, or embeddings, that capture the relevant information about the node and its connections to other nodes in the graph and this is generally achieved through neural message passing.
  • Finally, the prediction of preferences is done via using cosine similarity.

Embedding Generation: Neural Message Passing

  • Generating embeddings in GNN is typically achieved through information propagation, also known as neural message passing, which involves passing information between neighboring nodes in the graph in a recursive manner, and updating the node representations based on the aggregated information.
  • The propagation process allows the embeddings to capture both local and global information about the nodes, and to incorporate the contextual information from their neighbors.
  • By generating informative and expressive embeddings, GNN-based recommenders can effectively capture the complex user-item interactions and item-item relations, and make accurate and relevant recommendations.
  • Neural message passing is a key technique for generating embeddings in GNN-based recommender systems. It allows the nodes in the graph to communicate with each other by passing messages along the edges, and updates their embeddings based on the aggregated information.
  • At a high level, the message passing process consists of two steps: message computation and message aggregation. In the message computation step, each node sends a message to its neighboring nodes, which is typically computed as a function of the node’s own embedding and the embeddings of its neighbors. The message function can be a simple linear transformation, or a more complex non-linear function such as a neural network. In the message aggregation step, each node collects the messages from its neighbors and aggregates them to obtain a new representation of itself. The aggregation function can also be a simple sum or mean, or a more complex function such as a max-pooling or attention mechanism.
  • The message passing process is usually performed recursively for a fixed number of iterations, allowing the nodes to exchange information with their neighbors and update their embeddings accordingly. The resulting embeddings capture the local and global information about the nodes, as well as the contextual information from their neighbors, which is useful for making accurate and relevant recommendations.
  • Some common algorithms and techniques used for neural message passing in GNNs are:
    • Graph Convolutional Networks (GCNs): GCNs apply a localized convolution operation to each node in the graph, taking into account the features of its neighboring nodes. This allows for the aggregation of information from neighboring nodes to update the node’s feature representation.
    • Graph Attention Networks (GATs): GATs use a learnable attention mechanism to weigh the importance of neighboring nodes when updating a node’s feature representation. This allows the model to selectively focus on the most relevant neighbors.
    • GraphSAGE: GraphSAGE uses a hierarchical sampling scheme to aggregate information from the neighborhood of each node. This allows for efficient computation of node embeddings for large graphs.
    • Message Passing Neural Networks (MPNNs): MPNNs use a general framework for message passing between nodes in a graph, allowing for flexibility in modeling different types of interactions.
  • In the context of graph neural networks (GNNs) for recommender systems, the goal is to generate embeddings for the user and item nodes in the graph. The embeddings can then be used for tasks such as candidate generation, scoring, and ranking.
  • The process of generating embeddings involves multiple GNN layers, each of which performs an exchange of information between immediate neighbors in the graph. At each layer, the information exchanged is aggregated and processed to generate new embeddings for each node. This process can be repeated for as many layers as desired, and the number of layers determines how far information is propagated in the graph.
  • For example, in a 2-layer GNN model, each node will receive information from its immediate neighbors (i.e., nodes connected by an edge) and its immediate neighbors’ neighbors. This allows information to be propagated beyond direct neighbors, potentially capturing higher-level structural relationships in the graph.
  • “Here is the pseudo-code, for a given node:
    • Fetch incoming messages from all neighbors
    • Reduce all those messages into 1 message by doing mean aggregation
    • Matrix multiplication of the neighborhood message with a learnable weight matrix
    • Matrix multiplication of the initial node message with a learnable weight matrix
    • Sum up the results of step 3 and 4
    • Pass the sum through a relu activation function
    • Repeat for as many layers as wished. The result is the output of the last layer.” (source)
  • The image below (source), visually represents this pseudo code.

  • Message passing has two steps, Aggregation and Update as we can see in the image below. (source)

  • “This aggregate function should be a permutation invariance function like sum or average
  • The update function itself can be a neural network (with attention or without attention mechanism) which will generate the updated node embeddings.” (source)


  • Now, let’s do a quick walkthrough on creating our own system from scratch and see all the steps that it would take.
  • First is the dataset, say we have user-item interaction, item features and user features available as shown below starting with user-item interaction. (source)
  • Below we see item features (source)
  • Below are the user features (source)

  • The next step is to create a graph as shown below (source)

  • The embeddings are created and the section above will cover it in greater detail.
  • The embeddings generated by GNNs are utilized to estimate the likelihood of a connection between two nodes. To calculate the probability of interaction between a user u and an item v, we use the cosine similarity function. After computing scores for all items that a user did not interact with, the system recommends the items with the highest scores.
  • The main goal during training of the model is to optimize the trainable matrices (W) used for generating the embeddings. To achieve this, a max-margin loss function is used, which involves negative sampling. The training data set only includes edges representing click and purchase events.
  • The model is trained in such a way that it learns to predict a higher score for a positive edge (an edge between a user and an item that they actually interacted with) compared to randomly sampled negative edges. These negative edges are connections between a user and random items that they did not interact with. The idea is to teach the model to distinguish between actual positive interactions and randomly generated negative interactions.


  • Seasonality in recommender systems refers to the variation in user preferences and item popularity over time. For example, certain items may be more popular during a particular season or event, such as Christmas or the summer months. This can cause issues for recommender systems that are not able to adapt to changing user preferences and item popularity over time.
  • To overcome seasonality in a music or movie app, one approach is to incorporate time-based features into the recommender system. For example, the recommender system can take into account the time of year or the day of the week when making recommendations. Additionally, the system can use algorithms that can adapt to changing user preferences over time, such as collaborative filtering with temporal dynamics or matrix factorization with time-dependent factors.
  • Another approach is to use contextual information to inform recommendations. For example, the recommender system can take into account a user’s location or weather conditions to make more relevant recommendations. This can be especially useful for music or movie apps where the content can be influenced by the user’s environment.

Use cases:


Uber Eats

Google Maps