• Apple loves to surprise and delight its customers
  • I’m also always in awe of how user-friendly and simple the UI is; this is something we struggle with quite a lot
  • Kool Aid:
    • Focus Mode: listening history is disabled, and kids can listen. This is great because when I focus I keep a song on repeat, and without it that song/artist would end up as my #1 in my end-of-year wrapped and skew my recommendations
    • Apple Music Discovery Station: plays new music the listener hasn’t heard before
    • Apple Music Sing: sing-along lyrics with adjustable vocals so the listener can take the lead; background vocals are separated from main vocals

3 Core Systems

Here’s a comparison of the three recommendation approaches:

  1. Item-to-Item Collaborative Filtering:
    • Focuses on finding similarities between items based on user interaction patterns.
    • Recommends items similar to what the user has liked or interacted with, regardless of other users’ preferences.
  2. User-to-User Collaborative Filtering:
    • Identifies similarities between users based on their interaction history.
    • Recommends items liked by similar users. If user A has similar tastes to user B, items liked by user B are recommended to user A.
  3. Content-Based Filtering:
    • Relies on item attributes (like genre, author, features) for recommendations.
    • Recommends items similar in content to those the user has liked, independent of other users’ interactions.
  • Item-to-item focuses on relationships between items, user-to-user on relationships between users, and content-based on the properties of the items themselves.
  • Using item-to-item collaborative filtering for recommending two different types of items, such as a shirt and a book, can be effective, but it depends on the context and the nature of the user data available. If there’s sufficient cross-category user interaction data (e.g., users who buy a certain type of shirt also tend to buy certain types of books), then item-to-item collaborative filtering can identify and leverage these patterns to make cross-category recommendations. However, if the user interaction data shows little correlation between these categories, a more sophisticated approach or a hybrid recommendation system might be necessary to effectively recommend such diverse items.
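As a toy illustration, item-to-item collaborative filtering can be computed as cosine similarity between the item columns of a user-item interaction matrix. The matrix and items below are made up; real systems work over sparse matrices with millions of rows:

```python
import numpy as np

# Toy user-item interaction matrix: rows are users, columns are items,
# and a 1 means the user interacted with (e.g., listened to) the item.
interactions = np.array([
    [1, 1, 0, 0],  # user 0 liked items 0 and 1
    [1, 1, 1, 0],  # user 1 liked items 0, 1, and 2
    [0, 0, 1, 1],  # user 2 liked items 2 and 3
])

def item_similarity(matrix):
    """Cosine similarity between item columns of the interaction matrix."""
    item_vecs = matrix.T.astype(float)  # one row per item
    norms = np.linalg.norm(item_vecs, axis=1, keepdims=True)
    unit = item_vecs / np.clip(norms, 1e-12, None)
    return unit @ unit.T

def similar_items(matrix, item_id, k=2):
    """Items most similar to item_id, excluding the item itself."""
    sims = item_similarity(matrix)[item_id].copy()
    sims[item_id] = -1.0
    return np.argsort(sims)[::-1][:k]

print(similar_items(interactions, 0))  # item 1 is most similar to item 0
```

Items 0 and 1 were liked by exactly the same users, so their cosine similarity is 1.0; the cross-category point above corresponds to items whose columns barely overlap.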

Main Recommender

In a music streaming service like Amazon Music or Spotify, the main recommender system, cold start user solution, and cold start item strategy work in conjunction to provide a comprehensive and personalized user experience:

  1. Main Playlist Recommendations: This system continuously analyzes a user’s interactions, preferences, and listening history to curate personalized playlists. It forms the core of user experience for regular users.

  2. Cold Start User Solution: For new users without interaction history, the system employs strategies like using demographic data or asking for initial genre/artist preferences. These initial recommendations serve as a starting point, and as users interact more, their data is fed into the main recommender system for more personalized experiences.

  3. Cold Start Item Strategy: When introducing new songs or albums, these items are initially recommended to users based on general popularity, similarity to existing popular items, or artist recognition. As users begin to interact with these new items, the system gathers data to further refine its recommendations in the main playlist system.

Together, these systems ensure that both new and existing users receive tailored recommendations while new content is effectively integrated into the platform. The overall goal is to enhance user engagement and satisfaction, regardless of how familiar a user is with the service or how new the content is.
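A minimal sketch of how the three pieces might be wired together. The history threshold, song IDs, and function names are all hypothetical:

```python
# Route a user to the cold-start path or the main recommender depending on
# how much interaction history exists. Values are illustrative only.

MIN_HISTORY = 5  # assumed interaction count before personalization kicks in

GLOBAL_TOP = ["song_a", "song_b", "song_c"]  # popularity fallback

def recommend(user_history, declared_genres=None, personalized_fn=None):
    if len(user_history) >= MIN_HISTORY and personalized_fn:
        return personalized_fn(user_history)  # main recommender path
    if declared_genres:
        # Cold-start user: fall back to onboarding genre/artist preferences.
        return [f"top_of_{g}" for g in declared_genres]
    return GLOBAL_TOP  # no signal at all: pure popularity

# New user with no history who picked genres at signup:
print(recommend([], declared_genres=["jazz", "lo-fi"]))
```

Cold-start items would get the mirror-image treatment: injected into these lists via popularity priors or similarity to existing items until they accumulate interactions of their own.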


Implementing the second iteration of a Neural Collaborative Filtering (NCF) based music recommendation system on AWS Cloud with Metaflow involves setting up a robust, scalable infrastructure that leverages various AWS services and the workflow management capabilities of Metaflow. Here’s an overview of what this infrastructure might look like:

Core Components

  1. AWS Services:
    • Amazon S3: For storing raw data (user interactions, song metadata, etc.), processed data, and model artifacts.
    • Amazon EC2 or ECS: For running data processing and machine learning tasks. EC2 instances can be used for custom setups, while ECS (Elastic Container Service) allows for containerized applications.
    • AWS Lambda: For handling real-time recommendation requests and other serverless computations.
    • Amazon RDS/DynamoDB: For storing structured data like user profiles, song details, etc. RDS for relational data and DynamoDB for NoSQL requirements.
    • Amazon Redshift or Athena: For big data analytics, querying large datasets, and performing complex ETL jobs.
    • Amazon SageMaker: For building, training, and deploying machine learning models at scale.
    • AWS Glue: For ETL processes and data cataloging.
    • Amazon Personalize: Optionally, for leveraging an AWS-managed service specifically designed for creating personalized recommendations.
  2. Metaflow:
    • Workflow Management: Metaflow is used to orchestrate and manage data science workflows, ensuring smooth transitions between different stages of data processing, model training, evaluation, and deployment.
    • Versioning and Experiment Tracking: Metaflow also helps in tracking different experiments, managing data versions, and keeping a record of various model iterations.

Workflow Overview

  1. Data Ingestion and Storage:
    • Data from various sources (user interactions, song databases, etc.) is ingested and stored in Amazon S3.
  2. Data Processing and ETL:
    • AWS Glue is used for ETL jobs to prepare the data for training and inference. This includes cleaning, normalization, feature extraction, etc.
    • Metaflow orchestrates these ETL workflows, ensuring efficient data processing.
  3. Model Training and Evaluation:
    • Amazon SageMaker is used for training the NCF and other machine learning models. It provides an environment for training at scale, hyperparameter tuning, and model evaluation.
    • Metaflow manages the model training workflows, including experimentation and versioning.
  4. Model Deployment:
    • Deploy the trained models using Amazon SageMaker for batch processing or real-time inference.
    • For real-time recommendation scenarios, AWS Lambda can be used to serve recommendations, invoking SageMaker endpoints.
  5. Monitoring and Scaling:
    • Continuous monitoring of the system’s performance is crucial. AWS CloudWatch can be used for monitoring and logging.
    • The infrastructure should be designed to scale up or down based on demand, leveraging the elasticity of AWS services.
  6. User Interaction and Feedback Loop:
    • User interactions with the recommended items are captured and fed back into the system for continuous learning and model refinement.
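The workflow above can be sketched as plain Python functions standing in for the Metaflow-orchestrated pipeline; in the real setup each function would be a Metaflow step delegating to the AWS service named in its comment:

```python
def ingest():
    # Pull raw interaction data (lands in Amazon S3 in the real setup).
    return [{"user": "u1", "song": "s1", "plays": 3},
            {"user": "u2", "song": "s2", "plays": 0}]

def etl(raw):
    # Clean and filter the data (an AWS Glue job in the real setup).
    return [r for r in raw if r["plays"] > 0]

def train(dataset):
    # Train the NCF model (a SageMaker training job in the real setup).
    return {"model": "ncf-v2", "trained_on": len(dataset)}

def deploy(model):
    # Serve for inference (a SageMaker endpoint / Lambda in the real setup).
    return f"deployed {model['model']}"

def run_pipeline():
    stages = []
    raw = ingest()
    stages.append("ingest")
    clean = etl(raw)
    stages.append("etl")
    model = train(clean)
    stages.append("train")
    status = deploy(model)
    stages.append("deploy")
    return stages, status

print(run_pipeline())
```

Metaflow's value-add over this linear script is exactly the versioning, retries, and experiment tracking described above; the stage order stays the same.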

Real-Time Recommendations

Switching to real-time recommendations in the context of a Next Best Action (NBA) system, like the one we’ve discussed for a music streaming service, would involve significant changes in both the infrastructure setup and the operational workflow. In a real-time system, while each user’s data is processed individually, it doesn’t necessarily mean that each user goes through Logistic Regression (LR) one by one in a sequential manner. Let’s explore what changes are needed:

Infrastructure Changes for Real-Time Processing

  1. High-Performance Computing Resources:
    • Utilize AWS services like EC2 or Lambda with auto-scaling capabilities to handle varying loads and ensure low-latency responses.
  2. Real-Time Data Streaming and Processing:
    • Implement services like Amazon Kinesis for real-time data streaming. This allows the system to process user actions (like clicks, listens, skips) as they happen.
  3. In-Memory Databases:
    • Use fast, responsive databases such as Amazon ElastiCache or DynamoDB to quickly access and update user profiles and item metadata.
  4. API and Endpoint Management:
    • Set up AWS API Gateway or similar services to manage API requests for real-time recommendation queries.
  5. Load Balancing:
    • Implement load balancing to efficiently distribute incoming user requests across multiple instances or endpoints.

Operational Workflow for Real-Time Recommendations

  1. User Interaction Triggers:
    • User actions (like starting a session, skipping a song, etc.) trigger the recommendation process in real-time.
  2. Feature Generation:
    • As soon as a trigger is received, the system rapidly generates feature vectors for the user based on the latest interaction and historical data.
  3. Model Inference:
    • The feature vector is immediately passed to the LR model (or any other suitable model). This doesn’t happen sequentially for each user but in parallel across multiple instances/servers.
    • The model quickly computes a probability score for a set of potential recommendations.
  4. Threshold and Decision Logic:
    • Based on predefined thresholds and decision logic, the system determines which items (songs, playlists) to recommend.
  5. Response Generation:
    • The recommendations are then sent back to the user’s device, ideally in a matter of milliseconds.
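Steps 2 through 4 above can be sketched as follows. The feature names, weights, and threshold are made up for illustration; a real deployment would load a trained model:

```python
import math

# Hypothetical LR weights over hypothetical features.
WEIGHTS = {"listened_recently": 1.2, "same_genre": 0.8, "skipped_artist": -1.5}
BIAS = -0.5
THRESHOLD = 0.5

def score(features):
    """Sigmoid of the weighted feature sum -> predicted click probability."""
    z = BIAS + sum(WEIGHTS[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))

def decide(candidates):
    """Rank candidates by score and keep those above the threshold."""
    scored = [(item, score(feats)) for item, feats in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [item for item, p in scored if p >= THRESHOLD]

candidates = [
    ("song_x", {"listened_recently": 1, "same_genre": 1, "skipped_artist": 0}),
    ("song_y", {"listened_recently": 0, "same_genre": 0, "skipped_artist": 1}),
]
print(decide(candidates))  # → ['song_x']
```

In production this function body is what runs in parallel across Lambda invocations or endpoint instances; no user waits behind another.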

Key Considerations

  • Model Complexity vs. Latency: The model needs to be efficient enough to make predictions quickly while still being complex enough to provide accurate recommendations.
  • Scalability: The system must scale up to handle peak loads, ensuring consistent performance even during high-traffic periods.
  • Concurrency: The infrastructure must support concurrent processing of multiple user requests. Users don’t go through the model one by one; instead, the system handles many users in parallel.
  • Continuous Learning: Ideally, the system should continuously learn from new data to keep the recommendations relevant and up-to-date.


Real-Time Infrastructure Setup

  1. Compute Resources:
    • High-performance, low-latency compute resources are essential. AWS Lambda or EC2 instances with auto-scaling capabilities are often used.
    • Amazon SageMaker can be used for hosting models and managing real-time inference endpoints.
  2. Data Streaming:
    • Services like Amazon Kinesis or Apache Kafka are used for real-time data streaming and processing.
    • Real-time user interactions (clicks, listens, skips) are processed as they occur.
  3. Database:
    • Fast, responsive databases like Amazon DynamoDB or Redis are used for quick access to user profiles and item metadata.
  4. API Gateway:
    • AWS API Gateway or similar services manage API requests and responses for recommendation queries.
  5. Load Balancing:
    • Implement load balancing to efficiently distribute incoming user requests across multiple instances or endpoints.


Real-Time Modeling Considerations

  1. Model Complexity:
    • Models need to be optimized for low-latency inference. This often means compromising on model complexity.
    • Lightweight models or distilled versions of complex models are used.
  2. Real-Time Learning:
    • Models may incorporate real-time learning or quickly adapt to recent user interactions.
  3. Scaling:
    • Models and infrastructure need to handle varying loads, scaling up during peak times and down during low-activity periods.

Batch Recommendations


Infrastructure Setup

  1. Compute Resources:
    • Batch processing can be done on a scheduled basis using Amazon EC2 instances or AWS Batch.
    • Less emphasis on low-latency, more on throughput and cost-efficiency.
  2. Data Handling:
    • Batch jobs typically process data stored in Amazon S3 or databases like Amazon Redshift.
    • ETL jobs via AWS Glue to prepare data for batch processing.
  3. Scheduling:
    • AWS Step Functions or similar services to schedule and orchestrate batch recommendation jobs.


Modeling Considerations

  1. Model Complexity:
    • Can use more complex models as latency is less of a concern.
    • More comprehensive use of user history and item interactions, as there’s more time to process data.
  2. Data Scope:
    • Batch models can consider a wider range of data, including long-term user preferences and broader interaction histories.
  3. Frequency of Updates:
    • The models are retrained and updated at regular intervals, not necessarily in real-time. Frequency can be daily, weekly, etc., depending on the application.
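A nightly batch job following this setup might look like the sketch below, with a hypothetical affinity table standing in for a trained model. In production the loop would run on AWS Batch or EC2 and write the playlists to S3 or DynamoDB:

```python
def batch_score(users, items, score_fn, top_n=2):
    """Score every (user, item) pair offline and keep a top-N list per user."""
    playlists = {}
    for user in users:
        ranked = sorted(items, key=lambda item: score_fn(user, item), reverse=True)
        playlists[user] = ranked[:top_n]
    return playlists

# Hypothetical affinity scores standing in for model predictions.
AFFINITY = {("u1", "rock"): 0.9, ("u1", "jazz"): 0.4, ("u1", "pop"): 0.7,
            ("u2", "rock"): 0.1, ("u2", "jazz"): 0.8, ("u2", "pop"): 0.6}

playlists = batch_score(["u1", "u2"], ["rock", "jazz", "pop"],
                        lambda u, i: AFFINITY[(u, i)])
print(playlists)  # → {'u1': ['rock', 'pop'], 'u2': ['jazz', 'pop']}
```

Note the contrast with the real-time path: throughput matters here, not latency, so the full catalog and full user history can be scored in one sweep.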


Real-Time vs. Batch: Key Differences

  • Latency: Real-time systems prioritize low response times, while batch systems prioritize processing efficiency over immediacy.
  • Complexity and Depth: Batch processing allows for deeper analysis using more complex models; real-time systems require a more streamlined approach.
  • Cost: Real-time systems may incur higher costs due to the need for responsive, always-on resources. Batch systems can be more cost-effective as they utilize resources less continuously.
  • Use Cases: Real-time recommendations are crucial for dynamic, user-interactive environments, while batch recommendations are suitable for less time-sensitive scenarios, like daily personalized playlist generation.

In conclusion, the choice between real-time and batch recommendations depends on specific application needs, user experience goals, and operational constraints. Each approach requires a distinct setup in terms of infrastructure and modeling, tailored to its unique demands.


  • List out a few projects I have done and their implementation + Infra details

Next Best Action

  • Recommend music, podcasts, artists, songs, videos, and Audible content to users based on their past preferences by sending in-app push notifications and email messages.
  • Hypothesis: if we educate our users on how to use our app by recommending different activities on it, we will see higher engagement and be able to convert inactive users into active ones
  • Baseline model:
    • Collaborative Filtering (candidate generation) + Logistic Regression (to predict whether a user will click on this recommendation or not)
    • Data: User-interaction of each song, duration, skips, time, day, age, gender, demographic, genre, artist, album
    • LR performs a binary classification task (will the user click or not?), trained with binary cross-entropy loss
    • Input: an action for a user; output: the probability of a click
    • Batch processing
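The baseline can be sketched as logistic regression trained with binary cross-entropy via plain gradient descent. The data here is invented, standing in for duration/skip-style features:

```python
import numpy as np

# Toy features per recommendation, e.g. [fraction listened, skip rate].
X = np.array([[0.9, 0.0], [0.8, 0.1], [0.1, 0.9], [0.2, 0.8]])
y = np.array([1.0, 1.0, 0.0, 0.0])  # 1 = the user clicked

w = np.zeros(2)
b = 0.0
lr = 0.5

def bce(p, y):
    """Binary cross-entropy loss."""
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # sigmoid -> click probability
    w -= lr * (X.T @ (p - y)) / len(y)      # BCE gradient w.r.t. weights
    b -= lr * np.mean(p - y)                # BCE gradient w.r.t. bias

p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
print(np.round(p, 2))  # probabilities close to y after training
```

The candidate-generation step (collaborative filtering) supplies the rows of X; LR only decides whether each candidate is worth pushing.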

Offline Metrics

Offline metrics are used during the model development and testing phases before the model is deployed in a live environment. These metrics are based on historical data:

  1. Accuracy: Measures the proportion of total predictions that were correct.

  2. Precision: Assesses how many of the items recommended by the model were actually relevant to the users.

  3. Recall: Determines how many of the relevant items were actually recommended by the model.

  4. F1 Score: Combines precision and recall into a single metric, balancing the trade-off between them.

  5. AUC-ROC (Area Under the Receiver Operating Characteristic Curve): Evaluates the model’s ability to distinguish between classes (clicked vs. not clicked).

  6. Confusion Matrix: Provides a detailed breakdown of correct and incorrect predictions, showing true positives, false positives, true negatives, and false negatives.

  7. Mean Squared Error (MSE) or Root Mean Squared Error (RMSE): For regression-based approaches, these metrics measure the average squared difference between the estimated values and the actual value.
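To make definitions 1 through 4 and 6 concrete, here is a worked toy example computing them from confusion-matrix counts:

```python
# Toy labels and predictions for "clicked vs. not clicked".
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]

tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)  # true positives
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)  # false positives
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)  # false negatives
tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)  # true negatives

accuracy = (tp + tn) / len(y_true)   # correct predictions / all predictions
precision = tp / (tp + fp)           # recommended items that were relevant
recall = tp / (tp + fn)              # relevant items that were recommended
f1 = 2 * precision * recall / (precision + recall)

print(accuracy, precision, recall, round(f1, 2))  # all 0.75 on this toy data
```

AUC-ROC needs predicted probabilities rather than hard labels, so it is omitted here; libraries like scikit-learn compute all of these directly.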

Online Metrics

Online metrics are used to evaluate the model’s performance in a live environment with real users. These metrics are crucial for understanding user satisfaction and engagement:

  1. Click-Through Rate (CTR): Measures the ratio of users who click on a recommended item to the total number of recommendations displayed.

  2. Conversion Rate: Assesses the percentage of clicks that resulted in a desired action (like listening to a song for a certain amount of time).

  3. User Engagement Metrics: Tracks various aspects of user engagement, such as average listening time, number of sessions, and session length.

  4. Retention Rate: Measures how well the recommendation system keeps users returning to the platform.

  5. Churn Rate: Determines the rate at which users stop using the service.

  6. Novelty and Diversity: Assesses how varied and new the recommendations are to individual users.

  7. User Satisfaction Surveys: Direct feedback from users regarding their experience with the recommendations.
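A quick worked example for CTR and conversion rate; the counts are invented:

```python
# Out of 1,000 recommendations shown, 80 were clicked,
# and 20 of those clicks led to a qualifying listen.
impressions, clicks, conversions = 1000, 80, 20

ctr = clicks / impressions               # clicks per recommendation shown
conversion_rate = conversions / clicks   # desired actions per click

print(f"CTR={ctr:.1%}, conversion={conversion_rate:.1%}")
```

Note the denominators differ: CTR is over impressions, conversion rate over clicks; mixing them up inflates the reported numbers.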

  • Iteration 2 of NBA:
    • Neural Collaborative Filtering + Retrieval + Ranking + Finer Ranking for personalization
    • The second iteration of a music recommendation system for something like Amazon Music, incorporating Neural Collaborative Filtering (NCF), and layered stages of Retrieval, Ranking, and Finer Ranking for personalization, represents a more sophisticated approach than a basic logistic regression model. Let’s break down how this iteration might work:
      1. Neural Collaborative Filtering (NCF)
      • Basics: NCF combines traditional collaborative filtering (CF) techniques with the power of neural networks to capture the complex user-item interactions.
      • Implementation:
        • User and Item Embeddings: First, it generates embeddings (dense vector representations) for users and items (songs, playlists). These embeddings are learned from user interaction data (likes, plays, skips).
        • Neural Network Architecture: The embeddings are then fed into a neural network that learns to predict user-item interactions (e.g., whether a user will like a song).
      • Advantages: NCF can capture non-linear and complex patterns in the data, which traditional matrix factorization methods in CF might miss.
      2. Retrieval
      • Purpose: This stage narrows down the vast music library to a smaller, manageable set of items likely to be relevant to the user.
      • How It Works: The NCF model outputs a score for potential user-item pairs, and items with scores above a certain threshold are selected as candidates.
      • Efficiency: This step ensures that the subsequent, more computationally intensive ranking stages only focus on items with a reasonable chance of being relevant.
      3. Ranking
      • Objective: To sort the retrieved items in order of relevance to the user.
      • Method: A more sophisticated model, possibly another neural network, ranks the items based on the likelihood of user interaction. This model can consider additional features like recent user behavior, contextual information, and item metadata.
      • Result: The output is a prioritized list of recommendations tailored to the user’s inferred preferences.
      4. Finer Ranking for Personalization
      • Further Personalization: This stage refines the ranking even more to account for subtle user preferences and contextual nuances.
      • Techniques: Advanced methods like deep learning or even reinforcement learning models can be used to continuously adapt and personalize the recommendations based on real-time user feedback and interactions.
      • Dynamic Adaptation: This step allows the system to adjust recommendations in real-time, ensuring they stay relevant and engaging.
  • How It All Comes Together
    1. NCF Model: Starts with a broad set of potential recommendations based on learned user-item interactions.
    2. Retrieval Stage: Filters this set down to a more manageable size, focusing on items with a higher likelihood of relevance.
    3. Ranking Stage: Further prioritizes this set, ordering items by relevance using additional features and contextual data.
    4. Finer Ranking: Adds an additional layer of personalization, fine-tuning recommendations to align closely with the user’s current preferences and context.
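A minimal NCF-style forward pass with randomly initialized (untrained) weights, showing how the scores would feed the retrieval and ranking stages. Dimensions and the 0.5 threshold are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(42)
n_users, n_items, dim = 4, 6, 8

# Untrained embeddings and MLP weights; in practice all of these are learned.
user_emb = rng.normal(size=(n_users, dim))
item_emb = rng.normal(size=(n_items, dim))
W1 = rng.normal(size=(2 * dim, 16))
b1 = np.zeros(16)
W2 = rng.normal(size=(16, 1))
b2 = np.zeros(1)

def ncf_score(user_id, item_ids):
    """Concatenate user/item embeddings, run the MLP, sigmoid out."""
    u = np.repeat(user_emb[user_id][None, :], len(item_ids), axis=0)
    x = np.concatenate([u, item_emb[item_ids]], axis=1)
    h = np.maximum(0.0, x @ W1 + b1)   # ReLU hidden layer
    z = (h @ W2 + b2).ravel()
    return 1.0 / (1.0 + np.exp(-z))    # predicted interaction probabilities

scores = ncf_score(0, np.arange(n_items))
retrieved = np.where(scores > 0.5)[0]               # retrieval: keep likely items
ranked = retrieved[np.argsort(-scores[retrieved])]  # ranking: sort by score
print(ranked, np.round(scores, 2))
```

In a real system the ranking stage would re-score the retrieved set with a richer model and extra context features rather than reusing the retrieval scores, and the finer-ranking stage would adjust that order with real-time feedback.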

LLM + Recommender System - Hard samples for latency constraints

  1. Data Collection and Preparation
    • Gather Labeled Data: Assemble a dataset of user queries related to music requests, labeled with the corresponding API calls (e.g., playMusic(track=" ", artist=" ")).
    • Data Augmentation: Enhance the dataset with variations in phrasing, including common misspellings and different ways users might request music.
  2. Preprocessing
    • Tokenization: Break down queries into tokens for processing.
    • Contextualization: Consider the user’s previous interactions or current settings (like time of day) to better understand the query.
    • Named Entity Recognition (NER): Extract specific entities such as song titles, artist names, or genres from queries.
  3. Fine-Tuning the LLM
    • Initialize with Pretrained LLM: Start with a foundational model like GPT-3 or GPT-4.
    • Task-Specific Layer: Add a classification layer tailored to categorize different types of music-related API functions.
    • Training: Adjust the LLM using the prepared music dataset. Fine-tune model settings as required.
  4. Model Evaluation
    • Validation Set: Regularly assess model performance during training with a separate validation dataset.
    • Testing: Post-training, evaluate the model’s accuracy in interpreting music-related queries using a distinct test dataset.
  5. Deployment
    • Integration: Incorporate the fine-tuned model into the Alexa Echo system, enabling it to process user music requests and execute appropriate API calls.
    • Monitoring: Continuously monitor the model’s performance in real-world scenarios.
  6. Feedback Loop and Continuous Improvement
    • Collect Real-World Feedback: Gather user responses and system performance data.
    • Model Updating: Periodically retrain the model with new data to maintain and improve accuracy.
  7. Validation and Error Handling
    • Schema Validation: Ensure API calls are structured correctly.
    • Confidence Scoring: Evaluate the model’s confidence in its predictions and handle uncertain cases appropriately.
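Step 7 might look like the following sketch; the schema contents and confidence threshold are illustrative, not Alexa's actual validation logic:

```python
# Required/optional argument schema per API function (illustrative).
SCHEMAS = {"playMusic": {"required": ["track"], "optional": ["artist", "genre"]}}
CONFIDENCE_THRESHOLD = 0.7  # below this, ask the user to clarify

def validate_call(call, confidence):
    """Validate a predicted API call and gate it on model confidence."""
    if confidence < CONFIDENCE_THRESHOLD:
        return "clarify"          # handle the uncertain case: re-prompt
    schema = SCHEMAS.get(call.get("api"))
    if schema is None:
        return "unknown_api"
    args = call.get("args", {})
    if any(field not in args for field in schema["required"]):
        return "missing_slots"    # prompt the user for the missing info
    allowed = set(schema["required"]) | set(schema["optional"])
    if set(args) - allowed:
        return "invalid_args"
    return "execute"

call = {"api": "playMusic", "args": {"track": "Hey Jude", "artist": "The Beatles"}}
print(validate_call(call, confidence=0.92))  # → execute
```

Keeping validation outside the LLM means a malformed generation degrades into a clarifying question instead of a failed API call.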

Considerations for Building the Music Recommendation and Playback System

  1. Understanding the User’s Intent and Context
    • Entity Recognition: Accurately identify song names, artists, genres, etc.
    • Intent Classification: Differentiate between general and specific music requests.
    • Contextual Factors: Take into account the user’s location, preferences, and any special conditions like holidays.
  2. Query Processing and API Call Generation
    • Intent Recognition: Determine the primary action, e.g., “play music.”
    • Slot Filling: Extract key details from the query.
    • Argument Construction: Formulate these details into function arguments.
    • API Call Mapping: Match user intents and details with the correct API function.
    • Incomplete Query Handling: Prompt the user for more information if necessary.
    • Execution: Trigger the appropriate function for music playback or other actions.
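A toy version of this pipeline using a single regex for slot filling. It only covers "play <track> by <artist>" phrasings and is not how a production NLU stack works, but it shows intent recognition, slot filling, and API call mapping end to end:

```python
import re

# Lazy track group so the optional "by <artist>" clause is peeled off the end.
PATTERN = re.compile(r"^play (?P<track>.+?)(?: by (?P<artist>.+))?$", re.IGNORECASE)

def to_api_call(query):
    """Map a user utterance to an API call dict, or ask for clarification."""
    m = PATTERN.match(query.strip())
    if not m:
        # Incomplete/unrecognized query handling: prompt the user.
        return {"api": "clarify", "prompt": "What would you like to hear?"}
    args = {"track": m.group("track")}   # slot filling
    if m.group("artist"):
        args["artist"] = m.group("artist")
    return {"api": "playMusic", "args": args}  # API call mapping

print(to_api_call("Play Hey Jude by The Beatles"))
```

The fine-tuned LLM described earlier replaces the regex, but the downstream contract (a structured call plus a clarification fallback) stays the same.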

Key Points

  • Resource Management: Plan for the computational and storage resources required for training and operation.
  • Ethical and Privacy Considerations: Ensure user data privacy and address potential biases in the model.
  • Version Control: Maintain records of model versions for rollback if needed.

Item-to-Item