## Overview

• This question is very similar to designing the system for top N trending topics.

## Design Goals/Requirements

• Functional requirements:
• Get the top N songs for a user over the past X days.
• Assume that the popularity of a song can be determined by the frequency of the song being listened to in the past.
• Non-functional requirements:
• Minimum latency: The suggestions should appear in real-time. The user should be able to see the suggestions within 200ms.
• High availability: The final design should be highly available.
• Eventual consistency (due to CAP theorem): As we know from the CAP theorem, that we can have either high availability or high consistency, thus we will aim for an eventually consistent system.
• Write-heavy service: Our system will be write heavy in nature as more users will be listening to songs rather than fetching the top-N songs that they have heard so far, thus the number of write requests will be far greater than read requests.

## Scale Estimation and Performance/Capacity Requirements

• Some back-of-the-envelope calculations based on average numbers.

### Traffic estimates

• Monthly Active Users (MAUs): 250M.
• Number of songs listened to daily: 1B.
• Read requests for the top N songs: 200.

## System APIs

• Once we have finalized the requirements, it’s always a good idea to define the system APIs. This should explicitly state what is expected from the system. These would be running as microservices on our application servers.
• We can have SOAP or REST APIs to expose the functionality of our service.
• The following could be the definition of the read API to fetch the top N songs over the past X days:

  getTopSongs(user_id, maximum_results_to_return, timestamp)

• Parameters:
• user_id (number): The ID of the user for whom the system will generate the top-N songs list.
• maximum_results_to_return (number): Number of posts to return.
• timestamp (number): The current timestamp to send places with timestamps associated greater than the current time – can also optionally send the ID (if the ID is encoded based on the timestamp) of the latest comment to fetch places greater than this ID).
• Returns: (JSON) Returns a JSON object containing a list of top-N songs.
• The following could be the definition of the write API to record when the user listens to a song:
addSong(user_id, song_id, timestamp)

• Parameters:
• user_id (number): The ID of the user for whom the system will generate the top-N songs list.
• song_id (number): ID of the song played.
• timestamp (number): The current timestamp to send places with timestamps associated greater than the current time – can also optionally send the ID (if the ID is encoded based on the timestamp) of the latest comment to fetch places greater than this ID).
• Returns: (Bool) Success

## High Level System Design

• Steps Involved:
1. Whenever the user listens to a song, we will log it on the application server.
2. A background async process will read the data from these logs, do some initial aggregation, and send the data for further processing.
3. The aggregated data will be sent to a distributed messaging system like Apache Kafka.
4. Then, we can divide our system into two paths, Fast Path or Slow Path.

### Fast Path

1. A data structure called Count-Min Sketch will be used.
2. It will calculate the top $N$ songs approximately, and the results will be available within seconds.

### Slow Path

1. A MapReduce pipeline will be run to calculate the top $N$ songs precisely.
2. This may take minutes or hours for the results to be made available.

### Choose the right approach

• Choose either path depending upon the system constraints that whether we will need near real-time results or if some delay is acceptable to achieve accuracy.

## Detailed Component Design

• It includes various components, such as:
• Fast path and slow patch
• Apache Kakfa for distributed messaging.
• Application servers to run system APIs as microservices.
• Suggestions ranking service.
• Caches for fast retrieval.
• Load balancers to distribute load as evenly as possible, and ensure crashed servers are taken out of the load distribution loop.