- Design Goals/Requirements
- Scale Estimation and Performance/Capacity Requirements
- System APIs
- Data Model
- High Level System Design
- Detailed Component Design
- Further Reading
Design Goals/Requirements
Let’s design a newsfeed for Facebook with the following requirements:
- Newsfeed will be generated based on the posts from the people, pages, and groups that a user follows.
- A user may have many friends and follow a large number of pages/groups.
- Feeds may contain images, videos, or just text.
- Our service should support appending new posts, as they arrive, to the newsfeed of all active users.
- Our system should be able to generate any user’s newsfeed in real time; the maximum latency seen by the end user should be 2s.
- A new post shouldn’t take more than 5s to appear in a user’s feed, assuming a new newsfeed request comes in.
Scale Estimation and Performance/Capacity Requirements
- The following back-of-the-envelope calculations are based on average numbers.
- Let’s assume on average a user has 300 friends and follows 200 pages.
- Let’s assume 300M daily active users, with each user fetching their timeline an average of five times a day. This results in 1.5B newsfeed requests per day, or approximately 17,400 requests per second (1.5B requests / 86,400 seconds).
- On average, let’s assume we need to have around 500 posts in every user’s feed that we want to keep in memory for a quick fetch. Let’s also assume that on average each post would be 1KB in size. This would mean that we need to store roughly 500KB of data per user. To store all this data for all the active users we would need 150TB of memory. If a server can hold 100GB we would need around 1500 machines to keep the top 500 posts in memory for all active users.
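The arithmetic behind these estimates can be checked with a few lines of Python. This is only a sketch that plugs in the averages assumed above; the constant and function names are illustrative, and decimal units (1KB = 1,000 bytes) are an assumption made to keep the numbers round. The exact request rate comes out a little under 17,400 per second, so any rounding in the 17,000 to 17,500 range is reasonable for planning purposes.

```python
# Back-of-the-envelope capacity estimate using the averages assumed in the text.
DAILY_ACTIVE_USERS = 300_000_000
FETCHES_PER_USER_PER_DAY = 5
SECONDS_PER_DAY = 24 * 60 * 60

POSTS_CACHED_PER_USER = 500
POST_SIZE_BYTES = 1_000             # 1KB per post (decimal units, an assumption)
SERVER_MEMORY_BYTES = 100 * 10**9   # 100GB of memory per server


def estimate_capacity() -> dict:
    """Return request-rate and memory estimates for the newsfeed cache."""
    requests_per_day = DAILY_ACTIVE_USERS * FETCHES_PER_USER_PER_DAY
    requests_per_second = requests_per_day / SECONDS_PER_DAY

    bytes_per_user = POSTS_CACHED_PER_USER * POST_SIZE_BYTES
    total_bytes = bytes_per_user * DAILY_ACTIVE_USERS

    # Ceiling division: a partially filled server still counts as a server.
    servers_needed = -(-total_bytes // SERVER_MEMORY_BYTES)

    return {
        "requests_per_day": requests_per_day,
        "requests_per_second": requests_per_second,
        "memory_per_user_kb": bytes_per_user / 1_000,
        "total_memory_tb": total_bytes / 10**12,
        "servers_needed": servers_needed,
    }


if __name__ == "__main__":
    for name, value in estimate_capacity().items():
        print(f"{name}: {value:,.0f}")
```

Keeping the inputs as named constants makes it easy to see how sensitive the machine count is to each assumption; doubling the cached posts per user, for example, doubles the number of servers.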
System APIs
- Once we have finalized the requirements, it’s always a good idea to define the system APIs. This should explicitly state what is expected from the system. These would be running as microservices on our application servers.
- We can have SOAP or REST APIs to expose the functionality of our service. The following could be the definition of the API for getting the newsfeed:
getUserFeed(api_dev_key, user_id, since_id, count, max_id, exclude_replies)
api_dev_key (string): The API developer key of a registered account. This can be used to, among other things, throttle users based on their allocated quota.
user_id (number): The ID of the user for whom the system will generate the newsfeed.
since_id (number): Optional; returns results with an ID higher than (that is, more recent than) the specified ID.
count (number): Optional; specifies the number of feed items to try to retrieve, up to a maximum of 200 per distinct request.
max_id (number): Optional; returns results with an ID less than (that is, older than) or equal to the specified ID.
exclude_replies (boolean): Optional; this parameter will prevent replies from appearing in the returned timeline.
- Returns: (JSON) A JSON object containing a list of feed items.
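To make the parameter semantics concrete, here is a minimal Python sketch of how a handler for getUserFeed might apply since_id, max_id, count, and exclude_replies. Everything beyond the parameter names is an assumption for illustration: the `FeedItem` type, the in-memory `FEEDS` store, the default count of 20, the `feed_items` key in the response, and the idea that IDs increase over time (which the "more recent than" wording suggests). Quota throttling based on api_dev_key is left out.

```python
import json
from dataclasses import asdict, dataclass
from typing import Dict, List, Optional

MAX_COUNT = 200      # per-request cap from the parameter description
DEFAULT_COUNT = 20   # assumption: the text does not specify a default


@dataclass
class FeedItem:
    """Hypothetical feed item; larger IDs are assumed to be more recent."""
    id: int
    author_id: int
    text: str
    is_reply: bool = False


# Hypothetical in-memory store: user_id -> feed items for that user.
FEEDS: Dict[int, List[FeedItem]] = {}


def get_user_feed(
    api_dev_key: str,
    user_id: int,
    since_id: Optional[int] = None,
    count: Optional[int] = None,
    max_id: Optional[int] = None,
    exclude_replies: bool = False,
) -> str:
    """Return a JSON object containing a list of feed items, newest first."""
    # api_dev_key would be checked against the caller's quota here (omitted).
    limit = DEFAULT_COUNT if count is None else max(0, min(count, MAX_COUNT))

    selected = [
        item
        for item in FEEDS.get(user_id, [])
        if (since_id is None or item.id > since_id)      # strictly newer
        and (max_id is None or item.id <= max_id)        # older or equal
        and not (exclude_replies and item.is_reply)
    ]
    selected.sort(key=lambda item: item.id, reverse=True)

    return json.dumps({"feed_items": [asdict(item) for item in selected[:limit]]})
```

A client can page backward through the feed by passing the smallest ID it has seen, minus one, as max_id, and can poll for new items by passing the largest ID it has seen as since_id.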
High Level System Design
Detailed Component Design