Overview

  • A Newsfeed is the constantly updating list of stories in the middle of Facebook’s homepage. It includes status updates, photos, videos, links, app activity, and ‘likes’ from the people, pages, and groups that a user follows on Facebook. In other words, it is a compiled, scrollable version of your own and your friends’ life stories, told through photos, videos, locations, status updates, and other activities.

  • For any social media site you design - Twitter, Instagram, or Facebook - you will need some newsfeed system to display updates from friends and followers.

    • Similar Services: Twitter Newsfeed, Instagram Newsfeed, Quora Newsfeed

Design Goals/Requirements

  • Let’s design a newsfeed for Facebook with the following requirements:

  • Functional requirements:
    • Newsfeed will be generated based on the posts from the people, pages, and groups that a user follows.
    • A user may have many friends and follow a large number of pages/groups.
    • Feeds may contain images, videos, or just text.
    • Our service should support appending new posts as they arrive to the newsfeed for all active users.
  • Non-functional requirements:
    • Minimum latency: Our system should generate any user’s newsfeed in near real-time so the user does not experience significant lag during newsfeed generation; the maximum latency seen by the end user should be about 2s.
    • High availability: The final design should be highly available; an outage of Facebook’s newsfeed would be highly visible and widely reported, so downtime must be kept to a minimum.
    • Eventual consistency (due to CAP theorem): Per the CAP theorem, in the presence of network partitions we must trade consistency against availability; since we prioritize availability, we will aim for an eventually consistent system.
    • Read-heavy service: Our system will be read-heavy in nature, as far more users load their newsfeed than post statuses on Facebook; thus, the number of read requests will be far greater than write requests.
    • New data absorption latency: A post shouldn’t take more than 5s to appear in a user’s feed, assuming a new newsfeed request comes in.

Scale Estimation and Performance/Capacity Requirements

  • Some back-of-the-envelope calculations based on average numbers.

Traffic estimates

  • Daily Active Users (DAUs): 2B (by current Facebook estimates, as of April 2022)
  • Newsfeed requests: Each user fetches their timeline an average of ten times a day, leading to 20B newsfeed requests per day, or approximately 230K requests per second.
  • Secondary estimates:
    • Each person posts twice a day
    • Each post is liked five times a day.
    • Each user has 300 friends and follows 200 pages.

Storage estimates

  • Number of posts in every user’s feed that we want to keep in memory for a quick fetch: 500
  • Size of each post: 1KB
  • Total storage needed per user: 500KB of data per user.
  • Total storage needed: ~150TB, assuming we keep pre-generated feeds in memory for the ~300M most active users (300M × 500KB).
  • Number of servers needed (assuming each server has 100GB of memory): 1500 machines to keep the top 500 posts in memory for all active users.
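  • The estimates above can be checked with a quick script (a sketch; the 300M hot-user figure for in-memory feeds is an assumption chosen to match the 150TB estimate):

```python
# Back-of-the-envelope estimates for the newsfeed service.
# Inputs mirror the figures in the text above.

DAU = 2_000_000_000            # daily active users
FETCHES_PER_USER = 10          # newsfeed loads per user per day
SECONDS_PER_DAY = 86_400

requests_per_day = DAU * FETCHES_PER_USER               # 20B/day
requests_per_sec = requests_per_day / SECONDS_PER_DAY   # ~230K/s

POSTS_CACHED_PER_USER = 500
POST_SIZE_KB = 1
HOT_USERS = 300_000_000        # assumption: users whose feeds stay in memory

storage_kb = HOT_USERS * POSTS_CACHED_PER_USER * POST_SIZE_KB
storage_tb = storage_kb / 10**9          # KB -> TB (decimal units): ~150TB

SERVER_MEM_GB = 100
servers = storage_tb * 1000 / SERVER_MEM_GB   # ~1500 machines

print(round(requests_per_sec), storage_tb, servers)
```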

System APIs

  • Once we have finalized the requirements, it’s always a good idea to define the system APIs. This should explicitly state what is expected from the system. These would be running as microservices on our application servers.
  • We can have SOAP or REST APIs to expose the functionality of our service. The following could be the definition of the read API for fetching the newsfeed:
getUserFeed(user_id, since_id, count, max_id, exclude_replies, api_dev_key)
  • Parameters:
    • user_id (number): The ID of the user for whom the system will generate the newsfeed.
    • since_id (number): Optional; returns results with an ID higher than (that is, more recent than) the specified ID.
    • count (number): Optional; the number of feed items to retrieve, up to a maximum of 200 per request.
    • max_id (number): Optional; returns results with an ID less than (that is, older than) or equal to the specified ID.
    • exclude_replies(boolean): Optional; this parameter will prevent replies from appearing in the returned timeline.
    • api_dev_key (string): The API developer key of a registered account; it can be used to, among other things, throttle users based on their allocated quota.
  • Returns (JSON): Returns a JSON object containing a list of feed items.
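  • To illustrate the pagination semantics of this API, here is a minimal in-memory sketch (the function and data layout are hypothetical; only the parameter names and the 200-item cap come from the definition above):

```python
# Hypothetical in-memory sketch of getUserFeed's pagination semantics.
MAX_COUNT = 200

def get_user_feed(feed, since_id=None, count=20, max_id=None,
                  exclude_replies=False):
    """feed: list of dicts with 'id' (newest first) and 'is_reply'."""
    count = min(count, MAX_COUNT)           # enforce the per-request cap
    items = feed
    if since_id is not None:                # only items newer than since_id
        items = [p for p in items if p["id"] > since_id]
    if max_id is not None:                  # only items at or older than max_id
        items = [p for p in items if p["id"] <= max_id]
    if exclude_replies:
        items = [p for p in items if not p.get("is_reply")]
    return items[:count]
```

A client fetching newer items would pass the newest ID it has seen as since_id; fetching older pages, it would pass the oldest ID it has as max_id.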

Data Model

  • There are three primary objects: User, Entity (e.g. page, group, etc.), and FeedItem (or Post). Here are some observations about the relationships between these entities:

    • A User can follow other entities and can become friends with other users.
    • Both users and entities can post FeedItems which can contain text, images, or videos.
    • Each FeedItem will have a UserID pointing to the User who created it. For simplicity, let’s assume that only users can create feed items, although on Facebook, pages can post feed items too.
    • Each FeedItem can optionally have an EntityID pointing to the page or the group where that post was created.
  • If we are using a relational database, we would need to model two relations: User-Entity relation and FeedItem-Media relation. Since each user can be friends with many people and follow a lot of entities, we can store this relation in a separate table. The “Type” column in “UserFollow” identifies if the entity being followed is a User or Entity. Similarly, we can have a table for FeedMedia relation.
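  • One possible relational schema for this data model, sketched against SQLite (column names beyond those mentioned in the text are illustrative assumptions):

```python
# Sketch of the relational model above: User, Entity, FeedItem, plus the
# UserFollow and FeedMedia relation tables. Built in an in-memory SQLite DB.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE User      (UserID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Entity    (EntityID INTEGER PRIMARY KEY, Name TEXT, Type TEXT);
CREATE TABLE FeedItem  (FeedItemID INTEGER PRIMARY KEY,
                        UserID INTEGER REFERENCES User,
                        EntityID INTEGER,        -- optional page/group
                        Contents TEXT,
                        CreationDate TEXT);
-- Type: 0 = followed user, 1 = followed entity (page/group)
CREATE TABLE UserFollow(UserID INTEGER, EntityOrFriendID INTEGER, Type INTEGER);
CREATE TABLE FeedMedia (FeedItemID INTEGER, MediaID INTEGER);
""")
```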

High Level System Design

  • At a high level this problem can be divided into two parts: (i) feed generation, and (ii) feed publishing.

Feed generation

  • Newsfeed is generated from the posts (or feed items) of users and entities (pages and groups) that a user follows. So, whenever our system receives a request to generate the feed for a user (say Jack), we will perform the following steps:

    1. Retrieve the IDs of all users, pages and groups that Jack follows.
    2. Retrieve the most recent posts (since last login) for those IDs. This is our starting set of posts, which needs to be filtered down. We will deploy a multi-stage retrieval-and-ranking process to funnel these newsfeed posts.
    3. Filter them using a first-pass retrieval algorithm. These are the potential posts that we can show in Jack’s newsfeed.
    4. Rank these posts based on their relevance to Jack using a second-pass ranking algorithm, filtering the previous set of posts even further. This represents Jack’s current feed.
    5. Store this feed in the cache and return the top posts (say 20) to be displayed/rendered on Jack’s feed.
    6. On the front-end UI, when Jack reaches the end of his current feed, the next top 20 posts will be fetched from the server to be displayed to him.
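  • The six steps above can be sketched as follows (the store/cache interfaces and the filtering and ranking rules are hypothetical stand-ins, not Facebook’s actual algorithms):

```python
# Minimal sketch of the feed-generation steps; follow_store, post_store,
# and cache are hypothetical interfaces.
def generate_feed(user_id, follow_store, post_store, cache, page_size=20):
    followee_ids = follow_store.followees(user_id)          # step 1
    candidates = post_store.recent_posts(followee_ids)      # step 2
    candidates = [p for p in candidates if p["visible"]]    # step 3: first-pass filter
    ranked = sorted(candidates, key=lambda p: p["score"],   # step 4: second-pass rank
                    reverse=True)
    cache[user_id] = ranked                                 # step 5: cache the feed
    return ranked[:page_size]                               # step 5: first page

def next_page(user_id, cache, offset, page_size=20):
    # step 6: client pages through the cached feed
    return cache[user_id][offset:offset + page_size]
```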

What about new incoming posts from people that Jack follows?

  • One thing to notice here is that so far, we have generated the feed only once and stored it in the cache. If Jack is online, we should have a mechanism to rank and add new posts from his friends and followed-pages to his feed.
  • We can periodically (say every five minutes) perform the above steps (say, as a cron job) to rank and add the newer posts to his feed. Jack can then be notified (using, say, a “load more posts” button) that there are newer items in his feed, which he can fetch with a pull.

Feed publishing

  • Whenever Jack loads his newsfeed page, he has to request and pull feed items from the server. When he reaches the end of his current feed, he can pull more data from the server. For newer items either the server can notify Jack and then he can pull, or the server can push, these new posts. We will discuss these options in detail later.

  • At a high level, we will need following components in our Newsfeed service:

    1. Web servers: To maintain a connection with the user. This connection will be used to transfer data between the user and the server.
    2. Application server: To execute the workflows of storing new posts in the database servers. We will also need some application servers to retrieve and to push the newsfeed to the end user.
    3. Metadata database and cache: To store the metadata about Users, Pages, and Groups.
    4. Posts database and cache: To store metadata about posts and their contents.
    5. Video and photo storage, and cache: Blob storage, to store all the media included in the posts.
    6. Newsfeed generation service: To gather and rank all the relevant posts for a user to generate newsfeed and store in the cache. This service will also receive live updates and will add these newer feed items to any user’s timeline.
    7. Feed notification service: To notify the user that there are newer items available for their newsfeed.
  • Following is the high-level architecture diagram of our system. User B and C are following User A.

Detailed Component Design

  • Let’s discuss different components of our system in detail.

  • It includes various components, such as:
    • Newsfeed generation service.
    • Newsfeed ranking service.
    • Databases for storing user and post data.
    • Application servers to run system APIs as microservices.
    • Caches for fast retrieval.
    • Load balancers to distribute load as evenly as possible, and ensure crashed servers are taken out of the load distribution loop.
    • Push notification service.

Feed generation

  • Let’s take the simple case of the newsfeed generation service fetching most recent posts from all the users and entities that Jack follows; the query would look like this:
(SELECT FeedItemID FROM FeedItem WHERE UserID IN (
    SELECT EntityOrFriendID FROM UserFollow
    WHERE UserID = <current_user_id> AND Type = 0   -- 0 = user
))
UNION
(SELECT FeedItemID FROM FeedItem WHERE EntityID IN (
    SELECT EntityOrFriendID FROM UserFollow
    WHERE UserID = <current_user_id> AND Type = 1   -- 1 = entity
))
ORDER BY CreationDate DESC
LIMIT 100
  • Here are issues with this design for the feed generation service:

    1. We generate the timeline when a user loads their page. This would be quite slow and have a high latency since we have to query multiple tables and perform sorting/merging/ranking on the results.
    2. Crazy slow for users with a lot of friends/followers as we have to perform sorting/merging/ranking of a huge number of posts.
    3. For live updates, each status update will result in feed updates for all followers. This could result in high backlogs in our Newsfeed Generation Service.
    4. For live updates, the server pushing (or notifying about) newer posts to users could lead to very heavy loads, especially for people or pages that have a lot of followers. To improve efficiency, we can pre-generate the timeline and store it in memory.

Caching offline generated newsfeeds

  • We can have dedicated servers continuously generating users’ newsfeeds and storing them in memory (for fast access) or in a UserNewsFeed table. So, whenever a user requests new posts for their feed, we can simply serve them from the pre-generated, stored location. Under this scheme, a user’s newsfeed is not compiled on load but on a regular basis, and is returned whenever the user requests it.

  • Whenever these servers need to generate the feed for a user, they will first query to see what was the last time the feed was generated for that user. Then, new feed data would be generated from that time onwards. We can store this data in a hash table where the “key” would be UserID and “value” would be a STRUCT like this:

Struct {
    LinkedHashMap<FeedItemID, FeedItem> FeedItems;
    DateTime lastGenerated;
}
  • We can store FeedItemIDs in a data structure similar to a LinkedHashMap or TreeMap, which allows us both to jump to any feed item and to iterate through the map easily. Whenever users want to fetch more feed items, they can send the last FeedItemID they currently see in their newsfeed; we can then jump to that FeedItemID in our hash map and return the next batch/page of feed items from there.
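  • A sketch of this cached-feed structure, using Python’s OrderedDict in place of Java’s LinkedHashMap (class and method names are illustrative):

```python
# Cached feed keyed by FeedItemID, newest first; supports jumping to the
# last-seen item and returning the next page from there.
from collections import OrderedDict
from itertools import dropwhile, islice

class CachedFeed:
    def __init__(self, items):
        # items: iterable of (feed_item_id, feed_item), newest first
        self.feed_items = OrderedDict(items)

    def page_after(self, last_seen_id, page_size=20):
        """Return up to page_size items after last_seen_id (None = from top)."""
        ids = iter(self.feed_items)
        if last_seen_id is not None:
            ids = dropwhile(lambda i: i != last_seen_id, ids)
            next(ids, None)                 # skip last_seen_id itself
        return [self.feed_items[i] for i in islice(ids, page_size)]
```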

How many feed items should we store in memory for a user’s feed?

  • Initially, we can decide to store 500 feed items per user, but this number can be adjusted later based on the usage pattern.
  • For example, if we assume that one page of a user’s feed has 20 posts and most of the users never browse more than ten pages of their feed, we can decide to store only 200 posts per user.
  • For any user who wants to see more posts (more than what is stored in memory), we can always query backend servers.

Should we generate (and keep in memory) newsfeeds for all users?

  • There will be a lot of users that don’t log-in frequently. Here are a few things we can do to handle this:
    • A more straightforward approach is an LRU-based cache that removes from memory users who haven’t accessed their newsfeed for a long time.
    • A smarter solution is to run ML-based models that predict users’ login patterns and pre-generate their newsfeeds accordingly, e.g., at what time of day a user is active, and on which days of the week a user accesses their newsfeed.
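  • The LRU approach from the first bullet can be sketched as follows (a minimal illustration; the class name and capacity are assumptions):

```python
# LRU cache of in-memory feeds: evicts the feed of the user who has gone
# longest without loading their newsfeed once capacity is exceeded.
from collections import OrderedDict

class FeedLRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self._feeds = OrderedDict()          # user_id -> feed, LRU first

    def get(self, user_id):
        if user_id not in self._feeds:
            return None
        self._feeds.move_to_end(user_id)     # mark as recently used
        return self._feeds[user_id]

    def put(self, user_id, feed):
        self._feeds[user_id] = feed
        self._feeds.move_to_end(user_id)
        if len(self._feeds) > self.capacity:
            self._feeds.popitem(last=False)  # evict least-recently-used user
```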
  • Let’s now discuss some solutions to our “live updates” problems in the following section.

Feed publishing (Fanout)

  • The process of pushing a newsfeed post to all of its author’s followers is called fanout. By analogy, the push approach is called fanout-on-write, while the pull approach is called fanout-on-load. Let’s discuss different options for publishing feed data to users.

    • (Client) “Pull” model or Fan-out-on-load:
      • This method involves keeping all the recent feed data in memory so that users can pull it from the server whenever they need it.
      • Clients can pull the feed data on a regular basis or manually whenever they need it.
      • Possible issues with this approach are:
        • Data can become stale unless a pull request is issued. In other words, new data might not be shown to the users until they issue a pull request.
        • It’s hard to find the right pull cadence; most of the time, a pull request will return an empty response if there is no new data, wasting resources.
    • (Server) “Push” model or Fan-out-on-write:
      • For a push system, once a user has published a post, we can immediately push this post to all the followers, thereby being more real-time than the pull model.
      • The advantage is that when fetching the feed, you don’t need to go through your friend’s list and get feeds for each of them. It significantly reduces read operations. To efficiently handle this, users have to maintain a Long Poll request with the server for receiving the updates.
      • A possible problem with this approach is the celebrity user issue: when a user has millions of followers (i.e., a celebrity-user), the server has to push updates to a lot of people.
    • Hybrid model (push+pull model):
      • An alternate method to handle feed data could be to use a hybrid approach, i.e., to do a combination of fan-out-on-write and fan-out-on-load.
      • Specifically, we can stop pushing posts from users with a high number of followers (a celebrity user) and only push data for those users who have a few hundred (or thousand) followers.
      • For celebrity users, we can let the followers pull the updates.
      • Since the push operation can be extremely costly for users who have a lot of friends or followers, by disabling fanout for them, we can save a huge number of resources. Another alternate approach could be that, once a user publishes a post, we can limit the fanout to only her online friends.
      • Thus, to get benefits from both the approaches, a combination of ‘push to notify’ and ‘pull for serving’ end-users is a great way to go. Purely a push or pull model is less versatile.
      • Another approach could be that the server pushes updates to all the users not more than a certain frequency and letting users with a lot of updates to pull data regularly.
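  • The hybrid policy above can be sketched as follows (the follower threshold and the store/cache interfaces are illustrative assumptions):

```python
# Hybrid fanout: push posts from "normal" users into follower feeds
# (fanout-on-write); skip fanout for celebrities, whose followers pull
# the posts at read time (fanout-on-load).
CELEBRITY_THRESHOLD = 10_000   # assumption: follower count cutoff

def publish_post(author_id, post, follower_store, feed_cache):
    followers = follower_store.followers(author_id)
    if len(followers) >= CELEBRITY_THRESHOLD:
        return "pull"                        # followers pull on load
    for follower_id in followers:            # fanout-on-write
        feed_cache.setdefault(follower_id, []).insert(0, post)
    return "push"
```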

How many feed items can we return to the client in each request?

  • We should have a maximum limit for the number of items a user can fetch in one request (say 20). But, we should let the client specify how many feed items they want with each request as the user may like to fetch a different number of posts depending on the device (mobile vs. desktop).

Should we always notify users if there are new posts available for their newsfeed?

  • It could be useful for users to get notified whenever new data is available. However, on mobile devices, where data usage is relatively expensive, it can consume unnecessary bandwidth. Hence, at least for mobile devices, we can choose not to push data, instead, let users pull-to-refresh to get new posts.

Feed Ranking

  • The most straightforward way to rank posts in a newsfeed is by the creation time of the posts (thus, based on just recency), but today’s ranking algorithms are doing a lot more than that to ensure “important” posts are ranked higher. The high-level idea of ranking is first to select key “signals or features” that make a post important and then to find out how to combine them to calculate a final ranking score.
  • More specifically, we can select features that are relevant to the importance of any feed item, for e.g., number of likes, comments, shares, time of the update, whether the post has images/videos, etc., and then, a score can be calculated using these features. This is generally enough for a simple ranking system.
  • A better ranking system can significantly improve itself by constantly evaluating if we are making progress in user stickiness, retention, ads revenue, etc.
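  • A toy linear scoring function over the signals named above (the weights and recency half-life are illustrative assumptions, not a real ranking model):

```python
# Weighted sum of engagement signals, multiplied by an exponential
# recency decay so older posts rank lower.
WEIGHTS = {"likes": 1.0, "comments": 2.0, "shares": 3.0, "has_media": 5.0}
HALF_LIFE_HOURS = 24           # assumption: score halves every 24 hours

def rank_score(post, age_hours):
    engagement = sum(WEIGHTS[k] * post.get(k, 0) for k in WEIGHTS)
    decay = 0.5 ** (age_hours / HALF_LIFE_HOURS)
    return engagement * decay
```

In practice, such hand-tuned weights would be replaced by a learned model evaluated against engagement and retention metrics.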

Data Partitioning

Sharding posts and metadata

  • Since we have a huge number of new posts every day and our read load is extremely high too, we need to distribute our data onto multiple machines such that we can read/write it efficiently. For sharding our databases that are storing posts and their metadata, we can have a similar design as discussed under Designing Twitter, as follows:

Sharding feed data (based on User ID)

  • For feed data, which is being stored in memory, we can partition it based on UserID. We can try storing all the data of a user on one server. When storing, we can pass the UserID to our hash function that will map the user to a cache server where we will store the user’s feed objects.
  • Also, since we don’t expect to store more than 500 FeedItemIDs for any given user, we will not run into a scenario where a user’s feed data doesn’t fit on a single server. To get a user’s feed, we would always have to query only one server. For future growth and replication, we must use Consistent Hashing.
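  • A minimal consistent-hashing sketch for mapping a UserID to a cache server (virtual nodes smooth the key distribution; class and server names are illustrative):

```python
# Consistent hash ring: each server gets many virtual points on the ring;
# a user maps to the first server point clockwise from hash(user_id).
import bisect
import hashlib

class ConsistentHashRing:
    def __init__(self, servers, vnodes=100):
        self.ring = sorted(
            (self._hash(f"{s}#{v}"), s)
            for s in servers for v in range(vnodes))
        self.keys = [h for h, _ in self.ring]

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(str(key).encode()).hexdigest(), 16)

    def server_for(self, user_id):
        i = bisect.bisect(self.keys, self._hash(user_id)) % len(self.keys)
        return self.ring[i][1]
```

When a server is added or removed, only the keys on its arcs of the ring move; the rest keep their assignments, unlike with simple modulo hashing.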

Load Balancing

  • We can add a load balancing layer at two places in our system between:
    • Clients and Web servers, and,
    • Application servers and Backend servers.
  • Initially, a simple Round Robin approach can be adopted, distributing incoming requests equally among backend servers. This LB is simple to implement and introduces no overhead. Another benefit of this approach is that the LB can take dead servers out of the rotation and stop sending traffic to them.
  • A problem with Round Robin LB is that it won’t take server load into consideration: if a server is overloaded or slow, the LB will not stop sending it new requests. To handle this, a more intelligent LB solution can be placed that periodically polls backend servers about their load and adjusts traffic accordingly.
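  • A sketch of round-robin balancing with dead-server removal, as described above (health state would be supplied by a separate, hypothetical health checker):

```python
# Round-robin load balancer that skips servers marked unhealthy.
from itertools import count

class RoundRobinLB:
    def __init__(self, servers):
        self.servers = list(servers)
        self.healthy = set(self.servers)
        self._i = count()                  # monotonically increasing index

    def mark_down(self, server):
        self.healthy.discard(server)       # take dead server out of rotation

    def next_server(self):
        for _ in range(len(self.servers)):
            s = self.servers[next(self._i) % len(self.servers)]
            if s in self.healthy:
                return s
        raise RuntimeError("no healthy servers")
```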

Caching

  • To deal with hot posts, we can introduce a cache in front of each of our databases (metadata and posts DB). Assuming roughly 20% of posts are “hot” and account for most reads, we can use Memcached to keep those hot posts in memory.
  • Application servers, before hitting the backend databases, can quickly check if the cache has that post. Based on clients’ usage patterns, we can adjust how many cache servers we need. For cache eviction policy, Least Recently Used (LRU) seems suitable for our system.

RAID for Object Storage caching

  • We can utilize RAID on the servers backing our block/object storage cache, for redundancy and higher read throughput.

Further Reading