Design Twitter
Requirements
- A user can follow other users
- A user can see a feed of tweets on the home page
Architecture
A graph database is a natural fit for modeling the relationships between users. Formally, a graph is a collection of vertices and edges. MySQL, PostgreSQL, or any other relational database is also an option, but the follow relationship maps directly and intuitively onto a graph model.
Twitter data can be easily represented as a graph. Twitter built the FlockDB graph database to store its social graph. This design uses Neo4j as the data store for the Twitter data.
Node types: 1. User node 2. Tweet node
Pull vs. Push
A user’s home timeline is built from the tweets of the users they follow. For example, suppose there are two users, A and B, and A follows B. When B posts a tweet, there are two options: add the tweet to A’s timeline immediately (while writing the tweet to the database), or fetch it when A’s timeline is built. The first approach (push) increases the write cost, while the second (pull) increases the cost at read time.
A hybrid approach suits timeline building well. If a user does not have a large number of followers, each of their tweets can be added to the timelines of their followers at write time. For influencers, tweets can instead be fetched when a follower’s timeline is built, which avoids the high write amplification cost.
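The hybrid approach can be sketched roughly as follows. This is a minimal in-memory illustration, not a production design: the names `FANOUT_THRESHOLD`, `post_tweet`, and `timeline`, the threshold value, and the use of a counter in place of a timestamp are all assumptions made for this example.

```python
from collections import defaultdict

# Hypothetical cutoff: authors with more followers than this are treated as
# influencers, and their tweets are not pushed at write time.
FANOUT_THRESHOLD = 2

followers = defaultdict(set)          # author -> users following that author
following = defaultdict(set)          # user -> authors that user follows
tweets_by_author = defaultdict(list)  # author -> [(seq, author, text)]
inbox = defaultdict(list)             # user -> tweets pushed at write time
_seq = 0  # increasing counter standing in for a timestamp (assumption)


def follow(user, author):
    followers[author].add(user)
    following[user].add(author)


def is_influencer(author):
    return len(followers[author]) > FANOUT_THRESHOLD


def post_tweet(author, text):
    global _seq
    _seq += 1
    tweet = (_seq, author, text)
    tweets_by_author[author].append(tweet)
    if not is_influencer(author):
        # Push path: pay the cost at write time, once per follower.
        for user in followers[author]:
            inbox[user].append(tweet)
    return tweet


def timeline(user):
    # Start from the tweets that were pushed at write time.
    merged = set(inbox[user])
    # Pull path: fetch influencer tweets at read time. Using a set removes
    # duplicates if an author was pushed earlier and became an influencer later.
    for author in following[user]:
        if is_influencer(author):
            merged.update(tweets_by_author[author])
    return sorted(merged, reverse=True)  # newest first
```

In this sketch, a tweet from an author with few followers costs one write per follower, while a tweet from an influencer costs a single write and is merged in only when a follower reads their timeline. Tweets posted before a user started following a non-influencer are not backfilled here; a real system would need to handle that case.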
APIs
- POST /create
Body: tweet [String]
- POST /follow
Body: UserIDs [List: String]
- GET /timeline
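To make the request shapes concrete, here is a small framework-free sketch of how these three endpoints might be dispatched. The `handle` function, the `user_id` argument (standing in for an authenticated caller), the status codes, and the response bodies are assumptions for illustration; the notes above only fix the paths and the request body fields.

```python
import json

# In-memory stand-ins for the real data store (assumption for this sketch).
TWEETS = []    # list of (user_id, tweet_text) in posting order
FOLLOWS = {}   # user_id -> set of followed user IDs


def handle(method, path, user_id, body=None):
    """Dispatch one request; `user_id` is the (assumed) authenticated caller."""
    payload = json.loads(body) if body else {}
    if (method, path) == ("POST", "/create"):
        TWEETS.append((user_id, payload["tweet"]))
        return 201, {}
    if (method, path) == ("POST", "/follow"):
        FOLLOWS.setdefault(user_id, set()).update(payload["UserIDs"])
        return 200, {}
    if (method, path) == ("GET", "/timeline"):
        followed = FOLLOWS.get(user_id, set())
        # Pull-only timeline for brevity: newest tweets from followed users.
        feed = [text for author, text in reversed(TWEETS) if author in followed]
        return 200, {"tweets": feed}
    return 404, {"error": "unknown route"}
```

A caller would pass JSON bodies such as `{"tweet": "hi"}` for /create and `{"UserIDs": ["bob"]}` for /follow.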
Database and Schema
Neo4j schema:
Relationships:
[:follows] [:tweeted]
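The schema can be illustrated with a tiny in-memory property graph. This is only a sketch of the data model, not Neo4j code: the `Graph` class, the node IDs, the property names, and the edge directions (follower to followed user, author to tweet) are assumptions for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Graph:
    nodes: dict = field(default_factory=dict)  # node id -> (label, properties)
    edges: set = field(default_factory=set)    # (source id, relationship, target id)

    def add_node(self, node_id, label, **props):
        self.nodes[node_id] = (label, props)

    def add_edge(self, src, rel, dst):
        self.edges.add((src, rel, dst))

    def out(self, src, rel):
        """Return the targets reachable from `src` over `rel` edges."""
        return {d for s, r, d in self.edges if s == src and r == rel}


g = Graph()
g.add_node("A", "User", name="A")
g.add_node("B", "User", name="B")
g.add_node("t1", "Tweet", text="hello from B")
g.add_edge("A", "follows", "B")    # A follows B
g.add_edge("B", "tweeted", "t1")   # B posted tweet t1

# Two-hop traversal: A -follows-> user -tweeted-> tweet
timeline_a = {t for u in g.out("A", "follows") for t in g.out(u, "tweeted")}
```

Building a pull-based timeline is then a two-hop traversal from the user node: first over follows edges, then over tweeted edges.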