Design Instagram

admin, May 17, 2020

Designing Instagram is one of the most frequently asked questions in interviews. Let's discuss the design approach and the choices involved.

Requirements:

A user can post an image
A user can follow another user
Generate the home feed for a user

Design Tenets

GraphQL addresses both underfetching and overfetching of data. It also reduces the number of API calls, because a client can get all the data it needs in a single backend call. A graph-shaped schema is also a natural fit for exposing a social network's API, so we will use GraphQL for the API.
Images will be stored in an object store such as S3.
Edge functions such as authentication and authorization will be implemented as middleware in the GraphQL service.
We will use the GraphQL Subscription type to send real-time updates to clients.
The CQRS pattern will be used for the microservice architecture. The read service reads from the graph database's replicas and implements the feed API, while the command service implements the follow and upload APIs.
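The CQRS split described above can be sketched in a few lines. This is a minimal in-memory illustration, not the article's implementation: `GraphStore`, `CommandService`, `ReadService`, and the full-copy `copy_from` replication step are hypothetical names standing in for the graph database's primary, its read replica, and the two microservices.

```python
from dataclasses import dataclass, field


@dataclass
class GraphStore:
    """Stand-in for one database instance (hypothetical, in-memory)."""
    follows: dict[str, set[str]] = field(default_factory=dict)  # follower -> followees
    posts: dict[str, list[str]] = field(default_factory=dict)   # author -> image URLs

    def copy_from(self, other: "GraphStore") -> None:
        # Crude full-copy replication; a real cluster ships changes incrementally.
        self.follows = {u: set(f) for u, f in other.follows.items()}
        self.posts = {u: list(p) for u, p in other.posts.items()}


class CommandService:
    """Write side: implements follow and upload against the primary."""

    def __init__(self, primary: GraphStore) -> None:
        self.primary = primary

    def follow(self, follower: str, followee: str) -> None:
        self.primary.follows.setdefault(follower, set()).add(followee)

    def upload(self, author: str, url: str) -> None:
        self.primary.posts.setdefault(author, []).append(url)


class ReadService:
    """Read side: implements the feed and only ever touches a replica."""

    def __init__(self, replica: GraphStore) -> None:
        self.replica = replica

    def feed(self, user: str) -> list[str]:
        urls: list[str] = []
        for followee in sorted(self.replica.follows.get(user, set())):
            urls.extend(self.replica.posts.get(followee, []))
        return urls


primary, replica = GraphStore(), GraphStore()
commands, reads = CommandService(primary), ReadService(replica)

commands.follow("alice", "bob")
commands.upload("bob", "https://example.com/img/1.jpg")

stale_feed = reads.feed("alice")  # the replica has not caught up yet
replica.copy_from(primary)        # replication step
fresh_feed = reads.feed("alice")  # now contains bob's post
```

The gap between `stale_feed` and `fresh_feed` is the trade-off of reading from replicas: feeds are eventually consistent with writes.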

Architecture

GraphQL API

# Instagram post
type Post {
  Id: ID!
  Name: String!
  Url: String!
  Description: String
}

# User information
type User {
  Name: String!
  Followers: [User!]
  # List of posts created by this user
  Posts: [Post]
  # Feed for the user
  Feed: String
}

type Mutation {
  doPost(Name: String!, Url: String!, Description: String): Post!
}

type Query {
  allUsers(user: ID): User!
}

type Subscription {
  newPost: Post!
}

schema {
  query: Query
  mutation: Mutation
  subscription: Subscription
}
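To show how the `doPost` mutation and the `newPost` subscription relate, here is a rough publish/subscribe sketch. It is an in-memory stand-in rather than a GraphQL server: `NewPostTopic` and `do_post` are hypothetical names, and a real service would push events over a transport such as WebSockets.

```python
import itertools
from typing import Callable, Optional

Post = dict  # keys mirror the Post type: Id, Name, Url, Description
_next_id = itertools.count(1)


class NewPostTopic:
    """Hypothetical in-memory stand-in for the newPost subscription."""

    def __init__(self) -> None:
        self._subscribers: list[Callable[[Post], None]] = []

    def subscribe(self, callback: Callable[[Post], None]) -> Callable[[], None]:
        self._subscribers.append(callback)

        def unsubscribe() -> None:
            self._subscribers.remove(callback)

        return unsubscribe

    def publish(self, post: Post) -> None:
        # Iterate over a copy so callbacks may unsubscribe while being notified.
        for callback in list(self._subscribers):
            callback(post)


def do_post(topic: NewPostTopic, name: str, url: str,
            description: Optional[str] = None) -> Post:
    """Mutation resolver sketch: create the post, then notify subscribers."""
    post = {"Id": str(next(_next_id)), "Name": name, "Url": url,
            "Description": description}
    topic.publish(post)
    return post


topic = NewPostTopic()
received: list[Post] = []
unsubscribe = topic.subscribe(received.append)

first = do_post(topic, "Sunset", "https://example.com/img/sunset.jpg")
unsubscribe()
second = do_post(topic, "Lunch", "https://example.com/img/lunch.jpg", "Ramen")
```

After `unsubscribe()` the client no longer receives events, so `received` holds only the first post.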

Datastore

We will use Neo4j to store users, posts, and the relationships between them. The database schema is as follows:

Nodes:

:User { UserName: String }

:Post { Name: String, Url: String, Description: String }

Relationships:

(:User)-[:has_posts]->(:Post), (:User)-[:follows]->(:User)

Neo4j will be deployed as a cluster of core servers and read replicas.
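With this graph model, building a feed amounts to a two-hop traversal: follow the user's `follows` edges, then each followee's `has_posts` edges. The sketch below mimics that traversal in memory. The `Graph` class and `feed` function are hypothetical, the relationship directions are assumed, and the Cypher string is included only to illustrate what an equivalent query might look like; it is not executed here.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional

# Illustrative Cypher for the same traversal (an assumption, not run below).
FEED_QUERY = """
MATCH (:User {UserName: $userName})-[:follows]->(:User)-[:has_posts]->(p:Post)
RETURN p.Name, p.Url, p.Description
"""


@dataclass(frozen=True)
class Post:
    Name: str
    Url: str
    Description: Optional[str] = None


class Graph:
    """Tiny adjacency-list graph keyed by relationship type."""

    def __init__(self) -> None:
        self._edges = defaultdict(lambda: defaultdict(list))

    def relate(self, source, relationship: str, target) -> None:
        self._edges[relationship][source].append(target)

    def out(self, source, relationship: str) -> list:
        # .get avoids creating empty entries for nodes without outgoing edges.
        return list(self._edges[relationship].get(source, []))


def feed(graph: Graph, user_name: str) -> list[Post]:
    """Collect the posts of every user that user_name follows."""
    posts: list[Post] = []
    for followee in graph.out(user_name, "follows"):
        posts.extend(graph.out(followee, "has_posts"))
    return posts


graph = Graph()
graph.relate("alice", "follows", "bob")
graph.relate("alice", "follows", "carol")
graph.relate("bob", "has_posts", Post("Beach", "https://example.com/img/beach.jpg"))
graph.relate("carol", "has_posts", Post("Hike", "https://example.com/img/hike.jpg"))
graph.relate("dave", "has_posts", Post("Cat", "https://example.com/img/cat.jpg"))

alice_feed = feed(graph, "alice")  # posts from bob and carol, not dave
```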

Observability

We need to collect metrics and logs and use distributed tracing for the application. Prometheus can be used for collecting metrics. For logs, Fluentd can be used with Elasticsearch as the log sink; Fluentd is a common choice for log aggregation on Docker. For distributed tracing, common solutions are Jaeger and Zipkin. A distributed tracer propagates a trace context across services, which makes it possible to correlate all the logs for a single request.
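The idea of propagating a context so that logs can be correlated can be illustrated with the Python standard library alone. This is a simplified sketch, not how Jaeger or Zipkin are wired up: the `x-trace-id` header name and both handler functions are assumptions made for the example (real tracers typically use standardized headers such as W3C `traceparent`).

```python
import contextvars
import io
import logging
import uuid

# Holds the trace ID of the request currently being handled.
trace_id_var = contextvars.ContextVar("trace_id", default="-")


class TraceIdFilter(logging.Filter):
    """Stamp every log record with the current trace ID."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.trace_id = trace_id_var.get()
        return True


log_output = io.StringIO()  # stands in for the log sink
handler = logging.StreamHandler(log_output)
handler.setFormatter(logging.Formatter("%(trace_id)s %(name)s %(message)s"))
handler.addFilter(TraceIdFilter())

for service_name in ("graphql", "read-service"):
    service_logger = logging.getLogger(service_name)
    service_logger.addHandler(handler)
    service_logger.setLevel(logging.INFO)
    service_logger.propagate = False


def read_service_feed(headers: dict) -> None:
    # The downstream service restores the context from the incoming header.
    token = trace_id_var.set(headers.get("x-trace-id", "-"))
    try:
        logging.getLogger("read-service").info("building feed")
    finally:
        trace_id_var.reset(token)


def graphql_handle_request(trace_id=None) -> str:
    # The edge service starts a trace if the caller did not supply one.
    trace_id = trace_id or uuid.uuid4().hex
    token = trace_id_var.set(trace_id)
    try:
        logging.getLogger("graphql").info("feed query received")
        read_service_feed({"x-trace-id": trace_id_var.get()})
    finally:
        trace_id_var.reset(token)
    return trace_id


graphql_handle_request("trace-123")
log_lines = log_output.getvalue().splitlines()
```

Both log lines carry the same trace ID, so a search for `trace-123` in the log sink returns the whole request path.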

Circuit breakers and bulkheads are common patterns for handling service failures. Rate limiters will run as middleware on the servers.
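As an illustration of the circuit breaker pattern, here is a minimal sketch. The class name, the thresholds, and the injectable `clock` parameter are choices made for this example; production services would more likely rely on an existing resilience library or a service mesh.

```python
import time


class CircuitOpenError(RuntimeError):
    """Raised when the breaker rejects a call without trying it."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                raise CircuitOpenError("circuit open; failing fast")
            # Timeout elapsed: half-open, let one trial call through.
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            # A failed trial call, or too many failures, (re)opens the circuit.
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            raise
        # Success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result


def failing_feed_call():
    raise ConnectionError("read service is down")


now = [0.0]  # fake clock so the example needs no real waiting
breaker = CircuitBreaker(failure_threshold=2, reset_timeout=10.0, clock=lambda: now[0])

for _ in range(2):
    try:
        breaker.call(failing_feed_call)
    except ConnectionError:
        pass

try:
    breaker.call(lambda: "feed")
    rejected = False
except CircuitOpenError:
    rejected = True  # the call was not attempted at all

now[0] = 11.0  # after the reset timeout, a trial call is allowed
recovered = breaker.call(lambda: "feed")
```

Failing fast while the circuit is open keeps a struggling downstream service from being flooded with requests that would only time out.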

References:

Graphqlbin playground http://snowtooth.moonhighway.com/
https://neo4j.com/docs/operations-manual/current/clustering/introduction/
