.net simplified
Posted: 15 Oct 2021 01:11 AM PDT

Hi Friends,

It’s been a while since my last post. Nevertheless, let’s get started again with a new series of posts. In this section, we will begin a new journey and see how to design a system. Before designing any scalable system, which factors do we need to take care of? Let’s consider a scenario in which we are designing a Hotstar-like video ingestion system. How would we approach it, and what are the basic questions we need to take into account? Let’s look at them.

Before jumping into the discussion, one point to note: in system design problems, no solution is 100% correct and no solution is 100% wrong. These are open-ended questions, and the answers can vary entirely depending on the scenario presented. So, without wasting time, let’s get started.

High Level Design:-
MVP (Minimum Viable Product) Requirements:-
Now, let’s go ahead and look at the scale estimation part.

Scale Estimation:-
Now, let’s go ahead and look at QPS.

QPS (Queries Per Second):-

Based on the problem statement, we have to support at least 10M parallel reads of the same resource. To understand what that means, let’s assume we have only one instance serving all 10M parallel requests. What will happen then?
Now, before delving any further into QPS, let’s first take a high-level look at the most probable trade-offs.

Trade Offs:-

Low Latency:- We are building a video ingestion system, so latency only comes into the picture when a Hotstar user is uploading a video to the server. This is not something end users (viewers) will experience, so we can ignore latency here. Some videos can take milliseconds to upload, while others can take up to an hour. Therefore, it’s fine to ignore the latency part.

CAP:- Since we are OK with latency in the system, in CAP terms we are primarily building an Available system. Put another way, the data can become eventually consistent over time. One important point to note here: at this stage our system is both Available and Consistent, because everything lives on a single node and there are no partitions. We have 1M videos at the moment, and we store the location of each video as metadata in our RDS (relational database system), so we can live without sharding and no partitioning is required. Say one video location takes 1 KB of space. Then 1M × 1 KB = 1 GB of space for one year, growing to roughly 1.2 GB, 1.4 GB and so on year over year. Based on this calculation, one instance is more than enough to store the data. Therefore, our system is CA (Consistent and Available) so far.

Single point of failure (SPOF):- When we don’t have any partitions in the system, it will always be exposed to a single point of failure. To get rid of the SPOF, we can simply enable automated backups, using either a snapshot model (async) or an always-update model (sync). As soon as we enable replication, the data lives on more than one node, so we are introducing multiple partitions. There are two ways of doing replication
Since we need our system to be always available, we will go with the first option, which also means our system is now AP (Available and Partition-tolerant).

Multiple Failure Scenarios:-

Let’s say the ingestion process involves multiple steps, say s1 –> s2 –> s3 –> s4 –> s5. After these steps complete, the video is saved and becomes available to the public. Any of the steps from s1 to s5 can fail. On a typical day, zero or one video is uploaded to the platform, since uploading is not a very frequent activity. But let’s say a new series of 100 episodes is being released, and all of them need to be uploaded. Assume a developer has written a script to upload them all in one shot. If the Hotstar server accepts only 10 videos at a time, the remaining 90 uploads will fail. To handle this scenario, all of these files can be added to a queue. Once the server is done processing the other videos, it can pick up the queued ones.

Queue Processing:-
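The 100-episode scenario described above can be sketched in a few lines of Python. This is only a minimal, in-process illustration under stated assumptions: the limit of 10 comes from the example above, `ingest` is a hypothetical stand-in for the real steps s1 to s5, and a production system would use an external queue rather than the standard library `queue.Queue`.

```python
import queue
import threading

MAX_PARALLEL_UPLOADS = 10  # assumed server limit, taken from the example above


def ingest(video_name):
    """Hypothetical stand-in for the real ingestion steps (s1 ... s5)."""
    return f"processed:{video_name}"


def process_all(video_names, workers=MAX_PARALLEL_UPLOADS):
    """Queue every video, then let a fixed pool of workers drain the queue."""
    pending = queue.Queue()
    for name in video_names:
        pending.put(name)

    results = []
    failed = []
    lock = threading.Lock()

    def worker():
        while True:
            try:
                name = pending.get_nowait()
            except queue.Empty:
                return  # nothing left to pick up, so this worker stops
            try:
                outcome = ingest(name)
                with lock:
                    results.append(outcome)
            except Exception:
                with lock:
                    failed.append(name)  # a real system could re-queue this for a retry

    # At most `workers` videos are in flight at any moment; the rest wait in the queue.
    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results, failed


episodes = [f"episode-{i:03d}" for i in range(1, 101)]
done, failed = process_all(episodes)
print(len(done), "processed,", len(failed), "failed")
```

Because only 10 workers exist, the other 90 uploads are never rejected; they simply wait in the queue until a worker is free, which is the behaviour the queue is meant to provide.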
Hence, the processing is: upload the video to S3 –> update the video URL in the database –> put the job id in the queue. These three steps are sequential. For the queue, we can use Redis Queue or Kafka here.

The processing explained above is one way of designing the system. There may be other good solutions to this problem, such as a workflow implementation. We will see that in the next discussion. Till then, stay tuned and Happy Coding.

Thanks,