Capacity Planning
Envelope calculation is a technique used to check and validate a design idea. The two numbers we want to arrive at are queries per second (QPS) and storage. It is important to note that absolute accuracy is not important and the goal with envelope calculation is to get within the order of magnitude. Queries per second (QPS) is the number of requests we can expect per second. This can help inform the number of servers needed to support the QPS. In general, QPS can be broken down into daily active users (DAU), DAU\_subset, content\_rate, peak\_scale\_factor, and seconds per day. DAU is usually easy to obtain using historical traffic data. DAU\_partition estimates the subset of DAU that this feature is designed for. For example, only 20% of Twitter’s DAU actually tweets. The content\_rate estimates the average amount of content produced or consumed per DAU. Using the given example, there may be 2 tweets per DAU. The peak\_scale\_factor is a multiplier that estimates what the QPS could be during peak traffic hours. Lastly, there are 86400 seconds in a day, but 1e5 can be used to make calculation easier. Putting it all together, the formula looks like: QPS = DAU \* DAU\_subset(%) \* content\_rate \* peak\_scale\_factor \* 1e5 Storage is another number to calculate, which will help inform the storage capacity need for the particular feature being designed. Storage can be broken down into DAU, DAU\_subset, content\_subset, content\_size, retention, copies, and days per year. DAU and DAU\_subset are the same as QPS. The content\_size estimates the average content size being stored. Related to content, it is also important to consider what portion of all content contains the content type of interest and partition appropriately. Retention and copies estimate how long the data will be kept and the number of copies that should be kept. Lastly, there are 365 days in a year, but we will round this to 4e2 for easy calculation. Putting it all together, the formula looks like: Storage = DAU \* DAU\_subset(%) \* content\_subset(%) \* content\_size \* retention(yr) \* copies \* 4e2 Having QPS and Storage will help guide the decisions during system design such as the number of servers, database shards, load balancing, and etc.