· 6 years ago · Jan 29, 2020, 04:44 AM
Grokking the System Design Interview
Designing Pastebin
Let's design a Pastebin-like web service where users can store plain text. Users of the service enter a piece of text and get a randomly generated URL to access it.

Similar Services: pastebin.com, pasted.co, chopapp.com
Difficulty Level: Easy

1. What is Pastebin?
Pastebin-like services enable users to store plain text or images over the network (typically the Internet) and generate unique URLs to access the uploaded data. Such services are also used to share data over the network quickly, since users only need to pass the URL to let other users see it.

If you haven’t used pastebin.com before, please try creating a new ‘Paste’ there and spend some time going through the different options the service offers. This will help you a lot in understanding this chapter.

2. Requirements and Goals of the System
Our Pastebin service should meet the following requirements:

Functional Requirements:

Users should be able to upload or “paste” their data and get a unique URL to access it.
Users will only be able to upload text.
Data and links will expire automatically after a specific timespan; users should also be able to specify the expiration time.
Users should optionally be able to pick a custom alias for their paste.

Non-Functional Requirements:

The system should be highly reliable; any data uploaded should not be lost.
The system should be highly available. If our service is down, users will not be able to access their Pastes.
Users should be able to access their Pastes in real time with minimum latency.
Paste links should not be guessable (not predictable).

Extended Requirements:

Analytics, e.g., how many times was a paste accessed?
Our service should also be accessible through REST APIs by other services.

3. Some Design Considerations
Pastebin shares some requirements with a URL shortening service, but there are some additional design considerations we should keep in mind.

What should be the limit on the amount of text a user can paste at a time? We can limit Pastes to at most 10MB to prevent abuse of the service.

Should we impose size limits on custom URLs? Since our service supports custom URLs, users can pick any URL they like, but providing a custom URL is not mandatory. However, it is reasonable (and often desirable) to impose a size limit on custom URLs so that we have a consistent URL database.

4. Capacity Estimation and Constraints
Our service will be read-heavy; there will be more read requests than new Paste creations. We can assume a 5:1 ratio between reads and writes.

Traffic estimates: Pastebin services are not expected to have traffic similar to Twitter or Facebook. Let’s assume that we get one million new pastes added to our system every day. With a 5:1 read-to-write ratio, this gives us five million reads per day.

New Pastes per second:

1M / (24 hours * 3600 seconds) ~= 12 pastes/sec

Paste reads per second:

5M / (24 hours * 3600 seconds) ~= 58 reads/sec

Storage estimates: Users can upload a maximum of 10MB of data; commonly, Pastebin-like services are used to share source code, configs, or logs. Such texts are not huge, so let’s assume that each paste contains 10KB on average.

At this rate, we will be storing 10GB of data per day.

1M * 10KB => 10 GB/day

If we want to store this data for ten years, we would need a total storage capacity of 36TB.

With 1M pastes every day, we will have 3.6 billion Pastes in 10 years. We need to generate and store keys to uniquely identify these pastes. If we use base64 encoding ([A-Z, a-z, 0-9, ., -]), we would need six-letter strings:

64^6 ~= 68.7 billion unique strings

If it takes one byte to store one character, the total size required to store 3.6B keys would be:

3.6B * 6 => 22 GB

22GB is negligible compared to 36TB. To keep some margin, we will assume a 70% capacity model (meaning we don’t want to use more than 70% of our total storage capacity at any point), which raises our storage needs to 51.4TB (36TB / 0.7).

Bandwidth estimates: For write requests, we expect 12 new pastes per second, resulting in 120KB of ingress per second.

12 * 10KB => 120 KB/s

As for read requests, we expect 58 requests per second. Therefore, total data egress (sent to users) will be 0.6 MB/s.

58 * 10KB => 0.6 MB/s

Although total ingress and egress are not big, we should keep these numbers in mind while designing our service.

Memory estimates: We can cache some of the hot pastes that are frequently accessed. Following the 80-20 rule, meaning 20% of hot pastes generate 80% of traffic, we would like to cache these 20% of pastes.

Since we have 5M read requests per day, to cache 20% of these requests, we would need:

0.2 * 5M * 10KB ~= 10 GB
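
The estimates in this section can be reproduced with a few lines of arithmetic. The sketch below is only a back-of-the-envelope calculator: the constant and function names are illustrative, and it assumes decimal units (1KB = 1000 bytes) and 365-day years, as the rounded figures in the text suggest.

```python
# Inputs taken from the estimates in this section.
NEW_PASTES_PER_DAY = 1_000_000
READ_WRITE_RATIO = 5
AVG_PASTE_BYTES = 10 * 1000        # 10KB average paste, decimal units (assumption)
RETENTION_DAYS = 10 * 365          # ten years, ignoring leap days (assumption)
KEY_LENGTH = 6
ALPHABET_SIZE = 64
CAPACITY_TARGET = 0.7              # the 70% capacity model
SECONDS_PER_DAY = 24 * 3600


def estimate():
    """Return the unrounded capacity figures as a dictionary."""
    reads_per_day = NEW_PASTES_PER_DAY * READ_WRITE_RATIO
    writes_per_sec = NEW_PASTES_PER_DAY / SECONDS_PER_DAY
    reads_per_sec = reads_per_day / SECONDS_PER_DAY
    storage_per_day = NEW_PASTES_PER_DAY * AVG_PASTE_BYTES
    total_storage = storage_per_day * RETENTION_DAYS
    total_pastes = NEW_PASTES_PER_DAY * RETENTION_DAYS
    return {
        "writes_per_sec": writes_per_sec,
        "reads_per_sec": reads_per_sec,
        "storage_per_day_gb": storage_per_day / 1e9,
        "total_storage_tb": total_storage / 1e12,
        "provisioned_tb": total_storage / CAPACITY_TARGET / 1e12,
        "total_pastes": total_pastes,
        "key_space": ALPHABET_SIZE ** KEY_LENGTH,
        "key_storage_gb": total_pastes * KEY_LENGTH / 1e9,
        "ingress_kb_per_sec": writes_per_sec * AVG_PASTE_BYTES / 1e3,
        "egress_mb_per_sec": reads_per_sec * AVG_PASTE_BYTES / 1e6,
        "cache_gb": 0.2 * reads_per_day * AVG_PASTE_BYTES / 1e9,
    }


if __name__ == "__main__":
    for name, value in estimate().items():
        print(f"{name}: {value:,.2f}")
```

Because the code does not round intermediate results, some values differ slightly from the text: ten years of storage comes out at 36.5TB rather than 36TB, so the 70% capacity figure is a little above 51.4TB, and ingress is slightly below 120 KB/s because it uses about 11.6 writes per second rather than 12.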
5. System APIs
We can have SOAP or REST APIs to expose the functionality of our service. The following could be the definitions of the APIs to create/retrieve/delete Pastes:

addPaste(api_dev_key, paste_data, custom_url=None, user_name=None, paste_name=None, expire_date=None)
Parameters:
api_dev_key (string): The API developer key of a registered account. This will be used to, among other things, throttle users based on their allocated quota.
paste_data (string): Textual data of the paste.
custom_url (string): Optional custom URL.
user_name (string): Optional user name to be used to generate the URL.
paste_name (string): Optional name of the paste.
expire_date (string): Optional expiration date for the paste.

Returns: (string)
A successful insertion returns the URL through which the paste can be accessed; otherwise, it returns an error code.

Similarly, we can have retrieve and delete Paste APIs:

getPaste(api_dev_key, api_paste_key)
Where “api_paste_key” is a string representing the Paste Key of the paste to be retrieved. This API will return the textual data of the paste.

deletePaste(api_dev_key, api_paste_key)
A successful deletion returns ‘true’; otherwise, it returns ‘false’.
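
To make the three calls concrete, here is a minimal in-memory sketch of them in Python. It is an illustration under several assumptions, not the service's actual implementation: the snake_case function names, the 16-character alias limit, the one-year default expiration, the use of exceptions in place of error codes, and passing expire_date as a timezone-aware datetime rather than a string are all choices made for this example. It returns the paste key rather than a full URL, and it does not throttle on api_dev_key.

```python
import secrets
import string
from datetime import datetime, timedelta, timezone

MAX_PASTE_BYTES = 10 * 1024 * 1024  # 10MB cap from the design considerations
MAX_ALIAS_LENGTH = 16               # assumed; the text only asks for some limit
DEFAULT_TTL = timedelta(days=365)   # assumed default expiration
ALPHABET = string.ascii_letters + string.digits + ".-"

_pastes = {}  # paste key -> (paste_data, expire_date); stand-in for real storage


def _random_key(length=6):
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def add_paste(api_dev_key, paste_data, custom_url=None, user_name=None,
              paste_name=None, expire_date=None):
    """Store a paste and return its key. Raises ValueError on bad input."""
    # api_dev_key, user_name and paste_name are accepted but unused here.
    if len(paste_data.encode("utf-8")) > MAX_PASTE_BYTES:
        raise ValueError("paste exceeds the size limit")
    if custom_url is not None:
        if len(custom_url) > MAX_ALIAS_LENGTH:
            raise ValueError("custom URL is too long")
        if custom_url in _pastes:
            raise ValueError("custom URL is already taken")
        key = custom_url
    else:
        key = _random_key()
        while key in _pastes:  # regenerate on the rare collision
            key = _random_key()
    if expire_date is None:
        expire_date = datetime.now(timezone.utc) + DEFAULT_TTL
    _pastes[key] = (paste_data, expire_date)
    return key


def get_paste(api_dev_key, api_paste_key):
    """Return the paste text, or None if the key is unknown or expired."""
    entry = _pastes.get(api_paste_key)
    if entry is None:
        return None
    paste_data, expire_date = entry
    if expire_date <= datetime.now(timezone.utc):
        del _pastes[api_paste_key]  # lazily purge the expired paste
        return None
    return paste_data


def delete_paste(api_dev_key, api_paste_key):
    """Return True if a paste was deleted, False otherwise."""
    return _pastes.pop(api_paste_key, None) is not None
```

For example, `key = add_paste("dev-key", "hello")` followed by `get_paste("dev-key", key)` returns the stored text, and a second `delete_paste` on the same key returns False.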

6. Database Design
A few observations about the nature of the data we are storing:

We need to store billions of records.
Each metadata object we are storing would be small (less than 1KB).
Each paste object we are storing can be of medium size (it can be a few MB).
There are no relationships between records, except if we want to store which user created which Paste.
Our service is read-heavy.

Database Schema:
We would need two tables, one for storing information about the Pastes and the other for users’ data.

[Figure: database schema]
Here, ‘URLHash’ is the URL equivalent of the TinyURL, and ‘ContentKey’ is a reference to an external object storing the contents of the paste; we’ll discuss the external storage of the paste contents later in the chapter.
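
As a rough illustration of the two tables, the sketch below creates them in an in-memory SQLite database. Only the URLHash and ContentKey columns come from the text; every other column, the snake_case naming, and the use of SQLite itself are assumptions made to keep the example self-contained.

```python
import sqlite3

# Only url_hash and content_key are named in the text; the remaining
# columns are assumed for illustration.
SCHEMA = """
CREATE TABLE user (
    user_id       INTEGER PRIMARY KEY,
    name          TEXT NOT NULL,
    creation_date TEXT NOT NULL
);
CREATE TABLE paste (
    url_hash        TEXT PRIMARY KEY,  -- six-letter key or custom alias
    content_key     TEXT NOT NULL,     -- reference to the object holding the text
    expiration_date TEXT,
    creation_date   TEXT NOT NULL,
    user_id         INTEGER REFERENCES user(user_id)
);
"""


def create_schema(conn):
    """Create both tables on the given SQLite connection."""
    conn.executescript(SCHEMA)


conn = sqlite3.connect(":memory:")
create_schema(conn)
conn.execute(
    "INSERT INTO paste (url_hash, content_key, creation_date) VALUES (?, ?, ?)",
    ("aB3.-x", "pastes/aB3.-x", "2020-01-29T04:44:00Z"),
)
```

Making url_hash the primary key means the database itself rejects a second paste with the same key, which is the duplicate-key failure the write path has to handle.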

7. High Level Design
At a high level, we need an application layer that will serve all the read and write requests. The application layer will talk to a storage layer to store and retrieve data. We can split our storage layer in two: one database storing metadata related to each paste, users, etc., and an object storage (like Amazon S3) storing the paste contents. This division of data will also allow us to scale them individually.

[Figure: high-level design]
8. Component Design
a. Application layer
Our application layer will process all incoming and outgoing requests. The application servers will talk to the backend data store components to serve the requests.

How to handle a write request? Upon receiving a write request, our application server will generate a six-letter random string, which serves as the key of the paste (if the user has not provided a custom key). The application server will then store the contents of the paste and the generated key in the database. After a successful insertion, the server can return the key to the user. One possible problem here is that the insertion fails because of a duplicate key: since we are generating a random key, the newly generated key could match an existing one. In that case, we should generate a new key and try again, retrying until the insertion no longer fails because of a duplicate key. If the user provided a custom key that is already present in our database, we should return an error instead.
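
The write path with retry-on-duplicate can be sketched as follows. `InMemoryDB`, `DuplicateKeyError`, and `store_paste` are hypothetical names standing in for a real database and its unique-key violation. The text says to keep retrying until the insert succeeds; this sketch caps the number of attempts, which is an added assumption to avoid looping forever.

```python
import secrets
import string

ALPHABET = string.ascii_letters + string.digits + ".-"  # 64 characters
KEY_LENGTH = 6
MAX_ATTEMPTS = 5  # assumed cap; the text retries until success


class DuplicateKeyError(Exception):
    """Raised when a key is already present (hypothetical DB error)."""


class InMemoryDB:
    """Dictionary-backed stand-in for the real database."""

    def __init__(self):
        self.rows = {}

    def insert(self, key, contents):
        if key in self.rows:
            raise DuplicateKeyError(key)
        self.rows[key] = contents


def random_key():
    return "".join(secrets.choice(ALPHABET) for _ in range(KEY_LENGTH))


def store_paste(db, contents, custom_key=None, key_source=random_key):
    """Insert a paste and return its key."""
    if custom_key is not None:
        # A taken custom key is an error for the caller, not a retry case.
        db.insert(custom_key, contents)
        return custom_key
    for _ in range(MAX_ATTEMPTS):
        key = key_source()
        try:
            db.insert(key, contents)
            return key
        except DuplicateKeyError:
            continue  # collision with an existing key: generate another
    raise RuntimeError("could not find a free key")
```

The `key_source` parameter exists only so that a collision can be forced in a test; in normal use the default random generator is used.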

Another solution to the above problem could be to run a standalone Key Generation Service (KGS) that generates random six-letter strings beforehand and stores them in a database (let’s call it key-DB). Whenever we want to store a new paste, we just take one of the already generated keys and use it. This approach makes things quite simple and fast since we do not have to worry about duplications or collisions. KGS will make sure all the keys inserted in key-DB are unique.

KGS can use two tables to store keys, one for keys that are not used yet and one for all the used keys. KGS can always keep some keys in memory so that it can quickly provide them whenever a server needs them. As soon as KGS loads some keys into memory or gives them to an application server, it can move them to the used keys table; this way we can make sure each server gets unique keys. If KGS dies before using all the keys loaded in memory, we will be wasting those keys. We can ignore this loss given the huge number of keys we have.

Isn’t KGS a single point of failure? Yes, it is. To solve this, we can have a standby replica of KGS, and whenever the primary server dies, it can take over to generate and provide keys.

Can each app server cache some keys from key-DB? Yes, this can surely speed things up. However, in this case, if the application server dies before consuming all the keys, we will end up losing those keys. This could be acceptable since we have 68B unique six-letter keys, which is a lot more than we require.
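
The KGS idea, including per-server key caching, can be sketched in a few lines. This is a single-process illustration only: the two Python sets stand in for the unused and used tables of key-DB, the class and method names are hypothetical, and a real service would need synchronization so that two servers can never receive the same key.

```python
import secrets
import string

ALPHABET = string.ascii_letters + string.digits + ".-"


def pregenerate_keys(count, length=6):
    """Generate `count` distinct random keys; the set drops duplicates."""
    keys = set()
    while len(keys) < count:
        keys.add("".join(secrets.choice(ALPHABET) for _ in range(length)))
    return keys


class KeyGenerationService:
    def __init__(self, keys):
        self.unused = set(keys)  # stands in for the unused-keys table
        self.used = set()        # stands in for the used-keys table

    def take_batch(self, n):
        """Hand out up to n keys, marking them used before returning them."""
        batch = [self.unused.pop() for _ in range(min(n, len(self.unused)))]
        self.used.update(batch)
        return batch


class AppServerKeyCache:
    """Per-server cache that refills itself from the KGS in batches."""

    def __init__(self, kgs, batch_size=100):
        self.kgs = kgs
        self.batch_size = batch_size
        self.local = []

    def next_key(self):
        if not self.local:
            self.local = self.kgs.take_batch(self.batch_size)
        if not self.local:
            raise RuntimeError("key-DB is exhausted")
        return self.local.pop()
```

Because keys are marked used as soon as they leave the KGS, a server that crashes with keys still in its local list simply wastes them, which matches the trade-off described above.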

How to handle a paste read request? Upon receiving a read request, the application service layer contacts the datastore. The datastore searches for the key and, if it is found, returns the paste’s contents. Otherwise, an error code is returned.

b. Datastore layer
We can divide our datastore layer into two:

Metadata database: We can use a relational database like MySQL or a distributed key-value store like Dynamo or Cassandra.
Object storage: We can store our contents in an object storage like Amazon’s S3. Whenever we approach full capacity on content storage, we can easily increase it by adding more servers.

[Figure: Detailed component design for Pastebin]
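
A minimal sketch of how the two stores cooperate on writes and reads is shown below. Both stores are plain dictionaries here; in a real deployment they would be a metadata database and an object store such as S3. The `PasteStore` class, the `pastes/<key>` naming of content keys, and returning None in place of an error code are assumptions for illustration.

```python
from datetime import datetime, timedelta, timezone


class PasteStore:
    def __init__(self):
        self.metadata = {}  # url_hash -> {"content_key", "expires_at"}
        self.objects = {}   # content_key -> bytes (object-store stand-in)

    def write(self, url_hash, text, expires_at=None):
        content_key = f"pastes/{url_hash}"
        # Store the contents first so metadata never points at a missing object.
        self.objects[content_key] = text.encode("utf-8")
        self.metadata[url_hash] = {
            "content_key": content_key,
            "expires_at": expires_at,
        }

    def read(self, url_hash, now=None):
        """Return the paste text, or None if it is missing or expired."""
        now = now or datetime.now(timezone.utc)
        row = self.metadata.get(url_hash)
        if row is None:
            return None
        if row["expires_at"] is not None and row["expires_at"] <= now:
            return None
        return self.objects[row["content_key"]].decode("utf-8")
```

The metadata lookup is small and cheap, so expired or unknown keys are rejected before the larger paste body is ever fetched from object storage.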
9. Purging or DB Cleanup
Please see Designing a URL Shortening service.

10. Data Partitioning and Replication
Please see Designing a URL Shortening service.

11. Cache and Load Balancer
Please see Designing a URL Shortening service.

12. Security and Permissions
Please see Designing a URL Shortening service.
