· 7 years ago · Oct 03, 2018, 01:32 AM
1There are two primary data actions: store it, and later, retrieve it. Above that, we apply some structure to the data. There are mainly two ways of doing this:
2
3A relational database management system (RDBMS), or "SQL data."
4A non-relational database, or "NoSQL data."
5While data can be stored in many different ways, we need to effectively organize the data in order to search for it and access it. In the case of SQL and NoSQL, both solutions build special data structures called "indexes." The data structure chosen often helps to determine the performance characteristics of the store and retrieve commands.
6B-Trees
7A traditional and widely-used data structure is called "B-tree." B-tree structures are a standard part of computer science textbooks and are used in most (if not all) RDBMS products.
8
9B-tree data structures' performance characteristics are well understood. In general, all operations perform well when the data size fits into the available memory. (By memory, I mean the amount of RAM accessible by the RDBMS in either the physical server or virtual server.) This memory limit is usually a hard restriction. Below is a general chart I like to use to demonstrate B-tree performance characteristics.
10
11Image title
12
13This chart clearly illustrates a couple of points:
14
15As soon as the data size exceeds available memory, performance drops rapidly.
16Choosing a flash-based storage helps performance, but only to a certain extent—memory limits still cause performance to suffer.
17B-tree-based structures were designed for optimal data retrieval performance, not data storage. This shortcoming created a need for data structures that provide better performance for data storage. So, when is B-tree a good solution for your applications? The chart above provides clues:
18
19When data size doesn’t exceed memory limits
20When the application is mostly performing read (SELECT) operations
21When read performance is more important than write performance
22Events that might exceed B-tree performance limits include: accepting and storing event logs; storing measurements from a high-frequency sensor; tracking user clicks; and so on.
23
24In a majority of cases, it is possible to solve B-tree performance issues with more memory or faster physical storage (see previous chart). But when hardware adjustments aren’t an option, a different data structure can help.
25
26Two new data structures were created for write-intensive environments: log structured merge (LSM) trees and Fractal Trees®. These structures focus on data storage performance rather than data retrieval.
27
28Image title
29
30(Keep in mind that the graph shows asymptotic theoretical trends. Real performance graphs vary hugely with the specific implementation and software involved.)
31
32Before going into LSM and Fractal Tree details, let’s discuss the "key-value" concept.
33
34In any storage structure, data can be presented in a key => value format. This is familiar to NoSQL users, but probably not to RDBMS users. RDBMSes instead use primary keys associated with tables, and the data is internally represented as primary_key => table_columns.
35
36For example, a user account may need the following data:
37
38(user_id; user_name; user_email; user_address;
39account_name; user_zip; user_year_of_birth)
40This would be represented (using a primary key) as:
41
42user_id => user_name; user_email; user_address;
43account_name; user_zip; user_year_of_bi
44Any of the following could be used as a secondary index:
45
46by email (for fast search by email): (user_email => user_id)
47by year of birth: (user_year_of_birth => user_id)
48by postal code: (user_zip => user_id)
49Searching for a user_name by user_email would be done in two steps:
50
51search user_id by user_email
52search user_name by user_id
53An "insert" operation to this table could look like this:
54
55user_id: 1000; user_name: "Sherlock Holmes"; user_email:
56"sherlock@holmes.guru"; user_address: "221B Baker Street";
57account_name: "sherlockh"; user_zip: "NW1 6XE"; user_year_
58of_birth: 1854
59The transactions for this operation would be:
60
61Insert into primary data storage (or primary key):
62(1000 => “Sherlock Holmesâ€; “sherlock@holmes. guruâ€; “221B Baker Streetâ€; “sherlockhâ€;
63“NW1 6XEâ€, 1854)
64Insert into email index: (“sherlock@holmes.guru†=> 1000)
65Insert into year_of _birth index (1854 => 1000)
66Insert into postal code index (“NW1 6XE†=> 1000)
67After the initial insert operation, there would be three additional operations behind the scenes (known as "index maintenance" overhead). This overhead can contribute to performance degradation.
68
69Let’s say we want to update an email record for a user (user_id: 2000; new email: "newm@example.com"). The following transactions would occur:
70
71Primary data storage: find user_id:2000; read email to old_email; rewrite email to "newm@example.com"
72Email index:
73 A. Find record with key = old_email; delete it
74
75 B. Insert record (“newm@example.com†=> 2000)
76
77Sequential keys (or monotonically increasing functions) generally don’t cause problems for B-tree structures—it’s random operations that cause performance hits. An email address is a good example of a random insertion.
78
79So random operations make B-trees problematic, performance-wise, due to hardware limitations—random "modify" operations cause multiple disk IOs.
80
81Both LSM and Fractal Trees attempt to improve performance by making key operations less random. These data structures also provide better compression and smaller write amplification, which is better for flash/solid-state storage.
82
83LSM Trees
84The first mention of LSM trees dates back to 1996, and corresponds to Google BigTables. Later it was implemented in products such as Cassandra, LevelDB, and most recently in RocksDB.
85
86An LSM tree works by:
87
88Storing incoming modify operations in a buffer (usually named "memtable")
89Sorting and storing the data when the buffer is full
90What does this look like? Using our previous examples, let’s assume we have the following users registered:
91
92(user_id: 5000; user_email: "aku.m@rnplplf.com")
93(user_id: 5001; user_email: "3lca4g3eaagucf7@kl5u558kg. com")
94(user_id: 5002; user_email: "xz3uhs7@irvoizpi70.com")
95(user_id: 5003; user_email: "6wmyfg@qbqwnb.com")
96(user_id: 5004; user_email: "63hkw@p2f505jh1hr1.com)
97After being sorted and written to disk, the email index looks something like:
98
99(key=>value)
1003lca4g3eaagucf7@kl5u558kg.com => 5001
10163hkw@p2f505jh1hr1.com => 5004
1026wmyfg@qbqwnb.com => 5003
103aku.m@rnplplf.com => 5000
104xz3uhs7@irvoizpi70.com => 5002
105This results in the entire buffer being written to memory in one sequential operation:
106
107Image title
108
109That’s the benefit, but what are the drawbacks?
110
111As we continue to insert users and write to the disk, LSM creates an increasing number of "SST files." Each of these files are sorted, and there is no global order. Moreover, the same key (for non-unique indexes) can end up in different files. The following diagram illustrates how SST files for "Year_of_birth" indexes might look:
112
113Image title
114
115This organization makes searching an individual file fast, but searching globally slow. For example, if we want to find the "user_id" for a user with email w7hl@125msxuyf7.com, we would need to look in each file individually.
116
117This presents two problems:
118
119Searching data by an individual key
120Searching data by a range of keys (e.g., all users with "year_of_ birth" between 1970 and 1990)
121Image title
122
123In order to address SST files’ distributed nature, production software often implements different maintenance logic:
124
125File compaction: merging files into one
126File levels: making file hierarchies, to avoid checking each file for an existing key
127Bloomfilters: helps lookup individual keys faster (but doesn’t help with ranges)
128Fractal Tree
129Fractal Tree data structures are closer to traditional B-tree structures—but instead of applying changes immediately, changes are buffered. As information exceeds the limits of the main index memory, the tree data structure buffers large groups of messages. The buffered data is slowly pushed down the tree as the buffers fill up. When data gets to a leaf node, there is a single IO applied to the data. This helps avoid random operations causing performance degradation by performing buffer changes all at once.
130
131Data compression reduces read IO further.
132
133The following diagram demonstrates the buffering process:
134
135Image title
136
137By combining all writes, Fractal Trees save time by performing a single transaction rather than a number of random ones. However, because a huge number of messages reside in the buffer, SELECT functions now must traverse through all the messages in order to find the correct one (and this is especially bad for point SELECT queries).
138
139Remember: Primary key or unique key constraints require a HIDDEN POINT SELECT lookup! This means that both a UNIQUE KEY and a non-sequence PRIMARY KEY are performance killers for Fractal Tree data structures.
140
141Fractal Trees are a good structure for databases with a lot of tables, indexes (preferably non-unique indexes), and a heavy write workload. It is also good for systems with slow storage times, or for saving space when the storage is fast but expensive.
142
143Lastly, this is often a good fit for cloud-based database environments, where again storage is often slow (or if fast, expensive).
144
145Implication for Slow Reads
146Unfortunately for performance, LSM and Fractal Trees are less friendly for read operations.
147
148Direct and implicit read operations are slower for LSM and Fractal Tree structures. Things like unique key constraints make insert and update transactions slower (because the background data is checked to see if the value exists).
149
150Foreign key constraints will also slow down insert and update transactions on corresponding tables. Some schemas don’t support foreign keys (or unique keys, for that matter).
151
152Finally, select index and join operation transactions can also be affected. One way around this issue is to use covering indexes. A covering index contains all of or more of the columns you need for your query.
153
154For example, let’s assume you want to execute query:
155
156SELECT user_name FROM users WHERE user_email='sherlock@
157holmes.guru'
158or in MongoDB notation:
159
160db.users.find( {user_email: "sherlock@holmes.guru" } ,
161{user_name : 1} )
162The index { user_email => user_id } results in two operations:
163
164Lookup user_id by user_email
165Lookup user_name by user_id
166One solution instead is to create an index (in SQL syntax):
167
168CREATE INDEX idx_email (user_email, user_name)
169This query:
170
171SELECT user_name FROM users WHERE user_email='sherlock@
172holmes.guru'
173can now be resolved by accessing the index idx_email, as "user_name" is now the part of the index.
174
175This "trick" can also be used with B-trees, but it works best with LSM and Fractal Trees for additional index overhead.
176
177Conclusion
178As we’ve discussed, the three data structure that can be used (B-tree, LSM tree, and Fractal Tree) can affect data performance in relation to applications. Using a database system based on one of these algorithms can affect the way your queries perform. The storage method affects the data performance—and data performance is a key component to application performance. Your business depends on application performance, and how your customers view how well your applications respond.