Symas Corp., August 2013
We used HyperLevelDB rev 0e4446225cd99942ce452973663b41d100d0730b from GitHub and LMDB rev 2cc2574d84686d2e2556e86f78a962bd593af19c from Gitorious.
We also used the LMDB version of Replicant for the HyperDex coordinator in all of the tests; this version is available on GitHub. It is mentioned only for completeness, as the HyperDex developers have assured us that Replicant has no impact on performance.
Ordinarily we benchmark using either our 16-core 128GB RAM server or our 64-core 512GB RAM server, but NoSQL folks tend to be more interested in running on dinky little boxes with much less memory. For this test the data node is my old Dell Precision M4400 laptop with a quad-core Intel Q9300 CPU @ 2.53GHz and 8GB of DDR2 DRAM. Two sets of tests are performed. The first uses 10 million records on a Crucial M4 512GB SSD with a reiserfs partition. (This is the system disk of the laptop, so it is about 50% full and has been in steady use for several months now.) At 4000 bytes per record the resulting database is around 40GB. The second test uses 100 million records; at roughly 400GB that database no longer fit in the free space on the SSD, so a Seagate ST9500420AS HDD with an XFS partition was used instead.
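
For anyone who wants to check the sizing, the arithmetic is straightforward. This is just a quick sketch; the SSD free-space figure is the rough 50% mentioned above:

```python
# Back-of-envelope sizing for the two test configurations.
RECORD_SIZE = 4000                    # bytes per YCSB record
RAM_BYTES   = 8 * 10**9               # data node RAM, ~8GB
SSD_FREE    = 0.5 * 512 * 10**9       # Crucial M4 512GB, roughly 50% full

for records in (10_000_000, 100_000_000):
    db_bytes = records * RECORD_SIZE
    print(f"{records:>11,} records -> ~{db_bytes / 10**9:.0f} GB database, "
          f"~{db_bytes / RAM_BYTES:.0f}x RAM, fits on the SSD: {db_bytes < SSD_FREE}")
```
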
The YCSB load generator and the HyperDex coordinator/Replicant are running on an Asus N56DP laptop with quad-core AMD A10-4600M APU and 16GB DDR3 DRAM. The load generator never gets anywhere near stressing out this machine. The laptops are plugged into Gbit ethernet through a TP-Link WR-1043ND wifi router, and there is no other traffic on the network. The data node is booted in single-user mode so no other services are running on the machine.
The basic YCSB setup is copied from this mapkeeper benchmark. The main difference from their workload is that we configure only 4 threads instead of their 100, and we use more records and more operations in each test. Our workload file is provided for reference.
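
For readers unfamiliar with YCSB, the workload file is a plain Java properties file. The sketch below shows the kind of settings involved; the field layout, operation count, and read/update mix here are illustrative guesses, not the exact contents of our file, which is the one provided above:

```properties
# Illustrative sketch only -- see the actual workload file provided above for
# the real settings. The field layout, operation count, and read/update mix
# below are assumptions; the record count, ~4000-byte record size, and the
# 4 client threads come from the text.
workload=com.yahoo.ycsb.workloads.CoreWorkload

# 10 million records for the SSD test (100 million for the HDD test),
# and on the order of a million operations in the run phase.
recordcount=10000000
operationcount=1000000

# 4 client threads instead of mapkeeper's 100.
threadcount=4

# 10 fields of 400 bytes gives the ~4000-byte records described above.
fieldcount=10
fieldlength=400

# Assumed 50/50 read/update mix with a zipfian key distribution.
readproportion=0.5
updateproportion=0.5
requestdistribution=zipfian
```
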
The HyperDex setup is copied from this HyperDex benchmark. We create only 4 partitions, instead of the 24 they used. There is no fault tolerance since we have only one data node, but that's not relevant to this test.
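
The HyperDex side needs a space definition before YCSB can talk to it. Something along these lines describes the layout we use; the space and attribute names here are illustrative, while the partition count and zero fault tolerance follow from the text above:

```
space usertable
key k
attributes field0, field1, field2, field3, field4, field5, field6, field7, field8, field9
create 4 partitions
tolerate 0 failures
```

The definition is registered with the coordinator (e.g. via the `hyperdex add-space` tool in recent releases) before the load phase begins.
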
LMDB gives consistent, fast response while HyperLevelDB's response is wildly unpredictable. At its peak HyperLevelDB is only 80% as fast as LMDB, and overall LMDB completes the load more than twice as fast. HyperLevelDB also uses nearly five times as much CPU as LMDB. To summarize:

Load phase (10 million records):

| | MinLatency (us) | AvgLatency (us) | 95th %ile (ms) | 99th %ile (ms) | MaxLatency (ms) | Runtime (sec) | Throughput (ops/sec) | CPU time (mm:ss) |
|---|---|---|---|---|---|---|---|---|
| LMDB | 183 | 419 | 0 | 0 | 2565 | 1058 | 9449 | 19:44.52 |
| LevelDB | 189 | 907 | 0 | 23 | 3925 | 2279 | 4388 | 96:46.96 |

Run phase:

| | MinLatency (us) | AvgLatency (us) | 95th %ile (ms) | 99th %ile (ms) | MaxLatency (ms) | Runtime (sec) | Throughput (ops/sec) | CPU time (mm:ss) |
|---|---|---|---|---|---|---|---|---|
| LMDB update | 225 | 543 | 0 | 1 | 18 | 143 | 6973 | 1:33.17 |
| LMDB read | 163 | 570 | 0 | 1 | 20 | | | |
| LevelDB update | 227 | 1163 | 2 | 3 | 147 | 307 | 3256 | 3:37.67 |
| LevelDB read | 191 | 1228 | 2 | 3 | 206 | | | |

(Runtime, throughput, and CPU time are totals for the whole run, reported once per database.)
Here's the mdb_stat output for the resulting LMDB database:

    Environment Info
      Map address: (nil)
      Map size: 53687091200
      Page size: 4096
      Max pages: 13107200
      Number of pages used: 10363447
      Last transaction ID: 10220428
      Max readers: 126
      Number of readers used: 1
    Freelist Status
      Tree depth: 1
      Branch pages: 0
      Leaf pages: 1
      Overflow pages: 0
      Entries: 7
      Free pages: 45
    Status of Main DB
      Tree depth: 4
      Branch pages: 5733
      Leaf pages: 357666
      Overflow pages: 10000000
      Entries: 20210850

There are 10,000,000 overflow pages, which makes sense since the records are 4000 bytes each: one record occupies one overflow page. The 5733 branch pages consume about 22MB, and the 357666 leaf pages about 1.4GB. Thus, even though the total database is 5 times larger than RAM, all of the key lookups can be memory resident, and at most one disk I/O is needed to retrieve a record's data. There are over 20 million entries in the DB because HyperDex stores additional information alongside each user record; that additional data is inconsequential here.
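
A quick sanity check on the claim that key lookups stay memory resident, using the page counts above (a rough sketch; it ignores HyperDex's own overhead and the OS's other memory needs):

```python
# B-tree footprint of the 10M-record LMDB database, from the mdb_stat numbers above.
PAGE_SIZE = 4096                  # bytes, as reported by mdb_stat

branch_pages   = 5_733
leaf_pages     = 357_666
overflow_pages = 10_000_000       # one 4000-byte record per overflow page
ram_bytes      = 8 * 2**30        # 8GB data node

tree_bytes  = (branch_pages + leaf_pages) * PAGE_SIZE
total_bytes = (branch_pages + leaf_pages + overflow_pages) * PAGE_SIZE

print(f"branch+leaf pages: {tree_bytes / 2**30:.2f} GiB")    # ~1.39 GiB, fits in RAM
print(f"whole database:    {total_bytes / 2**30:.1f} GiB")   # ~39.5 GiB, ~5x RAM
print(f"tree fits in RAM:  {tree_bytes < ram_bytes}")        # True
```
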
The raw test output is available in this tar archive. It is also tabulated in this OpenOffice spreadsheet.
We have heard criticisms of earlier LMDB tests because their runtimes were considered "too short." The fact is, other DBs need to run for tens of hours to complete the same amount of useful work LMDB accomplishes in only a few hours.
The difference in efficiency is enormous. Using LevelDB for a workload means using 3-4x as much CPU and far more disk bandwidth, resulting in higher electricity consumption, more wear and tear on the storage system, and earlier wear-out of storage devices. Even though HyperLevelDB improves on LevelDB's write concurrency and compaction efficiency, it is still far less efficient than LMDB, using almost 15 times as much CPU in this test.

Load phase (100 million records):

| | MinLatency (us) | AvgLatency (us) | 95th %ile (ms) | 99th %ile (ms) | MaxLatency (ms) | Runtime (sec) | Throughput (ops/sec) | CPU time (mm:ss) |
|---|---|---|---|---|---|---|---|---|
| LMDB | 173 | 544 | 0 | 0 | 1439 | 13702 | 7298 | 227:26 |
| LevelDB | 199 | 2737 | 4 | 11 | 17573 | 68697 | 1456 | 3373:13 |

Run phase:

| | MinLatency (us) | AvgLatency (us) | 95th %ile (ms) | 99th %ile (ms) | MaxLatency (ms) | Runtime (sec) | Throughput (ops/sec) | CPU time (mm:ss) |
|---|---|---|---|---|---|---|---|---|
| LMDB update | 256 | 33565 | 130 | 209 | 637 | 8385 | 119 | 4:21.41 |
| LMDB read | 215 | 33493 | 130 | 207 | 817 | | | |
| LevelDB update | 241 | 63660 | 231 | 385 | 20370 | 15863 | 63 | 17:27 |
| LevelDB read | 188 | 63250 | 230 | 383 | 7904 | | | |
Here's the mdb_stat output for the resulting LMDB database:

    Environment Info
      Map address: (nil)
      Map size: 536870912000
      Page size: 4096
      Max pages: 131072000
      Number of pages used: 103667712
      Last transaction ID: 100481421
      Max readers: 126
      Number of readers used: 0
    Freelist Status
      Tree depth: 1
      Branch pages: 0
      Leaf pages: 1
      Overflow pages: 0
      Entries: 18
      Free pages: 145
    Status of Main DB
      Tree depth: 5
      Branch pages: 58461
      Leaf pages: 3609103
      Overflow pages: 100000000
      Entries: 200269469

There are 100,000,000 overflow pages, as expected: again one record per overflow page. The 58461 branch pages consume about 228MB, and the 3609103 leaf pages about 13.8GB. Now that the total database is 50 times larger than RAM, at least half of the key lookups require a disk I/O (the leaf pages no longer fit in the 8GB of RAM, and the overflow pages compete for the same cache), and another disk I/O is needed to fetch the record data itself. According to this hard drive review, a 4KB I/O on this drive takes 16ms on average. At roughly two I/Os per DB request that predicts about 32ms, which is quite close to the ~33.5ms average latency LMDB delivers here.
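
The same back-of-envelope arithmetic, in code. The 16ms figure comes from the drive review cited above, and the two-I/O count is the approximation described in the text:

```python
# Rough latency model for the 100M-record run phase on the HDD.
IO_MS      = 16.0   # average 4KB random read on the ST9500420AS, per the cited review
IOS_PER_OP = 2      # ~1 leaf-page read (13.8GB of leaf pages vs. 8GB RAM)
                    # + 1 overflow-page read for the 4000-byte record itself

predicted_ms = IOS_PER_OP * IO_MS
print(f"predicted: ~{predicted_ms:.0f} ms per operation")
# -> ~32 ms, in line with the 33.5 ms average read latency measured for LMDB
```
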
It's not so simple to analyze HyperLevelDB's I/O load since the DB design is so byzantine, manipulating multiple files and making massive copies of data from one file to another as it merges one level of tables into the next. But the numbers speak for themselves: LMDB's simple design is still the most efficient, getting the required work done with a minimum of CPU time, disk bandwidth, and wall-clock time.
The raw test output is available in this tar archive. It is also tabulated in this OpenOffice spreadsheet.