HyperDex Benchmark, Part 2

Symas Corp., September 2013


This page shows additional performance results of LMDB vs HyperLevelDB as used in HyperDex. It follows on from the work previously reported in August 2013. The same software and hardware are used in these tests.

Setup

First we pick up where we left off from the previous test, using the database with 100 million 4000-byte records. The following tests then use smaller records of 1000 bytes, and finally 32 bytes.

Results

100M 4K Records, 10M Ops

The results for LMDB are a little slow to start, due to the database being recreated for this test; the HyperLevelDB run simply reused the database left over from the August 2013 test. The overall result nonetheless clearly shows that LMDB retains its performance lead over HyperLevelDB even over much longer runtimes: LMDB completed the work in 27.8 hours, vs 33.3 hours for HyperLevelDB. The LMDB results in this test are somewhat contaminated by an indexer daemon fired off by cron early in the run; unfortunately there wasn't sufficient time to restart the machine and get a clean run.
100M records, 10M ops
MinLatency(us)  AvgLatency(us)  95th%ile(ms)  99th%ile(ms)  MaxLatency(ms)  Runtime(sec)  Throughput(ops/sec)  CPUtime(mm:ss)
LMDB update246456801973151342610081099.218:27
LMDB read2143894215326713266
LevelDB update2254872217228617554412000183.34412:21
LevelDB read202477501712842940

The raw test output is available in this tar archive. It is also tabulated in this OpenOffice spreadsheet.

100M 1K Records, Sequential Insert

We repeated the tests using 100 million records of 1000 bytes each. In this and the following test with 32-byte records, HyperLevelDB is significantly faster than LMDB. This is as expected; even in the microbenchmark results it was clear that LMDB's write performance led in only two cases: batched sequential writes and large value writes. With smaller records and non-batched writes, the write amplification of LMDB's copy-on-write approach becomes too expensive. The LMDB HyperDex daemon was started with "-s 512000" to set a 500GB map size.
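For reference, here is a minimal sketch (in C, against the public LMDB API) of how a map size of this magnitude is set; the database path is a placeholder and error handling is abbreviated. The daemon's "-s 512000" option presumably ends up in an equivalent mdb_env_set_mapsize() call, since 512000 x 2^20 bytes equals the 536870912000-byte map size reported by mdb_stat below.

    /* Sketch: open an LMDB environment with a 500GB map size.
     * "./testdb" is a placeholder directory and must already exist. */
    #include <stdio.h>
    #include <lmdb.h>

    int main(void)
    {
        MDB_env *env;
        int rc;

        rc = mdb_env_create(&env);
        if (rc) { fprintf(stderr, "create: %s\n", mdb_strerror(rc)); return 1; }

        /* 512000 MiB = 500 GiB = 536870912000 bytes; must be set before open */
        rc = mdb_env_set_mapsize(env, (size_t)512000 * 1024 * 1024);
        if (rc) { fprintf(stderr, "set_mapsize: %s\n", mdb_strerror(rc)); return 1; }

        rc = mdb_env_open(env, "./testdb", 0, 0664);
        if (rc) { fprintf(stderr, "open: %s\n", mdb_strerror(rc)); return 1; }

        mdb_env_close(env);
        return 0;
    }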

With this record size LMDB is half the speed of HyperLevelDB. There is also a distinct cliff at around 4550 seconds into the LMDB load, where throughput drops by half; this is most likely the point where the majority of the pages no longer fit in RAM.
100M 1K records, Sequential Insert
MinLatency(us)  AvgLatency(us)  95th%ile(ms)  99th%ile(ms)  MaxLatency(ms)  Runtime(sec)  Throughput(ops/sec)  CPUtime(mm:ss)
LMDB12312585171960315773167195:56
LevelDB1287050151603177405637674:57

100M 1K Records, 10M Ops


HyperLevelDB maintains its lead over LMDB in this test.
100M records, 10M ops
MinLatency(us)  AvgLatency(us)  95th%ile(ms)  99th%ile(ms)  MaxLatency(ms)  Runtime(sec)  Throughput(ops/sec)  CPUtime(mm:ss)
LMDB update2034165418631046118953111216:29
LMDB read168343201382426391
LevelDB update1952543090151649186304415963:31
LevelDB read17125122901514618

Here's the mdb_stat output for the resulting LMDB database:

Environment Info
  Map address: (nil)
  Map size: 536870912000
  Page size: 4096
  Max pages: 131072000
  Number of pages used: 84119625
  Last transaction ID: 102007830
  Max readers: 126
  Number of readers used: 0
Freelist Status
  Tree depth: 2
  Branch pages: 1
  Leaf pages: 3
  Overflow pages: 0
  Entries: 54
  Free pages: 600
Status of Main DB
  Tree depth: 6
  Branch pages: 1235134
  Leaf pages: 82883885
  Overflow pages: 0
  Entries: 201454920
The 1235134 branch pages consume 5GB, while the 82883885 leaf pages consume 339GB. With the data items being stored inline in the leaf pages instead of in overflow pages, the leaf page volume has grown drastically compared to the test with 4K records. The number of branch pages has also increased proportionately to track all the leaf pages. While it's possible that most of the branch pages will still be RAM-resident, just about all of the leaf pages will require a disk I/O to access.
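As a sanity check, those figures follow directly from the page counts above and the 4096-byte page size; a trivial calculation (values copied from the mdb_stat output) reproduces them:

    /* Reproduce the space figures quoted above from the mdb_stat
     * page counts and the 4096-byte page size. */
    #include <stdio.h>

    int main(void)
    {
        const unsigned long long page_size    = 4096;
        const unsigned long long branch_pages = 1235134;
        const unsigned long long leaf_pages   = 82883885;

        printf("branch pages: %.1f GB\n", branch_pages * page_size / 1e9); /* ~5.1 GB   */
        printf("leaf pages:   %.1f GB\n", leaf_pages   * page_size / 1e9); /* ~339.5 GB */
        return 0;
    }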

The raw test output is available in this tar archive. It is also tabulated in this OpenOffice spreadsheet.


500M 32B Records, Sequential Insert


For this loading phase LMDB is significantly faster than HyperLevelDB. The Y axis of the latency graph was switched to a logarithmic scale; otherwise the LMDB latency would have been invisible. CPU times were not recorded for these tests; the trend has not changed.
500M 32B records, Sequential Insert
MinLatency(us)  AvgLatency(us)  95th%ile(ms)  99th%ile(ms)  MaxLatency(ms)  Runtime(sec)  Throughput(ops/sec)
LMDB932640015033345314946
LevelDB86434091631548149122

500M 32B Records, 50M Ops


500M records, 50M ops
MinLatency(us)  AvgLatency(us)  95th%ile(ms)  99th%ile(ms)  MaxLatency(ms)  Runtime(sec)  Throughput(ops/sec)
LMDB update1432638212621912999274193182
LMDB read107208008716812973
LevelDB update111161256410844529206737242
LevelDB read10616216641081469172

Here's the mdb_stat output for the resulting LMDB database:

Environment Info
  Map address: (nil)
  Map size: 536870912000
  Page size: 4096
  Max pages: 131072000
  Number of pages used: 27554782
  Last transaction ID: 510031188
  Max readers: 126
  Number of readers used: 4
Freelist Status
  Tree depth: 2
  Branch pages: 1
  Leaf pages: 18
  Overflow pages: 0
  Entries: 337
  Free pages: 3423
Status of Main DB
  Tree depth: 5
  Branch pages: 434130
  Leaf pages: 27117208
  Overflow pages: 0
  Entries: 1008466992
The 434130 branch pages consume 1.7GB, while the 27117208 leaf pages consume 111GB. All of the branch pages will be RAM-resident, but most leaf pages will require disk I/O for access.

The raw output for these tests is available in this tar archive. It is also tabulated in this OpenOffice spreadsheet.

Conclusion

Taken together with the August 2013 results, these tests give a clear picture of where LMDB's strengths and weaknesses lie. For larger records, LMDB's use of overflow pages improves lookup performance because only keys are stored in the leaf pages, keeping those pages small and keeping more keys in RAM. For smaller records, the volume of leaf pages becomes an issue. The write amplification of LMDB's copy-on-write strategy also carries a higher cost with smaller records, since every update must still write a number of whole pages. Additionally, LMDB tends to perform best when the database is no more than 10 times the size of available RAM.
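As a rough illustration of that write-amplification cost, a random update rewrites roughly one full page per B-tree level on the copy-on-write path regardless of how small the record is. The estimate below is a simplified lower bound (it ignores meta-page and freelist writes) using the tree depths reported by mdb_stat above:

    /* Rough lower-bound estimate of write amplification for random
     * LMDB updates: copy-on-write rewrites about one full page per
     * B-tree level touched, regardless of record size. Meta-page and
     * freelist writes are ignored here. */
    #include <stdio.h>

    static void estimate(const char *label, int tree_depth, int record_size)
    {
        const int page_size = 4096;
        int bytes_written = tree_depth * page_size;
        printf("%s: ~%d bytes written per %d-byte update (~%dx amplification)\n",
               label, bytes_written, record_size, bytes_written / record_size);
    }

    int main(void)
    {
        /* Tree depths taken from the mdb_stat outputs above. */
        estimate("1000-byte records, depth 6", 6, 1000);  /* ~24x  */
        estimate("32-byte records, depth 5",   5, 32);    /* ~640x */
        return 0;
    }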

These results also highlight potential areas for future work: