Symas Corp., February 2015
Some of the DB engines recommend a particular malloc implementation, but it's
not always obvious why a given choice was made. The tests conducted here seek
to reveal the impact of each choice. The tests are performed on our HP DL585 G5
server with 128GB RAM and 4 quad core AMD Opteron 8354 CPUs. Since we're just
interested in how malloc affects the DB performance, this is purely an in-memory
test.
1. Test Overview
For this test we use a database with 2000000 records and 4000 bytes per record,
so roughly 8GB in size. The DB is stored on a tmpfs, which can grow up to 64GB
on this server. We had originally tried to use a larger DB but all of the LSM
engines crashed with disk full errors, even though the data volume was still
much smaller than 64GB. In the current results, Basho LevelDB still filled up
all 64GB of the available space even with just the 8GB data set.
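As a quick sanity check on the sizes quoted above, the raw data volume can be worked out directly. This is only a sketch: it counts the 4000 bytes per record and nothing else, so keys and any per-record engine overhead are assumed away.

```python
# Record count and record size as stated in the test overview.
RECORDS = 2_000_000
BYTES_PER_RECORD = 4_000
TMPFS_LIMIT_GB = 64  # maximum size the tmpfs can grow to on this server

raw_bytes = RECORDS * BYTES_PER_RECORD
raw_gb = raw_bytes / 10**9    # decimal gigabytes
raw_gib = raw_bytes / 2**30   # binary gibibytes

print(f"raw data: {raw_bytes} bytes = {raw_gb:.1f} GB = {raw_gib:.2f} GiB")
print(f"tmpfs headroom: {TMPFS_LIMIT_GB / raw_gb:.0f}x the raw data size")
```

An engine that fills the whole tmpfs with this data set is therefore using about eight times the raw data volume in storage.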
After loading the DB using batched sequential writes, a readwhilewriting test is run with 1 writer thread and 16 reader threads. None of the threads are constrained to any particular throughput level, and the test runs until all 16 reader threads have each randomly retrieved all 2000000 records.
There was a fair amount of variation between runs, so each DB engine is run 3 times with each malloc library. jemalloc and tcmalloc were invoked using LD_PRELOAD, so the exact same benchmark binaries are used in each test and only the malloc implementation is changed. The results for a given library are averaged across the 3 runs and presented in the table below. The raw data, spreadsheet, and command script are available for download.
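The procedure described above (same binary, different allocator via LD_PRELOAD, three runs averaged) can be sketched roughly as follows. This is not the actual command script. The library paths, the `MALLOC_LIBS` table, and the helper names are all hypothetical and will differ between systems.

```python
import os
import statistics
import subprocess

# Hypothetical library locations; adjust for the distribution in use.
MALLOC_LIBS = {
    "glibc": None,  # no preload: the binary uses the default glibc malloc
    "jemalloc": "/usr/lib/x86_64-linux-gnu/libjemalloc.so.1",
    "tcmalloc": "/usr/lib/libtcmalloc.so.4",
}

def preload_env(malloc_lib=None, base_env=None):
    """Return an environment that preloads malloc_lib, or none at all."""
    env = dict(os.environ if base_env is None else base_env)
    env.pop("LD_PRELOAD", None)  # start from a clean baseline
    if malloc_lib is not None:
        env["LD_PRELOAD"] = malloc_lib
    return env

def run_once(cmd, malloc_name):
    """Run one benchmark command (a list of strings) under the chosen malloc."""
    env = preload_env(MALLOC_LIBS[malloc_name])
    return subprocess.run(cmd, env=env, capture_output=True, text=True)

def average_runs(values):
    """Average one metric across repeated runs of the same configuration."""
    return statistics.mean(values)
```

Because only the environment changes between runs, any difference in the results can be attributed to the allocator rather than to a differently built binary.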
Engine / malloc | Load: Fill (ops/sec) | Load: Usr (sec) | Load: Sys (sec) | Load: % | Load: Wall | Load: RSS (KB) | Run: Write (ops/sec) | Run: Read (ops/sec) | Run: Usr (sec) | Run: Sys (sec) | Run: % | Run: Wall | Run: RSS (KB) |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Basho | |||||||||||||
glibc | 37651.33 | 123.56 | 59.38 | 336.33% | 00:54.33 | 127196.00 | #DIV/0! | #DIV/0! | 7085.82 | 1242.23 | 1315.67% | 10:29.33 | 20620677.33 |
jemalloc | 35889.33 | 123.61 | 67.61 | 335.33% | 00:56.93 | 122224.00 | #DIV/0! | #DIV/0! | 6212.08 | 1943.22 | 1496.00% | 09:04.87 | 19632088.00 |
tcmalloc | 36421.33 | 130.58 | 58.37 | 336.00% | 00:56.17 | 134229.33 | #DIV/0! | #DIV/0! | #DIV/0! | #DIV/0! | #DIV/0! | #DIV/0! | #DIV/0! |
BDB | |||||||||||||
glibc | 20526.33 | 69.83 | 30.28 | 99.00% | 01:40.21 | 9309362.67 | 2676.33 | 54408.33 | 1841.69 | 7259.20 | 1543.67% | 09:49.27 | 9033684.00 |
jemalloc | 20236.33 | 71.35 | 30.18 | 99.00% | 01:41.62 | 9310096.00 | 2707.33 | 55896.00 | 1817.45 | 7030.72 | 1542.00% | 09:33.51 | 9035160.00 |
tcmalloc | 19277.67 | 76.08 | 30.41 | 99.00% | 01:46.59 | 9311358.67 | 2643.33 | 57031.67 | 1838.08 | 6835.63 | 1542.33% | 09:22.20 | 9036669.33 |
Hyper | |||||||||||||
glibc | 44562.33 | 46.93 | 26.85 | 161.00% | 00:45.71 | 678653.33 | 18881.00 | 148509.33 | 19794.44 | 559.25 | 1369.67% | 03:40.34 | 17911013.33 |
jemalloc | 44419.00 | 47.07 | 26.84 | 161.00% | 00:45.80 | 673344.00 | 19595.33 | 148247.33 | 2432.61 | 580.12 | 1367.33% | 03:40.25 | 17656761.33 |
tcmalloc | 43468.00 | 48.85 | 26.87 | 160.67% | 00:46.95 | 685753.33 | 18919.33 | 141846.00 | 2628.05 | 531.19 | 1370.00% | 03:50.51 | 17928325.33 |
LevelDB | |||||||||||||
glibc | 54178.00 | 46.90 | 17.73 | 173.33% | 00:37.22 | 516938.67 | 7952.33 | 85805.33 | 3529.82 | 1326.26 | 1287.67% | 06:17.09 | 11900356.00 |
jemalloc | 46198.67 | 49.36 | 23.88 | 167.67% | 00:43.54 | 518184.00 | 8449.67 | 98171.33 | 3088.34 | 1260.91 | 1314.33% | 05:30.83 | 11326717.33 |
tcmalloc | 51751.00 | 50.88 | 16.81 | 173.00% | 00:39.03 | 532433.33 | 8457.00 | 95788.33 | 3271.56 | 1135.35 | 1301.00% | 05:38.51 | 11995682.67 |
LMDB | |||||||||||||
glibc | 164342.33 | 5.64 | 7.08 | 99.00% | 00:12.75 | 8070749.33 | 33403.33 | 3028569.67 | 145.61 | 2.22 | 1300.33% | 00:11.36 | 8093228.00 |
jemalloc | 162537.00 | 5.68 | 7.17 | 99.00% | 00:12.88 | 8071140.00 | 33671.67 | 3132727.33 | 147.00 | 2.20 | 1356.67% | 00:11.01 | 8092724.00 |
tcmalloc | 160763.00 | 5.88 | 7.14 | 99.00% | 00:13.05 | 8076445.33 | 32289.00 | 3064187.67 | 149.17 | 2.17 | 1344.67% | 00:11.25 | 8100145.33 |
RocksDB | |||||||||||||
glibc | 57961.67 | 45.14 | 18.18 | 181.67% | 00:34.74 | 607101.33 | 13910.67 | 34164.33 | 10855.35 | 3465.73 | 1521.67% | 15:40.77 | 2258294.67 |
jemalloc | 52566.67 | 44.84 | 23.70 | 179.00% | 00:38.16 | 562550.67 | 14762.67 | 41509.33 | 8797.22 | 3117.37 | 1535.67% | 12:55.49 | 1248504.00 |
tcmalloc | 58078.67 | 46.81 | 17.57 | 185.00% | 00:34.71 | 626377.33 | 11151.33 | 40718.67 | 9193.03 | 3028.43 | 1547.00% | 13:09.76 | 1296916.00 |
RocksDB2 | |||||||||||||
glibc | 61744.00 | 47.43 | 19.94 | 204.00% | 00:32.90 | 544488.00 | 15872.67 | 118042.33 | 3122.98 | 1030.11 | 1516.33% | 04:33.86 | 1467462.67 |
jemalloc | 55608.33 | 42.74 | 23.86 | 183.67% | 00:36.16 | 402370.67 | 15917.33 | 123310.00 | 2872.10 | 1121.72 | 1518.33% | 04:22.91 | 1086408.00 |
tcmalloc | 62242.33 | 50.44 | 19.34 | 213.67% | 00:32.62 | 567416.00 | 15692.67 | 126792.67 | 2928.49 | 968.21 | 1526.00% | 04:15.27 | 1102588.00 |
TokuDB | |||||||||||||
glibc | 29050.33 | 86.38 | 112.60 | 122.67% | 02:41.54 | 9254672.00 | 2028.33 | 29545.00 | 6367.58 | 3116.99 | 857.33% | 18:25.21 | 15980325.33 |
jemalloc | 29038.00 | 98.09 | 89.84 | 131.00% | 02:22.96 | 8350152.00 | 3812.67 | 42467.00 | 6578.44 | 3138.79 | 1273.00% | 12:43.10 | 9109910.67 |
tcmalloc | 36285.67 | 124.51 | 84.64 | 239.33% | 01:27.27 | 10562993.33 | 2827.67 | 45894.00 | 5112.53 | 1837.14 | 973.00% | 11:56.44 | 14282605.33 |
WiredBtree | |||||||||||||
glibc | 135670.00 | 7.13 | 7.62 | 99.00% | 00:14.77 | 6277.33 | 12166.67 | 515619.33 | 322.12 | 462.40 | 1209.67% | 01:04.83 | 793792.00 |
jemalloc | 135946.67 | 7.33 | 7.39 | 99.00% | 00:14.74 | 6461.33 | 13368.33 | 546248.33 | 313.83 | 437.27 | 1208.67% | 01:02.11 | 585908.00 |
tcmalloc | 123948.00 | 8.33 | 7.87 | 99.00% | 00:16.23 | 8880.00 | 12948.00 | 532682.00 | 307.50 | 448.76 | 1254.67% | 01:00.24 | 563261.33 |
WiredLSM | |||||||||||||
glibc | 34639.00 | 96.27 | 38.24 | 228.67% | 00:58.70 | 1989202.67 | 25964.67 | 262389.33 | 1752.64 | 83.63 | 1475.67% | 02:04.40 | 20176804.00 |
jemalloc | 25730.33 | 108.29 | 61.66 | 215.67% | 01:18.60 | 1557186.67 | 13271.67 | 345567.67 | 1380.92 | 57.90 | 1516.33% | 01:34.83 | 13175570.67 |
tcmalloc | 33913.33 | 103.04 | 39.16 | 237.67% | 00:59.80 | 4079472.00 | 27161.33 | 260150.00 | 1807.30 | 82.39 | 1504.67% | 02:05.55 | 21171470.67 |
A number of engines appeared slower during their initial DB load when using non-standard malloc. For example, BerkeleyDB's load throughput was about 6% slower using tcmalloc. LevelDB's load throughput was 15% slower using jemalloc. RocksDB was about 10% slower using jemalloc. RocksDB with optimized settings was also 10% slower using jemalloc. WiredTiger's LSM was 25% slower using jemalloc. Without any solid explanation, one guess is that the DB load is primarily a single-threaded job, and the optimizations that these other malloc libraries have made to support multi-threaded workloads turn into a pessimization here. Of course, most of these engines are also doing a lot of work in background threads during the DB load, so it's not strictly a single-threaded workload. tcmalloc also generally uses more RAM in this phase.
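The load-phase slowdowns quoted above can be approximately reproduced from the Fill (ops/sec) column of the table. The sketch below takes the slowdown relative to the glibc figure. That formula is our assumption about how the percentages were derived, and the results agree with the quoted values only to within a point or so.

```python
# Fill throughput (ops/sec) during load, copied from the table:
# (engine, alternative malloc, glibc value, alternative value)
LOAD_FILL = [
    ("BDB",      "tcmalloc", 20526.33, 19277.67),
    ("LevelDB",  "jemalloc", 54178.00, 46198.67),
    ("RocksDB",  "jemalloc", 57961.67, 52566.67),
    ("RocksDB2", "jemalloc", 61744.00, 55608.33),
    ("WiredLSM", "jemalloc", 34639.00, 25730.33),
]

def pct_slower(baseline, other):
    """Throughput lost relative to the baseline, as a percentage."""
    return (baseline - other) / baseline * 100.0

for engine, malloc, glibc_ops, other_ops in LOAD_FILL:
    loss = pct_slower(glibc_ops, other_ops)
    print(f"{engine:9s} {malloc:9s} {loss:5.1f}% slower than glibc")
```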
In the readwhilewriting phase, the trend generally reverses and the non-standard mallocs tend to deliver faster throughput (with some exceptions of course). E.g. LevelDB gets 13% faster throughput using jemalloc than standard glibc malloc. RocksDB gets 18% faster with jemalloc. TokuDB gets a whole 31% faster with jemalloc and 35% faster with tcmalloc. TokuDB also uses 43% less memory with jemalloc. This result makes TokuDB's choice of jemalloc pretty understandable.
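The TokuDB figures quoted above appear consistent with reductions in the Run phase's Wall and RSS columns. Here is a small sketch of that calculation, assuming the Wall column is formatted as minutes:seconds; the helper names are ours.

```python
def wall_seconds(text):
    """Convert a 'MM:SS.ss' wall-clock string from the table into seconds."""
    minutes, seconds = text.split(":")
    return int(minutes) * 60 + float(seconds)

def pct_reduction(baseline, other):
    """How much smaller `other` is than `baseline`, as a percentage."""
    return (baseline - other) / baseline * 100.0

# TokuDB readwhilewriting phase, values copied from the table.
TOKU_WALL = {"glibc": "18:25.21", "jemalloc": "12:43.10", "tcmalloc": "11:56.44"}
TOKU_RSS_KB = {"glibc": 15980325.33, "jemalloc": 9109910.67}

base = wall_seconds(TOKU_WALL["glibc"])
for name in ("jemalloc", "tcmalloc"):
    saved = pct_reduction(base, wall_seconds(TOKU_WALL[name]))
    print(f"TokuDB {name}: {saved:.0f}% less wall-clock time than glibc")

rss_saved = pct_reduction(TOKU_RSS_KB["glibc"], TOKU_RSS_KB["jemalloc"])
print(f"TokuDB jemalloc: {rss_saved:.0f}% less RSS than glibc")
```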
Unfortunately the results don't present a clear-cut best choice. For some engines and workloads
tcmalloc is fastest, for some plain glibc is best. tcmalloc sometimes has excessive RAM usage
and jemalloc generally is more compact, but not always.
Conclusion
The magnitude of the gains with non-standard mallocs is pretty remarkable, but in the grand
scheme of things it doesn't amount to much; all of the affected DB engines are still an
order of magnitude slower than LMDB. To quote Tim Callaghan: "it's not just
the library, implementation counts."
When your DB engine uses 20GB of RAM to manage an 8GB database, you've done something seriously wrong. When your DB engine uses malloc so extensively that swapping out malloc libraries makes over a 30% difference in performance, you've done something seriously wrong. Yes, implementation counts. The most efficient memory allocation is the one you didn't have to make.
The command script: cmd3, raw output: out.mallocs.tgz, and LibreOffice spreadsheet DBmallocs.ods.
The source code for the benchmark drivers is all on GitHub. We invite you to run these tests yourself and report your results back to us.
The software versions we used:
    violino:/home/software/leveldb> g++ --version
    g++ (Ubuntu 4.8.2-19ubuntu1) 4.8.2
    Copyright (C) 2013 Free Software Foundation, Inc.
    This is free software; see the source for copying conditions. There is NO
    warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

    violino:/home/software/leveldb> git log -1 --pretty=format:"%H %ci" master
    e353fbc7ea81f12a5694991b708f8f45343594b1 2014-05-01 13:44:03 -0700

    violino:/home/software/basho_leveldb> git log -1 --pretty=format:"%H %ci" develop
    d1a95db0418d4e17223504849b9823bba160dfaa 2014-08-21 15:41:50 -0400

    violino:/home/software/db-5.3.21> ls -l README
    -rw-r--r-- 1 hyc hyc 234 May 11 2012 README

    violino:/home/software/HyperLevelDB> git log -1 --pretty=format:"%H %ci" master
    02ad33ccecc762fc611cc47b26a51bf8e023b92e 2014-08-20 16:44:03 -0400

    violino:~/OD/mdb> git log -1 --pretty=format:"%H %ci"
    a054a194e8a0aadfac138fa441c8f67f5d7caa35 2014-08-24 21:18:03 +0100

    violino:/home/software/rocksdb> git log -1 --pretty=format:"%H %ci"
    7e9f28cb232248b58f22545733169137a907a97f 2014-08-29 21:21:49 -0700

    violino:/home/software/ft-index> git log -1 --pretty=format:"%H %ci" master
    f17aaee73d14948962cc5dea7713d95800399e65 2014-08-30 06:35:59 -0400

    violino:/home/software/wiredtiger> git log -1 --pretty=format:"%H %ci"
    1831ce607baf61939ddede382ee27e193fa1bbef 2014-08-14 12:31:38 +1000

All of the engines were built with compression disabled. We will compare compression engines in an upcoming test.