In-Memory Microbenchmark (Scaling)

Symas Corp., September 2014


Continuing on from before: the original RocksDB performance report tested an in-memory workload on a server with 144GB of RAM and 16 CPU cores. We finally rescued our 64-core server from VMware hosting duty to run some scalability tests. This machine is a SuperMicro 2042-6RF with four 16-core AMD Opteron 6274 CPUs and 256GB of DDR3-1333 ECC registered DRAM. For this test we're using an XFS filesystem on top of LVM, on top of a Samsung 840 Pro 512GB SSD. The base OS is RHEL7 with a Linux 3.10.0-123.el7.x86_64 kernel. The software versions used in this test were generally the newest available as of 2014-08-30.
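
For reference, a storage stack like this can be assembled with the standard LVM and XFS tools; the device and volume names below are illustrative, not the ones used on the test machine:

  pvcreate /dev/sdb                        # the SSD
  vgcreate benchvg /dev/sdb
  lvcreate -n benchlv -l 100%FREE benchvg
  mkfs.xfs /dev/benchvg/benchlv
  mkdir -p /mnt/bench && mount /dev/benchvg/benchlv /mnt/bench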

Test Overview

Using this server we generate a database with 100 million records. The records have 16-byte keys and 100-byte values, so the resulting database should be about 11GB in size. After the data is loaded, a "readwhilewriting" test is run multiple times in succession, with the number of reader threads set to 1, 2, 4, 8, 16, 32, and 64 for each successive run. (There is always only a single writer.) All of the threads operate on randomly selected records in the database. The writer performs updates to existing records; no records are added or deleted, so the DB size should not change much during the test. The results are detailed in the following sections.
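
As a concrete sketch, each series of runs amounts to a loop like the following. The flag spellings are taken from the RocksDB db_bench invocation shown later in this report; the other engines' driver binaries take equivalent options, and the harness variables here are assumptions:

  NUM=100000000     # 100 million records
  DUR=600           # 10-minute runs, as described below
  for THREADS in 1 2 4 8 16 32 64; do
      $TIME ./db_bench --benchmarks=readwhilewriting --use_existing_db=1 \
          --num=$NUM --duration=$DUR --threads=$THREADS
  done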

Loading the DB

Here are the stats collected from initially loading the DB.

Engine  Wall Time  User CPU  Sys CPU  CPU %  DB Size (KB)  Process Size (KB)  Total Size (KB)  Vol CS  Invol CS
LevelDB 09:24.39 04:26.36 00:52.31 56 11702148 540488 12242636 204233 625
Basho 44:28.12 07:15.42 02:15.19 21 13128020 127916 13255936 85254 12304
BDB 52:32.77 11:51.71 02:59.45 28 38590332 21499600 60089932 1196561 3162
Hyper 09:12.10 05:15.21 01:11.65 70 11644136 725100 12369236 219303 2023
LMDB 01:05.73 00:45.22 00:20.43 99 12580708 12583076 12583076 9 192
RocksDB 10:53.73 02:39.76 00:46.74 31 11745460 624916 12370376 398909 325
RocksDBpfx 10:09.24 10:27.97 00:44.74 110 12207504 13456836 25664340 195089 1134
TokuDB 24:12.76 09:35.02 04:20.47 57 14991692 18798900 33790592 1091241 1951
WiredLSM 24:27.22 17:08.86 11:53.50 118 12607732 1156020 13763752 18163720 4903
WiredBtree 07:35.84 01:24.96 00:18.78 22 11909312 314008 12223320 7268 151

We're running RocksDB both in a basic configuration and in the tuned configuration used in the original RocksDB benchmark report. The tuned run is denoted "RocksDBpfx" in all of these results.

The "Wall" time is the total wall-clock time taken to run the loading process. Obviously shorter times are faster/better. The actual CPU time used is shown for both User mode and System mode. User mode represents time spent in actual application code; time spent in System mode shows operating system overhead where the OS must do something on behalf of the application, but not actual application work. In a pure RAM workload where no I/O occurs, ideally the computer should be spending 100% of its time in User mode, processing the actual work of the application.

The "CPU" column is the ratio of adding the User and System time together, then dividing by the Wall time, expressed as a percentage. This shows how much work of the DB load occurred in background threads. Ideally this value should be 100, all foreground and no background work. If the value is greater than 100 then a significant portion of work was done in the background. If the value is less than 100 then a significant portion of time was spent waiting for I/O. When a DB engine relies heavily on background processing to achieve its throughput, it will bog down more noticeably when the system gets busy. I.e., if the system is already busy doing work on behalf of users, there will not be any idle system resources available for background processing. LMDB is effectively ideal.

The "Context Switches" columns show the number of Voluntary and Involuntary context switches that occurred during the load. Voluntary context switches are those which occur when a program calls a function that can block - system calls, mutexes and other synchronization primitives, etc. Involuntary context switches occur e.g. when a CPU must handle an interrupt, or when the running thread's time slice has been fully consumed. LMDB performs no blocking calls and uses no background threads, so there are practically no context switches during its run. All of the other DBs have multiple orders of magnitude more context switches, yielding much lower overall efficiency. None of the other DBs can perform the initial load anywhere near as quickly as LMDB.

The DB Size column shows the size of the DB files on disk after the load completes. The Process Size column shows the maximum size of the benchmark process during the load. The Total Size column shows the total amount of RAM consumed during the load; it is usually the sum of the DB and Process sizes since the DB occupies space in the filesystem cache. For LMDB it is simply the Process size, since the memory used by the filesystem cache is included in the LMDB process.
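
All of these statistics can be captured with standard tools; a plausible method (the $TIME variable in the db_bench command later in this report suggests GNU time was used, though this is an assumption) is:

  /usr/bin/time -v ./db_bench ... 2> load_stats.txt
  grep -E 'Maximum resident set size|context switches' load_stats.txt
  #   Maximum resident set size (kbytes): ...   -> Process Size
  #   Voluntary context switches: ...           -> Vol CS
  #   Involuntary context switches: ...         -> Invol CS
  du -sk /path/to/db        # DB Size on disk, in KB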

Throughput

The results for running the actual readwhilewriting test with varying numbers of readers are shown here. Results are further broken down by DB engine in the following sections.

RocksDB in its minimally tuned configuration appears to sustain write load the best across the various thread counts. HyperLevelDB performs extremely well when there's just a single reader, but as soon as another read thread is added its performance drops sharply.

LMDB's write performance is pretty good up to 16 readers. The decline at 32 and 64 was a bit surprising and bears further investigation.

LMDB is clearly the leader here, with read performance scaling linearly with the number of cores. RocksDB in its highly tuned configuration is almost as fast. Too bad that tuning RocksDB is so complex that even its own developers don't understand it. LMDB outperforms all of the rest, with no special tuning.


Software optimization is always about both time and space. Often we have a good idea of what speed we require for a given task, but provisioning for size can be much trickier. For most of these DB engines, the amount of memory consumed varies wildly with the changing workload. For BDB and LMDB the size is essentially constant. Also, as noted in our original design paper, we targeted LMDB to consume about one-third as much memory as BDB. In practice we usually come in at only one-fourth as much, as shown here.

For expedience, the test duration is set to 10 minutes for each run. The DB-specific results follow.

LevelDB

Threads  Wall Time  User CPU  Sys CPU  CPU %  DB Size (KB)  Process Size (KB)  Total Size (KB)  Vol CS  Invol CS  Write Rate (ops/sec)  Read Rate (ops/sec)
1 10:03.84 00:11:22.91 00:00:28.26 117 16248852 9284764 25533616 620540 1249 918 20088
2 10:24.56 00:21:06.89 00:03:29.16 236 16911164 18340088 35251252 897864 2785 25810 32517
4 10:38.81 00:34:47.48 00:05:20.86 377 16237996 16935340 33173336 3815398 1292 909 53910
8 10:42.39 00:59:46.72 00:17:35.20 722 16853784 18585360 35439144 28845839 2149 26788 85254
16 10:43.54 01:16:27.32 01:05:23.05 1322 16074148 17341680 33415828 75899788 2942 895 93759
32 11:30.14 01:14:05.42 03:32:15.19 2489 15489724 17612732 33102456 42881359 120634 9111 134491
64 10:41.06 01:25:48.10 07:20:21.10 4924 15405012 16990268 32395280 39372702 61040 13887 64188


In this graph, the Write Rate is on the left Y-axis and the Read Rate uses the right Y-axis. As is typical of LevelDB, the write speed is highly erratic. The read speed doesn't scale linearly with threads, and it's clear that above 32 threads, the locking overhead becomes so great that additional threads simply slow things down (negative scaling).

Other items to note: even though the test duration is only 10 minutes, the actual running time is longer. This shows that the engine spends a significant amount of time cleaning up at the end of a test. Long cleanup times are a significant vulnerability; in an emergency shutdown, a DB with a long cleanup time is likely to become corrupted if it doesn't have sufficient time to finish processing.


In the ideal case, since each run has the given number of reader threads plus one writer thread, the CPU % column should read 200 / 300 / 500 / 900 / 1700 / 3300 / 6400. (In the last case, since we only have 64 cores, the writer has to fight the readers for CPU access; in every other case, each task should be able to fully utilize a CPU core of its own.) The CPU utilization here is far below ideal, showing both heavy locking overhead and significant I/O wait time.

Also in the ideal case, User CPU time should always be much greater than System time. Here we see the System time increase drastically at 16 threads, and becoming several times greater than User time at 64 threads. This also indicates heavy locking overhead.

Basho LevelDB

Threads  Wall Time  User CPU  Sys CPU  CPU %  DB Size (KB)  Process Size (KB)  Total Size (KB)  Vol CS  Invol CS  Write Rate (ops/sec)  Read Rate (ops/sec)
1 10:04.26 00:13:02.20 00:00:50.94 137 13085012 119212 13204224 4855337 2730 27152 142414
2 15:51.23 00:05:29.78 00:01:24.47 43 14006896 1035836 15042732 4654184 14146 27905 94
4 15:49.88 00:06:18.85 00:02:04.90 53 14110944 1965704 16076648 4718741 13304 28431 190
8 15:55.17 00:17:29.21 00:03:37.73 132 23496036 4399292 27895328 4716679 12295 28121 126877
16 15:44.48 00:22:13.60 00:06:51.74 184 23591320 8437516 32028836 5478414 17922 27390 116351
32 15:09.41 00:43:19.13 00:10:13.12 353 14448056 12363952 26812008 5701432 60864 30256 106381
64 15:29.80 03:26:13.51 00:12:50.38 1413 14246272 14032552 28278824 5042157 141044 25898 75883

Basho's fork of LevelDB shows extremely long cleanup times in these tests. That, along with the extremely low CPU utilization, indicates severe I/O waits. This is quite unusual considering that the writes are asynchronous and the 11GB database easily fits in the server's 256GB of RAM.

Aside from the strange glitch in read performance at 2 and 4 threads, Basho shows a general decline in read performance as the number of readers increases. The write speed is relatively constant. Overall this DB is a poor choice for multi-threaded work.

Comparing the CPU usage to the throughput shows that additional threads just cause the DB to spin, wasting cycles for each thread.

BerkeleyDB

Threads  Wall Time  User CPU  Sys CPU  CPU %  DB Size (KB)  Process Size (KB)  Total Size (KB)  Vol CS  Invol CS  Write Rate (ops/sec)  Read Rate (ops/sec)
1 10:10.95 00:09:30.05 00:03:37.04 128 34931488 16073632 51005120 653643 2217 3550 50475
2 10:53.83 00:15:31.35 00:08:50.35 223 34930640 16085680 51016320 2735020 1857 3930 47219
4 10:49.24 00:26:38.88 00:15:38.22 390 34952156 16087608 51039764 22274852 2237 3754 73019
8 10:09.80 00:32:11.35 00:44:09.36 751 34931132 16081288 51012420 76970470 2126 2974 66445
16 10:11.06 00:25:39.69 02:16:41.90 1594 34934320 16071760 51006080 23633587 3319 2014 61253
32 10:16.70 00:23:14.77 05:00:23.84 3148 34945116 16065176 51010292 9084640 174998 1078 50525
64 10:06.52 00:35:26.97 09:48:26.09 6171 34934008 16060424 50994432 11150474 68473 449 34262

In OpenLDAP we have occasionally had reports of long shutdown times for slapd using BDB. We see here that in some cases there is a lengthy cleanup time, although nowhere near as bad as for Basho LevelDB.

BDB runs into lock scaling issues at 8 threads and things go downhill quickly from there.

The behavior is exactly the opposite of what is desired - the System CPU time increases with number of threads, instead of the User CPU time. Instead of getting useful application work done, it is just spinning in mutex overhead.

HyperLevelDB

Threads  Wall Time  User CPU  Sys CPU  CPU %  DB Size (KB)  Process Size (KB)  Total Size (KB)  Vol CS  Invol CS  Write Rate (ops/sec)  Read Rate (ops/sec)
1 10:17.20 00:14:46.85 00:01:47.94 161 15548040 14460184 30008224 432077 2602 63435 16741
2 10:05.32 00:18:43.46 00:02:55.43 214 18107372 16389472 34496844 3666264 2922 31637 20332
4 10:24.11 00:31:32.58 00:07:17.34 373 17498700 17424068 34922768 4160735 2862 35022 43237
8 10:28.28 00:53:27.79 00:12:49.46 633 16626288 17681188 34307476 15105478 2565 32361 70416
16 10:26.19 01:18:57.67 00:43:50.48 1176 16197856 16988044 33185900 59156005 4102 22540 126127
32 10:30.07 01:15:17.64 03:56:08.96 2965 15112364 15810176 30922540 27482892 184994 9967 162364
64 10:12.90 01:16:59.35 09:12:53.89 6166 15215340 15391140 30606480 7283399 82769 1753 114716

Aside from the sharp drop in write rate between the 1 and 2 reader cases, not much stands out in these results. HyperLevelDB runs into locking overhead problems at 32 threads, and additional threads beyond that point just slow things down. Somewhere between 16 and 32 threads the System CPU overhead also gets very high. Overall, increasing the number of readers causes a commensurate decrease in writer throughput.

LMDB

Threads  Wall Time  User CPU  Sys CPU  CPU %  DB Size (KB)  Process Size (KB)  Total Size (KB)  Vol CS  Invol CS  Write Rate (ops/sec)  Read Rate (ops/sec)
1 10:03.10 00:16:14.06 00:03:47.81 199 12587812 12590388 12590388 49 1537 48142 239740
2 10:02.00 00:23:38.66 00:06:21.47 299 12589356 12591940 12591940 1759 2600 37384 409245
4 10:01.56 00:32:08.06 00:17:48.70 498 12589572 12592184 12592184 140256 5774 30552 528349
8 10:01.77 01:01:26.58 00:27:56.55 891 12589572 12592228 12592228 2415620 9358 30775 1103808
16 10:01.82 02:11:52.53 00:37:24.96 1687 12589572 12592540 12592540 1499041 27988 29294 2615647
32 10:01.87 04:49:37.44 00:38:38.23 3272 12602972 12606132 12606132 910292 106739 23013 5080934
64 10:01.96 09:30:28.53 01:04:17.92 6326 12709900 12713708 12713708 206926 143083 13264 8192800

The CPU use here is fairly close to the ideal 200 / 300 / 500 / 900 / 1700 / 3300 / 6400.

The write rate from 4 to 16 threads is basically unchanged while the read rate increases. The drop in write rate at 64 threads is expected since the writer must compete with readers for CPU time at that point. It was not expected at 32 threads since there are still 31 idle cores on the system. More profiling will be needed to see what happened there.

The CPU usage is as expected; User CPU time increases with the number of reader threads and is proportional to the read throughput.

RocksDB

This is using RocksDB's default settings, except that the cache size was set to 32GB and the write buffer size was set to 256MB.
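
In db_bench terms, that corresponds to something like the following sketch; only the two flags named above depart from the defaults, and the harness variables are the same assumptions as before:

  # 32GB cache, 256MB write buffer; everything else left at defaults
  ./db_bench --benchmarks=readwhilewriting --use_existing_db=1 \
      --num=$NUM --duration=$DUR --threads=$THREADS \
      --cache_size=$((32*1024*1024*1024)) \
      --write_buffer_size=$((256*1024*1024))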

Threads  Wall Time  User CPU  Sys CPU  CPU %  DB Size (KB)  Process Size (KB)  Total Size (KB)  Vol CS  Invol CS  Write Rate (ops/sec)  Read Rate (ops/sec)
1 10:19.66 00:15:05.49 00:01:18.68 158 13525496 382828 13908324 231034 1663 35255 253714
2 11:04.16 00:18:20.26 00:07:37.98 234 15214124 20828308 36042432 3974906 1591 30240 22713
4 11:22.96 00:28:47.09 00:13:54.57 375 15660148 23118260 38778408 17358093 1769 30018 28758
8 11:38.54 01:00:23.61 00:16:52.67 663 15855532 23491976 39347508 46657001 2740 29674 208460
16 11:08.11 01:35:56.77 00:34:25.12 1170 15785324 23766452 39551776 140127583 4163 29590 203410
32 11:21.72 01:50:40.89 01:36:10.79 1820 15907288 23357424 39264712 242502418 20404 28876 144861
64 11:39.90 02:09:59.91 04:13:57.65 3291 16196980 22632116 38829096 246604572 49088 26033 76008

As with Basho there's a strange drop in read throughput at 2 and 4 threads, though it's not quite as drastic here. CPU utilization overall is low, indicating a lot of I/O overhead. The runtime also indicates a significant amount of cleanup time at the end of each test.

While the write rate is fairly steady, reads start to drop off at 16 threads and adding more threads just slows things down from there.

System CPU time gets out of hand past 16 threads.

RocksDBpfx

This is using all of the tuning settings as published in the original RocksDB in-memory benchmark. As shown below, it takes over 40 settings to achieve the best performance:

$TIME ./db_bench.rocks --num_levels=6 --key_size=16 --prefix_size=16 \
    --keys_per_prefix=0 --block_size=4096 --cache_size=$CACHE \
    --cache_numshardbits=6 --compression_type=none --compression_ratio=1 \
    --min_level_to_compress=-1 --disable_seek_compaction=1 \
    --hard_rate_limit=2 --write_buffer_size=134217728 \
    --max_write_buffer_number=2 --level0_file_num_compaction_trigger=8 \
    --target_file_size_base=134217728 --max_bytes_for_level_base=1073741824 \
    --sync=0 --disable_data_sync=1 --verify_checksum=1 \
    --delete_obsolete_files_period_micros=314572800 \
    --max_grandparent_overlap_factor=10 --max_background_compactions=4 \
    --max_background_flushes=0 --level0_slowdown_writes_trigger=16 \
    --level0_stop_writes_trigger=24 --statistics=0 --stats_per_interval=0 \
    --stats_interval=$STATS --histogram=0 --use_plain_table=1 \
    --open_files=-1 --mmap_read=1 --mmap_write=0 --memtablerep=prefix_hash \
    --bloom_bits=10 --bloom_locality=1 --perf_level=0 --duration=$DUR \
    --benchmarks=readwhilewriting --use_existing_db=1 --num=$NUM \
    --threads=$THREADS --writes_per_second=$WRATE --duration=$DUR

Threads  Wall Time  User CPU  Sys CPU  CPU %  DB Size (KB)  Process Size (KB)  Total Size (KB)  Vol CS  Invol CS  Write Rate (ops/sec)  Read Rate (ops/sec)
1 11:19.13 00:15:31.57 00:02:01.96 155 13958336 19184572 33142908 1530621 2341 29518 143077
2 11:42.98 00:23:55.84 00:03:47.22 236 14273048 20997544 35270592 1599541 3442 22199 262740
4 11:32.00 00:37:03.53 00:11:30.89 421 14102164 21605132 35707296 1209638 5375 25063 399082
8 13:00.94 01:14:49.76 00:14:12.12 684 13739348 21791576 35530924 1456330 11288 21828 845523
16 11:35.51 02:34:35.01 00:14:16.86 1456 14252980 21198204 35451184 1378890 64921 31422 2035618
32 11:33.73 05:09:52.76 00:17:54.52 2835 14239260 21515240 35754500 1690116 279869 24594 4333791
64 11:48.49 09:57:25.88 00:39:28.56 5393 14325896 21539212 35865108 527183 167314 24602 7818390

The various performance options definitely help, although they haven't eliminated the cleanup time at the end of the test. In some cases it's actually worse than for the default settings.

The read rate scales linearly with the number of threads, which is very good. The write rate is relatively unaffected by the number of readers, which is also very good.

The User CPU time is proportional to the read rate, and the System CPU time remains small. Again, these are very good results.

TokuDB

Threads  Wall Time  User CPU  Sys CPU  CPU %  DB Size (KB)  Process Size (KB)  Total Size (KB)  Vol CS  Invol CS  Write Rate (ops/sec)  Read Rate (ops/sec)
1 10:05.00 00:09:19.32 00:01:04.51 103 12377020 13867540 26244560 633989 1800 266 72934
2 10:08.83 00:14:47.05 00:05:33.55 200 12473960 14190944 26664904 1914432 1086 257 83785
4 10:11.02 00:26:19.24 00:10:43.91 363 12570772 14518040 27088812 31737888 957 247 119069
8 10:10.19 00:34:31.22 00:27:37.15 611 12700364 14750404 27450768 102760739 1359 237 123622
16 10:09.40 00:25:33.81 02:01:30.03 1447 12863188 15026356 27889544 51746085 4088 230 93177
32 10:10.02 00:21:13.04 04:43:10.17 2993 12941136 15099288 28040424 18567628 114165 203 67996
64 10:08.03 00:30:29.41 09:12:14.36 5750 12929692 15040012 27969704 18162776 99868 127 44633

We're told that something is misconfigured in our TokuDB driver, but unfortunately there is no documentation on what proper settings might be. As it is, the write rates here are ridiculously slow and the code is both I/O bound and lock bound.

Increasing threads beyond 8 just slows everything down.

The System CPU time gets out of hand above 8 threads. This indicates heavy locking overhead.

WiredTiger LSM

Threads  Wall Time  User CPU  Sys CPU  CPU %  DB Size (KB)  Process Size (KB)  Total Size (KB)  Vol CS  Invol CS  Write Rate (ops/sec)  Read Rate (ops/sec)
1 10:18.83 00:20:27.41 00:03:32.60 232 16856992 13225240 30082232 220984 1910 697 35661
2 10:04.80 00:28:55.59 00:01:17.79 299 12201732 13170140 25371872 114333 2356 1748 83702
4 10:04.58 00:47:38.72 00:03:08.41 504 12278616 13359204 25637820 296681 4239 2202 151433
8 10:10.88 01:25:30.81 00:05:22.78 892 12471700 13698460 26170160 953444 16179 2154 284496
16 10:04.80 02:41:45.85 00:08:18.44 1687 12287368 14277704 26565072 2197247 207605 1789 404349
32 10:04.99 05:13:24.77 00:10:58.64 3217 12376328 13361324 25737652 3657026 310230 1237 328547
64 10:05.17 07:55:52.22 02:24:17.31 6148 12359716 13345940 25705656 4547381 61186756 817 294251

Reads scale fairly well up to 16 threads. The write rates are quite slow; it's unusual to see the write rate ramp up before ramping down again as it does here.

Comparing throughput to CPU use implies that this library uses its own locking mechanism, not just the system's mutexes.
The increase in System CPU at 64 threads indicates heavy mutex overhead.

WiredTiger Btree

Threads  Wall Time  User CPU  Sys CPU  CPU %  DB Size (KB)  Process Size (KB)  Total Size (KB)  Vol CS  Invol CS  Write Rate (ops/sec)  Read Rate (ops/sec)
1 15:30.64 00:14:01.12 00:07:22.39 137 23818836 18416052 42234888 774344 1864 30223 93085
2 13:25.07 00:21:47.42 00:09:06.81 230 23818844 18889116 42707960 1760746 1976 31933 213321
4 12:30.88 00:30:46.97 00:18:00.85 389 23818844 18657408 42476252 4576807 1613 26682 300084
8 12:22.53 01:03:23.17 00:21:17.12 684 23818844 18597748 42416592 7664991 5899 24349 549349
16 12:31.51 02:11:38.13 00:23:29.90 1238 23818844 18415612 42234456 11727794 28631 20807 842162
32 12:23.78 03:25:13.55 01:20:51.76 2307 23818836 17554408 41373244 19375216 230283 9809 668958
64 12:29.11 05:42:55.20 02:33:29.72 3976 23818836 17024528 40843364 29848253 301760 3967 641151

This DB shows significant cleanup time at the end of each test. Also it appears to have heavy lock overhead from 4 threads on up, based on the low CPU usage.

The read rate scales well up to about 8 threads, then tapers off. Additional threads past 16 just slow things down.

The System CPU time is noticeable at all load levels, but gets out of hand past 16 threads.

NUMA Considerations

The amount of System CPU time for LMDB in these tests seemed excessively high. Profiling with oprofile showed that most of the System time was due to reshuffling of memory pages between different NUMA nodes. This particular machine has 8 NUMA nodes (memory zones), with 8 CPU cores in each node. The entire database is small enough to fit in a single node, and will generally be allocated that way during the initial load; that leaves 56 of the 64 cores at a significant disadvantage when accessing the database, because all of their references go to a remote node. Further testing was done using NUMA interleaving to measure its performance impact; see the Scaling/NUMA writeup for the results.
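
Interleaving can be imposed from outside the process with numactl, with no engine changes needed; e.g.:

  # Spread the benchmark's pages round-robin across all 8 NUMA nodes
  # instead of letting first-touch placement put the whole DB on one node.
  numactl --interleave=all ./db_bench --benchmarks=readwhilewriting ...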

Conclusion

Just because your database workload is in-memory doesn't mean it won't still run into bottlenecks and unexpected slowdowns. And just because your DB appears to run well when using a few CPUs is no guarantee that it will continue to run well when you scale up to a larger, more powerful system.

The definition of "what fits in memory" also varies wildly depending on whose DB you're talking about, what kind of workload you throw at it, and how you tune the DB engine. One of the attractions of an embedded DB is its small footprint, not only in code and resource usage but also in administration and maintenance. Some of these DBs have quite complex configurations that must be carefully tuned to obtain usable performance; they seem to have lost sight of simplicity in their quest for performance. With LMDB, no tuning is required. If you're looking to jump on the in-memory bandwagon, there's only one viable choice.

Files

The files used to perform these tests are all available for download: command script (100M), raw output (100M), binaries. The results tabulated in a LibreOffice spreadsheet are also available here. The source code for the benchmark drivers is all on GitHub. We invite you to run these tests yourself and report your results back to us.

The software versions used:

violino:/home/software/leveldb> g++ --version
g++ (Ubuntu 4.8.2-19ubuntu1) 4.8.2
Copyright (C) 2013 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

violino:/home/software/leveldb> git log -1 --pretty=format:"%H %ci" master
e353fbc7ea81f12a5694991b708f8f45343594b1 2014-05-01 13:44:03 -0700

violino:/home/software/basho_leveldb> git log -1 --pretty=format:"%H %ci" develop
d1a95db0418d4e17223504849b9823bba160dfaa 2014-08-21 15:41:50 -0400

violino:/home/software/db-5.3.21> ls -l README 
-rw-r--r-- 1 hyc hyc 234 May 11  2012 README

violino:/home/software/HyperLevelDB> git log -1 --pretty=format:"%H %ci" master
02ad33ccecc762fc611cc47b26a51bf8e023b92e 2014-08-20 16:44:03 -0400

violino:~/OD/mdb> git log -1 --pretty=format:"%H %ci"
a054a194e8a0aadfac138fa441c8f67f5d7caa35 2014-08-24 21:18:03 +0100

violino:/home/software/rocksdb> git log -1 --pretty=format:"%H %ci"
7e9f28cb232248b58f22545733169137a907a97f 2014-08-29 21:21:49 -0700

violino:/home/software/ft-index> git log -1 --pretty=format:"%H %ci" master
f17aaee73d14948962cc5dea7713d95800399e65 2014-08-30 06:35:59 -0400

violino:/home/software/wiredtiger> git log -1 --pretty=format:"%H %ci"
1831ce607baf61939ddede382ee27e193fa1bbef 2014-08-14 12:31:38 +1000

All of the engines were built with compression disabled; compression was not used in the RocksDB test either. Some of these engines recommend/require use of a non-standard malloc library like Google tcmalloc or jemalloc. To ensure as uniform a test as possible, all of the engines in this test were built to use the standard libc malloc.