Memcache Benchmark

Symas Corp., May 2013


This page shows performance of three persistent storage engines for the Memcache protocol, compared against the original Memcached. The numbers for Memcached 1.2.0, MemcacheDB with BerkeleyDB 4.7.25, and MemcacheDB with LMDB were measured September 2012 and originally reported on the Memcached Google group. The numbers for MySQL's InnoDB Memcached plugin were measured in May 2013 using MySQL 5.6.11 installed from the .deb package obtained from the MySQL download site. New results for BerkeleyDB 5.3.21 and current LMDB were also measured in May 2013.

Background

It's well known that the SQL language itself imposes a great deal of overhead on data accesses. One example shows 85% of MySQL runtime consumed in parsing, with only 15% spent in the actual storage engine. Since we are interested in developing an LMDB backend for MariaDB, it's important to characterize the potential speedup such a backend might offer. Also, since MySQL now offers this Memcache interface to provide more direct access to the storage engine, bypassing the SQL processing, we can get a more apples-to-apples comparison against MySQL's underlying storage engine performance.

The only tuning change made to the InnoDB memcached plugin's default configuration was to set daemon_memcached_w_batch_size to 32, which was its old default value. With its current default value of 1, which commits a transaction after every memcache PUT operation, the plugin was so slow that a single pass through the test would have taken many hours.
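For reference, that is a single line in the MySQL server configuration file; a minimal my.cnf excerpt would look something like this (the section header and the rest of the file will vary by installation):
	[mysqld]
	daemon_memcached_w_batch_size=32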

As such this isn't quite an equal comparison: BDB and LMDB are also fully transactional storage engines, and both committed a transaction on every PUT. However, both were run with the -N (nosync) option, with flushes/syncs performed by a background checkpoint thread that triggered every 5 minutes. For BDB, the checkpoint activity stalled the test for many seconds at a time; those delays are excluded from the results shown here.
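As an illustration of the no-sync arrangement on the LMDB side, here is a minimal standalone sketch (not the MemcacheDB source; the function names, map size, and 5-minute interval are just placeholders matching the description above). The environment is opened with MDB_NOSYNC so commits skip the fsync, and a background thread flushes dirty pages periodically:
	#include <lmdb.h>
	#include <pthread.h>
	#include <unistd.h>

	static MDB_env *env;

	/* Background "checkpoint" thread: flush dirty pages to disk every 5 minutes. */
	static void *sync_thread(void *arg)
	{
		(void)arg;
		for (;;) {
			sleep(300);
			mdb_env_sync(env, 1);	/* force an explicit flush */
		}
		return NULL;
	}

	int open_env(const char *path)
	{
		pthread_t tid;
		int rc;

		if ((rc = mdb_env_create(&env)) != 0)
			return rc;
		mdb_env_set_mapsize(env, 1UL << 30);	/* 1GB map, arbitrary for this sketch */
		/* MDB_NOSYNC: commits return without an fsync, as with -N above. */
		if ((rc = mdb_env_open(env, path, MDB_NOSYNC, 0664)) != 0)
			return rc;
		return pthread_create(&tid, NULL, sync_thread, NULL);
	}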

Also, the default InnoDB memcache table had to be altered to support VARCHAR(4096) values; it was originally defined with only VARCHAR(1024) and was returning truncated results in the initial test runs.
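Assuming the stock demo container created by the plugin's setup script, where the value column is c2 VARCHAR(1024) in test.demo_test, the change is a one-line ALTER along these lines (adjust the table and column names if a different container mapping is used):
	ALTER TABLE test.demo_test MODIFY c2 VARCHAR(4096);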

Results

Using memcachetest we do a simple run with 100,000 total operations; one third of the operations are PUTs and two thirds are GETs. The test is run twice against each engine, once using a single thread and again using 4 client threads. All runs are on the same machine, which has 16 cores and 128GB of RAM, so CPU and memory exhaustion are not an issue.

Single Thread


The graphs show milliseconds per operation, summarized as minimum, average, maximum, and various percentile times, on a logarithmic scale. It's plain that, as claimed, LMDB's performance is comparable to the pure-memory Memcached, while BerkeleyDB and InnoDB are both much slower. On average, InnoDB reads are two orders of magnitude slower. The extremely large maximum times for BerkeleyDB and InnoDB stand in stark contrast to the tightly bounded LMDB and Memcached results, and InnoDB's wide spread of response times translates to unpredictable latency for clients.

Four Threads


With multi-threading added to the mix, the picture gets much worse for every engine besides LMDB. Since LMDB reads are lockless, they scale perfectly linearly across arbitrarily many CPUs; and since readers and writers don't block each other, write performance is also largely unaffected by the increased load.
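To make the lockless-read point concrete, here is a minimal sketch of an LMDB lookup (again a standalone example, not the MemcacheDB code, and the function name is illustrative): a read-only transaction merely reserves a slot in the reader table and takes no locks, so any number of such readers run in parallel with each other and with a writer.
	#include <lmdb.h>
	#include <string.h>

	/* Look up a key and copy its value into buf; returns 0 or an MDB error code. */
	int lookup(MDB_env *env, MDB_dbi dbi, const char *keystr, char *buf, size_t buflen)
	{
		MDB_txn *txn;
		MDB_val key, data;
		int rc;

		/* MDB_RDONLY: no mutexes are held while the snapshot is in use. */
		rc = mdb_txn_begin(env, NULL, MDB_RDONLY, &txn);
		if (rc != 0)
			return rc;

		key.mv_size = strlen(keystr);
		key.mv_data = (void *)keystr;

		rc = mdb_get(txn, dbi, &key, &data);
		if (rc == 0) {
			/* Values are returned by reference into the map;
			 * copy out before ending the transaction. */
			size_t n = data.mv_size < buflen ? data.mv_size : buflen;
			memcpy(buf, data.mv_data, n);
		}
		mdb_txn_abort(txn);	/* read-only txns are simply discarded */
		return rc;
	}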

Note that LMDB even outperforms the pure-memory Memcached here, because Memcached's main-memory tables require locking for concurrency control. This illustrates the folly of using so-called "main-memory data structures" instead of B+trees, even when the data will never touch a disk.

The raw test data is available in this OpenOffice spreadsheet.

Additional Data

The test with only 100,000 operations is rather short. Raw data for a retest using 10,000,000 operations is available in raw10M.txt. Also note that in this retest, these InnoDB tuning parameters were added:
	innodb_buffer_pool_size=512M
	innodb_flush_log_at_trx_commit=0
	innodb_support_xa=0
This is more comparable to the nosync mode the BDB and LMDB tests used. The buffer pool default was 128M; testing with the 512M setting showed that no more than 192M was actually used. The output shows what appears to be some errors in the InnoDB run, but there's no log info to describe what failed.

Conclusion

As so many others have written, speed and latency matter. Amazon found that a 100msec delay resulted in a 1% reduction in sales. Google found that a 500msec delay resulted in 20% fewer page views. If you're using slow data engines to power your web apps, you're losing business. Symas' Lightning MDB can help you recapture that business.