Terracotta has announced the availability of BigMemory, which provides a large offheap cache through their Ehcache project. It is designed to avoid the GC impact caused by massive heaps in Java, at a license cost of $500 per GB per year, if I have my figures right.
The Reason We’re Here
First, let’s understand the reason BigMemory exists at all: the nature of the JVM heap, the standard (“Sun”) JVM memory model.
In very basic summary, there are two generations of objects in the Sun JVM: a young generation and an old generation. Simply put, garbage collection occurs when a generation fills up.
A young generation’s garbage collection is normally very quick; the young generation space isn’t usually very large. However, the old generation’s garbage collection phase can be very long, depending on a lot of factors – the simplest of which is the size of the old generation. (The larger the old generation is, the longer garbage collection will take… more or less.)
The problem addressed by BigMemory is the existence of a large old generation (let’s say, larger than ten gigabytes). When you have a lot of read-only data (as in, a side-cache like Ehcache?), and an old-generation GC occurs, the garbage collector potentially has to walk through a lot of data that isn’t eligible for collection; that takes time, and slows the application down. Some operations cause it to block the application.
Let’s all agree: blocking the application is usually unpleasant.
Is BigMemory a Good Thing?
From the standpoint of garbage collection being bad, when used with large heaps: BigMemory looks like a huge win! No more GC associated with giant cache regions!
That’s not all there is to the situation. Let’s think about this.
You’re talking about optimization of a cache, a cache built around key/value pairs – a map with benefits, more or less. (Namely, expiration of data, and a few other things that we’ll discuss soon.)
Cache isn’t usually the main problem.
Cache is usually not the primary problem for an application; cache is a way of hiding how expensive data access has become, in most cases. It’s a symptom of your database being too slow, for whatever reason, and by whatever definition. (I’m ignoring memoization and computational results, which usually don’t end up being gigabytes’ worth of data anyway.)
Cache moves the problem around: in BigMemory’s case, the problem has gone from “slow data access” to “lots of data causes our app to slow down unacceptably.” There are still costs: serialization still takes time, key management is a problem, and merely accessing the data can be slow.
Key Management
Key management is a factor because you still have to know how to access your data. If you know the object’s primary key, of course, that’s an easy reference to use, but what if you don’t know the object’s primary key? (The object’s lost, that’s what.) A side cache is ideal for working with data that your application is well aware of, but not for data that your application has to derive.
Consider memcached; there are good books on memcached that actually suggest using a SQL query for a cache key! The key, in some cases, is larger than the data item generated by the query. Ehcache isn’t going to have a different solution.
Serialization and Offheap Access
Serialization factors in because of the way BigMemory works.
BigMemory allocates memory directly in the OS, through an NIO ByteBuffer, and manages references itself. That part is good, but there are two issues here: serialization and access time.
Serialization factors in whenever the OS’ memory is used: since the memory region is a set of bytes, Java has to serialize the cached objects into that memory region on demand. Serialization is not the fastest mechanism Java has – looking at the timings of remote calls in Java, serialization is more expensive than the network call itself is. Your offheap writes and reads are going to be slow, period, just through serialization.
Plus, accessing the byte buffer itself is slow (because accessing offheap memory is slow.) This is important, but less than it could be, because BigMemory identifies a “hot set” of data – they say 10% of the stored data – and keep it “on-heap” for rapid access (which presumably avoids serialization, too.)
This is a good thing, sort of – but it also has an implication for your heap size. I don’t know offhand how hard of a limit that hot set percentage is, but if it’s something that’s not adjustable, your heap size will always have to be able to handle the hot set’s size – let’s say 10% of the size of the offheap storage as a rough estimate.
This establishes a limit on the offheap size, too, because a JVM heap that’s too large (to satisfy the needs of the BigMemory offheap cache) would suffer the very same problems BigMemory is trying to help you avoid.
What about the GC time itself?
BigMemory doesn’t actually get rid of GC pauses. It only removes the need to garbage-collect the offheap data (again, roughly 90% of the data lives offheap, as a “cool set” compared to the 10% “hot set.”) Even Terracotta’s documentation shows GC pauses, although they haven’t demonstrated the tuning associated with the pauses.
Actual Impact of BigMemory
If your application is sensitive to any pauses at all, BigMemory’s going to… help a little, but not that much, because pauses will still exist. They’ll have less impact, and your SLA factors in very heavily here, but they’ll still be there, caused by key management at the very least.
The only way to fix those pauses is to fix the application, really. JVM tuning can help, but realistically, an application that needs a giant cache like that has been built the wrong way; you’re far, far, FAR better off localizing smaller slices of your data into separate JVMs, which can communicate via IPC, than you are by pretending a giant heap will make your troubles go away.
So should you buy BigMemory? – A comparison with other mechanisms.
Well, I’d say no, with all due respect to our friends at Terracotta. I actually took their tests and ran comparisons against the JVM’s poor ConcurrentHashMap, and found some really interesting numbers:
ConcurrentHashMap won, by a lot, even in a test calculated to abuse the cache, and we’re still factoring in garbage collection.
The problem statement looked something like this: can we run a large, simple key/value store, consisting of a large amount of data, and avoid garbage collection pauses of over a second? (Remember, this is the statement they’re using to justify the development of BigMemory in the first place.)
The answer is: yes, although the solution takes different shapes depending on what the sensitive aspects of the application are.
For example, on our test system, with a 90 GB heap and fifty threads, 50% read/write operations, we had an average latency of 184 uSec, with some outliers. The outliers cause our hypothesis to fail, however. (This addresses the actual performance of the cache, though: 50 threads accessing a single map, which … isn’t very kind.)
Further, the most important factor in accessing our map isn’t the size of the map or the GC involved, but the number of threads in the JVM. If we use the same 90GB heap, giant ConcurrentHashMap, twenty-five threads – our normal latency time drops to around 80 usec, again with outliers in the Sun JVM.
And total throughput? I’m sorry, but BigMemory fails here, too. A BigMemory proponent mentions having 200000 transactions per second on a 100GB heap. We were looking at the millions with ConcurrentHashMap – even with the latency impact of thread synchronization. With a more normal cache usage scenario – closer to a 90% read/10% write situation – the numbers for the stock JVM collections climb even more.
Back to latency: if we do something drastic – like, oh, use a higher-performance JVM like JRockit, even without deterministic GC – then we get similar latency numbers, except the time spent in GC disappears, to something like the 400msec range. Plus, our hypothesis passes muster, literally – giant heap, no GC pauses of over a second, no investment in anything (besides JRockit itself, of course, which Oracle suggests will be merged into OpenJDK.)
Now, here’s the summary: We took Terracotta’s test, factored out BigMemory, and got the same results or better, without rearchitecting anything at all.
If you rearchitect to distribute the “application” properly, you could get even better access times, and even larger data sets, with less discernable GC impact.
That’s power – and, sadly, it doesn’t make BigMemory look like a big splash at all.

{ 13 comments… read them below or add one }
I wonder if BigMemory even makes sense within terracottas own client/server framework, given these numbers.
I sort of doubt it.
What if they try to use custom serialization scheme with BCI/JNI? This can increase throuhput serveral times and reduce ‘hot set’ size. But the real problem is sounds different: JVM memory model doesn’t scales. And the only way to use huge amount of fast data in JVM are off-heap solutions. I think BigMemory’s way is right solution to the problem – from the technical point of view.
The problem is not that the JVM memory model doesn’t scale.
The problem is that a JVM model, used in specific ways, doesn’t scale. I ran tests (the terracotta tests, actually) that DID scale, without their solution. That tells me that the solution is optional.
There is no infinitly scalable memory model due to physical limitations. And from this point of view your solution is the only way to achive the fastest results.
But features come with cost. What if one whants to persist his cache? Off-heap cache can be persisted much faster because the data is already serialized. Fast cache dumps comes with cost of slow access. Again, transactional memory is slowly than the regular one.
It seems to me that BigMemory pretends to be a generic solution. It is slowly for particular task and your tests show this fact. But it allows more freedom for memory model implementation. For example using succinct data coding schemes allows much more information to be stored in a unit of memory compared to traditional ones based on linked lists. Exploiting short data paths and reference locality of such coding schemes can significantly improve performance under massive load.
If BM’s the developers are consistent in their decisions than BM eventually become a some kind of no-sql java database optimized for big memory. And I think that this is a trend. If application logic is in JVM than it is nesessary to keep datapaths short to acheive fastest possible results. Does it leave a room for good old native RDBMS?
BigMemory is overpriced. This is it’s most serious drawback.
//Sorry for many words…
Well… persisting a cache has multiple aspects to it. Are you persisting synchronously? With what granularity?
The problem isn’t bigmemory, per se: it’s that using a side cache itself is an architectural design flaw.
The only way to get around this is to not use a cache as a cache, but as a data store (which is going to a NoSQL-type solution.)
Here’s the thing: NoSQL can scale. Very well, in fact, and the result *is* a memory architecture with few limits. (I don’t want to say “infinite” any more than i want to say “never.”)
As usual, the keys are what you’re willing to put up with, and when. My tests showed an ability to keep up with BigMemory’s “gains” while not losing much – but I didn’t test persistence, nor object expiry.
Persistence is a problem; as soon as you start talking persistence, you start to question whether the cache is a cache at all (it becomes a datastore at that point, whether primary or secondary.) if it’s a primary datastore *at all*, you introduce transactions, which completely wreck synchronous performance.
That’s where other products that aren’t glorified caches step up, IMO (note: bias!)
I think that if a cache is transactional, replicated and distributed (like TreeCache is) than it is not a cache. The main purpose of cache is to keep datapaths as small as possible for some ‘hot set’. We can’t make fast cache big. The bigger cache is the closer it’s features to DBMSs are. How long it takes to fill a 100Gb cache having slow random access storage?
From this point of view the ConcurrentHashMap is a best solution for in-process shared *cache*. If application requres more complex solution than it is better to call it ‘storage’ not ‘cache’. Because, as you pointed out, there is no clear difference between such caches and full-featured databases. And is is better to extends existing databases to fulfill these requirements than to develop fully custom solution.
We evaluated Terracota for distributed caching about a year ago. Scalability was terrible. These guys sales hopes, not solutions. Unfortunately, many developers are guided by buzzwords and brends but not by reasoning.
One more point should be considered before standard Java containers (HashMap, TreeMap, CHM, …) are used for large data sets. These containers are not fully exception-safe. If OOM happens when RB-tree or hash table is rebalanced/rebuild then the whole container become invalid and must be recreated. Unfortunatly OOM happens even if the heap has much amount of free space – due to memory fragmentation. It is very likely that JVM is unable to allocate big array when a big hash table is resized.
For small data sets the container rebuild is not a problem. But for lage multigigs data sets special OOM-tolerant data structures must be used. Obviously they will be much slowly then generic ones. IMHO something like BigMemory is a destiny.
Sorry, man. I do not buy your numbers and your arguments. Running 90G java heap w/o any significant GC overhead at all? This is hardly to believe (just do simple math – how many seconds will it tales to compact even 1/10th of this heap). This is first.
Second, there are a lot of application which can tolerate 200-300ms GC pauses but :
1. Can not tolerate 500ms – any SIP servers, for example
2. Can not tolerate 1-10s pauses – any distributed application with periodic “keep alive” master-node notifications
3. Can not tolerate 10-100 sec – JobTracker/ TaskTrackers in Hadoop MapReduce.
I have observed many times GC pauses in seconds even for 2-4G heaps.
Offloading a lot of mostly static data to off heap is not that bad idea.
Fair enough, Mr. Rodionov. I’d like to point out a few things, though.
My point was that on a standard JVM, I was able to mitigate most of the effects of garbage collection – with little effort. The amount of mitigation depended on the effort and a few other variables.
Second, I did not say ‘never use this.’ If it fits your need, use it. My point there was that you need it less often than the authors implied, and that I tested that statement.
Third, I work for a distributed vendor – if you really want to talk about mitigating garbage collection entirely, I can do that. I ran one sequence of tests that had *no* garbage collection impact. At all. For a 90GB heap. As in: no latency was introduced, over an hours-long test (a test that fired off garbage collection events for the ‘standard app configuration.’)
I’m aware of the impact of garbage collection; I wasn’t making idle claims.
Mr. dreamreal
Java non-static cache in a 90G heap w/o enormous GC overhead and frequent stop-the-world pauses is a non-science fiction.
Sorry, Mr. Rodionov. My tests show otherwise.
You are dilettante, Mr. dreamreal.
You have a right to your opinion. Again, I disagree; I’ve been in this field for close to thirty years now, as a hobbyist and a professional, and my industry exposure has been fairly deep; I’ve been using Java since around 1998 or so, and my employment history hasn’t been inconsequential, in my opinion. That’s fine; I’m not expecting you to take anything without consideration, and if you decide I’m incorrect about something, that’s your choice.
{ 2 trackbacks }