Monday, February 09, 2009

Give me the Cache

Web scale development is cache development. Oversimplification? Yes, but it is still true. Figuring out what to cache, how to cache it, and (most importantly) how to manage that cache is one the more difficult things about web application development on a large scale. Most other things have been solved before, but chances are that your caching needs are quite unique to your application. You are probably not going to be able to use Google to answer this kind of design question. Unless of course your application is extremely similar to the one I am about to describe.

I had an application that performs a lot of reads, in the neighborhood of 1000 per second. The application's data needed to be modified very infrequently, less than 100 times per day. Ah, perfect caching! Indeed. In fact the cache could even be very simply: a list of value objects. Not only that, but it's a very small list. It gets even better, the application has a pretty high tolerance for cache staleness. So it's ok if it takes a few minutes for an update to show up. A few different caching strategies emerged out of this too-good-to-be-true scenario. So it was time to write some code and do some testing on these strategies. First, here is an interface for the cache.

public interface EntityCache {
List<Entity> getData();
void setData(Collection<? extends Entity> data);
}

Again, the Entity class is just a value object (Java bean) with a bunch of fields. So first, let's look at the most naive approach to this cache, a class I called BuggyCache.

public class BuggyCache implements EntityCache{
private List<Entity> data;

public List<Entity> getData() {
return data;
}

public void setData(Collection<? extends Entity> data) {
this.data = new ArrayList<Entity>(data);
}
}
So why did I call this buggy? It's not thread safe. Just imagine what happens if Thread A calls getData(), and then is in the middle of iterating over the data, and Thread B calls setData. In this application, I could guarantee that the readers and writers would be from different threads, so it was just a matter of time before the above scenario would happen. Hello race condition. Ok, so here was a much nicer thread safe approach.

public class SafeCache implements EntityCache{
private List<Entity> data = new ArrayList<Entity>();
private ReadWriteLock lock = new ReentrantReadWriteLock();
public List<Entity> getData() {
lock.readLock().lock();
List<Entity> copy = new ArrayList<Entity>(data);
lock.readLock().unlock();
return copy;
}

public void setData(Collection<? extends Entity> data) {
lock.writeLock().lock();
this.data = new ArrayList<Entity>(data);
lock.writeLock().unlock();
}
}

This makes use of Java 5's ReadWriteLock. In fact, it's a perfect use case for it. Multiple readers can get the read lock, with no contention. However, once a writer gets the write lock, everybody is blocked until the writer is done. Next I wrote a little class to compare the performance of the Safe and Buggy caches.

public class Racer {
static boolean GO = true;
static int NUM_THREADS = 1000;
static int DATA_SET_SIZE = 20;
static long THREE_MINUTES = 3*60*1000L;
static long THIRTY_SECONDS = 30*1000L;
static long TEN_SECONDS = 10*1000L;
static long HALF_SECOND = 500L;

public static void main(String[] args) throws InterruptedException {
final AtomicInteger updateCount = new AtomicInteger(0);
final AtomicInteger readCount = new AtomicInteger(0);
final EntityCache cache = new SafeCache();
long startTime = System.currentTimeMillis();
long stopTime = startTime + THREE_MINUTES;
Thread updater = new Thread(new Runnable(){
public void run() {
while (GO){
int batchNum = updateCount.getAndIncrement();
List<Entity> data = new ArrayList<Entity>(DATA_SET_SIZE);
for (int i=0;i<DATA_SET_SIZE;i++){
Entity e = Entity.random();
e.setId(batchNum);
data.add(e);
}
cache.setData(data);
try {
Thread.sleep(THIRTY_SECONDS);
} catch (InterruptedException e) {
e.printStackTrace();
}
}
}
});
updater.start();
Thread.sleep(TEN_SECONDS);


List<Thread> readers = new ArrayList<Thread>(NUM_THREADS);
for (int i=0;i< NUM_THREADS;i++){
Thread reader = new Thread(new Runnable(){
public void run() {
while (GO){
int readNum = readCount.getAndIncrement();
List<Entity> data = cache.getData();
assert(data.size() == DATA_SET_SIZE);
for (Entity e : data){
System.out.println(Thread.currentThread().getName() +
"Read #" + readNum + " data=" + e.toString());
}
try {
Thread.sleep(HALF_SECOND);
} catch (InterruptedException e) {
e.printStackTrace();
}
}
}
});
readers.add(reader);
reader.start();
}

while (System.currentTimeMillis() < stopTime){
Thread.sleep(TEN_SECONDS);
}
GO = false;
for (Thread t : readers){
t.join();
}
long duration = System.currentTimeMillis() - startTime;
updater.join();
int updates = updateCount.get();
int reads = readCount.get();
System.out.println("Duration=" + duration);
System.out.println("Number of updates=" + updates);
System.out.println("Number of reads=" + reads);
System.exit(0);
}
}
This class creates a writer thread. This thread updates the cache every 30 seconds. It then creates 1000 reader threads. These read the cache, iterate and print its data and pause for half a second. This goes on for some period of time (3 minutes above), and the number of reads and writes is recorded.

Testing the BuggyCache vs. the SafeCache, I saw a 3-7% drop in throughput from using the SafeCache. Actually this was somewhat proportional to the size of the data (the DATA_SET_SIZE variable.) If you made it bigger, you saw a bigger hit as the reading/writing took longer and there was more contention.

So this seemed pretty acceptable to me. In this situation, even a 7% performance hit for the sake of correctness was worth it. However, another approach to this problem came to mind. I like to call it a watermark pattern, but I called the cache LeakyCache. Take a look to see why.

public class LeakyCache implements EntityCache{
private final List<Entity> data = new CopyOnWriteArrayList<Entity>();
private final AtomicInteger index = new AtomicInteger(0);

public List<Entity> getData() {
List<Entity> current = data.subList(index.get(), data.size());
return current;
}

public void setData(Collection<? extends Entity> newData) {
int oldSize = this.data.size();
this.data.addAll(newData);
if (oldSize > 0){
index.set(oldSize + newData.size());
}
}
}
The idea here is to keep one ever growing list for the cache and index (or watermark) to know where the current cache starts. Every time you "replace" the cache, you simply add to the end of the list and adjust the watermark. When you read from the cache, you simply copy from the watermark on. I used an AtomicInteger for the index. I probably did not need to, and a primitive int would have been good enough. I used a CopyOnWriteArray for the cache's list. You definitely need this. Without it you will wind up with ConcurrentModificationExceptions when you start mutating the cache with one thread, while another thread is iterating over it.

So you probably see why this is called LeakyCache. That internal list will grow forever. Well at least until it eats all of your heap. So that's bad. It also seems a bit more complicated than the other caches. However, it is thread safe and its performance is fantastic. How good? Even better than the BuggyCache, actually 3x as good as the BuggyCache. That deserves some qualification. Its througput was consistently more than 3x the throughput of the other caches, but I didn't run any long tests on it. It would eventually suffer from more frequent garbage collection as it leaks memory. However, if your updating is not too frequent, the entities are relatively small, and you've got lots of memory, then maybe you don't care and can just recycle your app once a month or so?

Maybe you aren't satisfied with that last statement. You're going to force me to fix that memory leak, I knew it. Here is a modified version that does just that.

public class LeakyCache implements EntityCache{
private final List<Entity> data = new CopyOnWriteArrayList<Entity>();
private final AtomicInteger index = new AtomicInteger(0);
private final ReadWriteLock lock = new ReentrantReadWriteLock();

public LeakyCache(){
Thread cleaner = new Thread(new Runnable(){
public void run() {
while(true){
try {
Thread.sleep(60*60*1000L);
} catch (InterruptedException e) {
e.printStackTrace();
}
lock.writeLock().lock();
if (data.size() > 500000){
List<Entity> current = new ArrayList<Entity>(getData());
index.set(0);
data.clear();
data.addAll(current);
}
lock.writeLock().unlock();
}
}
});
cleaner.start();
}
public List<Entity> getData() {
lock.readLock().lock();
List<Entity> current = data.subList(index.get(), data.size());
lock.readLock().unlock();
return current;
}

public void setData(Collection<? extends Entity> newData) {
lock.readLock().lock();
int oldSize = this.data.size();
this.data.addAll(newData);
if (oldSize > 0){
index.set(oldSize + newData.size());
}
lock.readLock().unlock();
}
}
So what does this do? In the constructor we spawn Yet Another Thread. This one periodically (once an hour in the example) checks to see how big the cache is. If it is over some limit, it gets the current data, clears the cache, adds the current data back to it, and reset the watermark to 0. It also Stop The World to do this by once again using a ReentrantReadWriteLock. Notice how I have abused the lock by using the read lock for both getting and setting the cache. Why use it for setting? The cleaner thread gets exclusive access to the write lock. It uses it when it is cleaning up. By having the setData method use the read lock, it will be blocked if the cleaner thread is in the middle of a cleanup.

Adding this extra bit of complexity fixes the memory leak, while maintaining thread safety. What about performance? Well the performance is highly configurable depending on how often the cleaner thread runs (well how long it sleeps really) and how big you are willing to let the cache grow before cleaning it up. I put it on same very aggressive settings, and it caused about a 15% hit to the leaky version. The performance is still much better than any of the other versions of the cache.

Next up ... write the cache in Scala using Actors.

5 comments:

Jorge Ortiz said...

I think the original LeakyCache is still buggy.

Consider a thread trying to set the data and a thread trying to get the data. The setting thread gets switched in and runs up to line 12 (it adds the new data, but doesn't increment the index) before it gets switched out. At this point the reading thread gets switched in and executes the entirety of getData. The reading thread will get a List with two copies of the Entity data, the old copy and the new copy, because the index hasn't been updated yet.

Also, I think your BuggyCache isn't actually buggy. Reference assignments are guaranteed to be atomic, and you make sure to make a new ArrayList on every set. The set will just change the reference to point to the new ArrayList, but it won't affect the old ArrayList at all.

Jorge Ortiz said...

It's been pointed out to me that non-synchronized assignment to non-volatile variables might never appear in other threads. So I guess BuggyCache is buggy unless data is marked volatile.

(Also, you have no guarantees that the users of your cache won't modify your (mutable) List, which could also introduce all sorts of bugs.)

Michael Galpin said...

Great analysis Jorge. I simplified the code for the sake of the blog, and clearly was guilty of oversimplification in both cases.

You remarks about volatile are very interesting. Normally I've used volatile to guarantee immediate visibility of a change to a value, and used this as a replacement for synchronization. In the case of cache that can accept latency, i.e. readers don't have to immediately get the new version of the cache when it is updated, do you still need volatile or do you risk that the readers never get the update? I wonder if this is a case where the spec allows more flexibility than any implementation would dare to take?

Alex Miller said...

As Jorge said, if you make the BuggyCache data volatile, you satisfy atomicity and visibility constraints. Of course, you must make the assumption that no one else is mutating the list.

I also agree with Jorge that the LeakyCache is broken. It's related to this problem (but over two variables instead of two actions on the same variable):

http://tech.puredanger.com/2009/01/30/java-concurrency-bugs-atomic/

I also wanted to mention that when using RRWL, you should always lock(), then do your work in a try { } finally { unlock() }. Otherwise, an exception will cause your lock to be locked forever.

Jorge Ortiz said...

Michael:

It appears that if the field is not synchronized and non-volatile, the memory model makes no visibility guarantees between threads. In practice, the situation is probably not that grim.

If you don't want to incur the performance penalties of synchronized or volatile, you could probably keep a ThreadLocal copy of the cache. You can periodically check it for "staleness", at which point you update it from the synchronized or volatile copy.