Showing posts with label mysql. Show all posts
Showing posts with label mysql. Show all posts

Tuesday, August 19, 2008

Cache Discussions

My post about using MySQL for caching got picked up by reddit and viewed a few thousand times. It sparked some discussion, but unfortunately it has been spread out on a few different sites. So I decided to aggregate them here.

"Here's a big reason to use MemCached: expiry!
Let's say you only want to do a complicated query once every fifteen minutes. Do it once, put it in a cache by key with an expiry of 15 minutes. Let memcached worry about when to take it out for you."

Yes, this is a good reason to use memcached. I used this pattern for the aggroGator, with Google's version of memcached. Which reminds me that Part 3 of the series I wrote on GAE is out... Anyways, in that app, RSS feed results are cached in memcached with a five minute poll (initiated client-side, so only polling for logged in users.)

Expiry is the cache eviction policy for memcached, where as any database cache is going to be more of an LRU policy. There will be cases where expiry is more useful, but I would actually guess that LRU is appropriate for the majority of use cases...

"MySQL memory table is not as fast as memcached. Depending on your data, memcached is 3X times or more as fast for get/set (select/insert)."

Really? I would love to see some objective results for this. Of course it would have to be an apples-to-apples comparison. The data would need to be retrieved from a cache node on separate physical machine and for the MySQL cache, it would need to be a select by primary key. Now I wouldn't be terribly surprised if memcached was slightly faster, but 3X? I would be even more surprised if a put/insert was faster at all.

"This has been said many times. MySQL and memcached serve different purposes. Memcached is used to store processed data, while MySQL generally contains raw, normailized data, which needs lots of complex queries and other processing."

I actually mentioned this at the end of my post... So obviously I agree. But I have a feeling that people use memcached to cache a lot of data that is not very processed at all. Also, the last line is very misleading. You do not need to do much normalizing of your data. I can tell you that anybody doing federated database systems have to do a lot of de-normalizing of their data. And complex queries and other processing? That is just silly.

"Facebook needs memcache for the obvious reason that it's pages are highly complex and include many pictures."

Eh? Don't see how pictures would matter... But if Facebook is using memcache for HTML fragments, then I would agree that this is the right kind of cache. I don't know if this is the case or not. Other things like my list of friends or my contact info would be a poor choice for memached. Something like the Facebook feed... That is a lot tougher. There are limits to what you can cache, since the feed changes a lot and you might have a low tolerance for stale data. You might be able to create HTML fragments for the stories and cache those?

" Also, fewer of Facebooks pages are time-critical when compared to eBay. On eBay you basically can't cache a page rendering (memcache) if it has less than a minute
of auction time left"

Item listings are certainly time-critical, i.e you expect the price to be accurate when you are looking at a listing and considering bidding on it. This is true regardless of the time remaining, being less than a minute doesn't matter too much. However, that is just one page, many other pages are not so sensitive, but they are very dynamic.

When it comes to picking between MySQL and memcached, I would first say: are you using an ORM but need caching? If the data is being accessed through ORM, then your cache layer should be a database, not memcached. Again the only exception I could see to this would be a graph, i.e. data that is hard to describe relationally (requires self referential foreign keys, etc.)

Thursday, August 14, 2008

Cache Money

Scalability is a hard question, but a lot of people think that scalability is all about caching. In particular, memcached is the answer for caching. I think we can blame Facebook for this. Everybody knows that Facebook makes heavy use of memcached. Terry says that social graphs are a scalability problem for databases that is solved by memcached, so he is clearly drinking the Kool-Aid. The benefits of caching are obvious, but is memcached really the best/only way?

Earlier this year, eBay won an award from MySQL. This was for application we built that we originall Gem Cache. It is a caching tier that is built on top of MySQL. When the caching tier was designed, memcached was given a lot of consideration, but there were some very good advantages we got out of using MySQL instead.

First off, can MySQL be as fast as memcached? Absolutely. MySQL is aggressive about keeping things in memory, and if everything is in memory, it will be as fast as memcached. You can use MySQL's MEMORY engine, to accomplish this, or you can stick with MyISAM and let MySQL's caching put things in memory for you. Obviously you need to split your database, but we already knew how to do that efficiently. With that in mind, here are the advantages that MySQL offers.

1.) SQL Semantics. You are not limited to just simple "put" and "get." You can do selects and joins, aggregates, etc.
2.) Uniform Data Access. Do you use some kind of ORM? You can leverage this with a MySQL based cache.
3.) Write-through Caching. In a typical memcached setup, updates are still done to the database and this invalidates one or more objects in memcached. With a MySQL based cache, a row in the cache corresponds to a row in the "real" database. So we can write the cache and then asynchronously update the system of record.
4.) Read-through Cache. Similarly, you can always attempt to read from the cache database. If there is a cache miss, you can invisibly read from the real database and add to the cache at the same time.
5.) Replication. MySQL allows for replication of data, so it is easy to add redundancy and fail-over to your cache. Replication can also be useful when you have multiple data centers.
6.) Management. There are lots of great management tools for DBA, operations folks, etc. to use with MySQL.
7.) Cold starts. When your cache is a copy of database rows, it is easy to bootstrap it from your source, since the source and the cache are so similar.
8.) Eviction. Memcached gives you basic expiration, but otherwise you are handling eviction yourself. The caching in MySQL is a more useful LRU policy.

So there, just a few obvious advantages to using MySQL as a cache instead of memcached. Now I know that a lot of folks use MySQL as their "real" database, so it may seem weird to use it as a cache as well. But they are probably (hoepfully) using InnoDB for the "real" DB and that really is a different beast than MySQL with MyISAM or MEMORY tables. And it's not like you have to pay for extra licenses are anything... What are the advantages of memcached over MySQL? The only obvious one to me is if you want to cache things that don't fit in the database, like deep object graphs or HTML fragments, etc.

Wednesday, April 16, 2008

eBay Wins MySQL Award

Yesterday eBay won an award at the O'Reilly MySQL conference. Congratulations to the Chris Kasten's team. Chris also did a presentation today at the MySQL conference. The caching layer his team created allows us to store "session" data on the server. This data can actually be persisted across sessions, depending on the use case, so it is really a little better than your typical HTTPSessions. Plus it actually scales :-)

The MySQL cache compliments HTTP cookies. We also use Flash for local storage as another alternative to the tiny world of HTTP cookies. Flash has a lot of advantages. The most obvious is the 100K default limit as opposed to the 4K limit of HTTP cookies. Flash "cookies" are not sent to the server, making them much more secure. One of the fun things I did recently was help setup guidelines on when an application should use plain 'ol HTTP cookies, Flash local storage, or the MySQL cache. I got to become much more familiar with the MySQL cache, and thus it wasn't suprising to me when I heard about it winning the award from MySQL.