Wednesday, April 23, 2008

Why Johnny Can't Scale

Today TechCrunch "reported" that Blaine Cook has left Twitter. I had the pleasure of interviewing Blaine and his fellow Twitter developer Alex Payne last year. I was working on a book about Ruby on Rails, so they were the perfect people to talk to. After all, they had written a Ruby on Rails application that was truly pushing the scalability limits of Rails. I was most interested in those limits and how they were attacking them. Let's get back to the present, though.

Michael Arrington pins a lot of Twitter's notorious instability squarely on Blaine. He has a point. Arrington notes that Blaine gave a presentation at a Rails conference on how Twitter had scaled Rails. If you go out and say "here's how to scale Rails", your app is part of your proof. Facebook can go out and tout the scalability of PHP, MySQL, and memcache. You can dispute their rationale, but their results are hard to argue against. Blaine could have a great rationale behind how they scaled Rails, but the instability of Twitter discredits any argument.

Some of TechCrunch's readers object to this. They point out that Twitter only had three developers. Others say only ignorant non-programmers would blame Rails or Blaine. Maybe they are right. Here is my explanation.

Take a look at that presentation Blaine gave. In particular, look at slide 3. I quote: "180 Rails instances (Mongrel). Growing fast." That is essentially 180 web servers: 180 instances of the Twitter application serving requests made by Twitter users. Keep in mind that this was a year ago, so you can only imagine what the numbers are like now. Now take a look at the next line on that slide: "1 Database Server". No mention of that growing. I won't claim to know the intricacies of Twitter's operations, but this is an obvious bottleneck.

Since then Blaine & co. have done a lot to alleviate the pressure on their bottleneck. They created an innovative messaging system called Starling. They made heavier use of memcache. To my knowledge, they still have that 1 Database Server.

If you are familiar with Rails, then you know that this is a flaw in Rails (a flaw in ActiveRecord, to be more precise). Rails associates a class with a database connection. You can write code that alters this behavior, but it is very hard to do so efficiently. Now let's be fair. This is true of a lot of frameworks, maybe all of them. Java practitioners like me know that our favorite technology with similar functionality, Hibernate, has the same flaw. Google has kindly produced an extension, Hibernate Shards, to address this. Certainly the general JavaEE specification does not address this.
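To make the "class bound to a connection" point concrete, here is a minimal Python sketch. It is not ActiveRecord's actual API; `Connection`, `Model`, `ShardedModel`, `connection_for`, and the four-shard modulo scheme are all hypothetical names and assumptions chosen for illustration. It contrasts one connection per class with a connection chosen from each record's key.

```python
class Connection:
    """Hypothetical stand-in for a database connection; it only carries a name."""

    def __init__(self, name: str) -> None:
        self.name = name


class Model:
    # Class-level binding: every record of this class goes through one
    # connection, no matter which record it is.
    connection = Connection("db0")

    @classmethod
    def connection_for(cls, key: int) -> Connection:
        return cls.connection  # the key is ignored: one class, one database


class ShardedModel(Model):
    # Hypothetical override: pick the connection from the record's key
    # (simple modulo over four shards, an assumption for this sketch).
    shards = [Connection(f"db{i}") for i in range(4)]

    @classmethod
    def connection_for(cls, key: int) -> Connection:
        return cls.shards[key % len(cls.shards)]


print(Model.connection_for(42).name)         # db0
print(ShardedModel.connection_for(42).name)  # db2
```

Routing a lookup by a single key is the easy part. Queries that need rows for many keys at once, such as a home page built from many friends' Updates, are where the efficiency problem described above shows up.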

So it's not just a Rails problem, and we shouldn't blame Rails, right? Not so fast. Rails is the poster child for rapid, high-productivity development frameworks. Remember how TC readers pointed out that Twitter only had three developers? Rails is a big part of that story. In general, Rails is one of many technologies that seek to simplify web development as much as possible. This allows three developers to build such a popular site, but that's also the problem.

It's easier than ever to build a website. Social media makes it easier than ever to gain eyeballs. But it is just as hard to scale your site to handle massive traffic. Actually, maybe it is harder. When you rely heavily on frameworks and out-of-the-box goodness, it is that much harder to rip it out or reject it when you outgrow it. Rails magnifies this problem, because it is not just a technology, it is a philosophy. Rails developers pride themselves on the elegance and terseness of their code. If you are already writing ugly code, it's a lot easier to throw it away and write code that is an order of magnitude uglier but solves your scale problems. When you write beautiful code and don't know of a beautiful way to solve your scale problem, what do you do?

Finally, it is easy to trivialize Twitter's problems. "Oh, they just need to use PHP" or whatever. Think about their application a little more before you say that. Let's think about a really naive approach to their application. You have a User who creates Updates. One User has many Updates. For example, I am a Twitter User and today I have made three Updates. I follow 65 other Users. The naive approach (also the one a DBA would tell you to use) is to have a join table that records who follows whom. So what has to happen when I load my home page? First, we need to figure out the 65 people I follow. Next we need to get their Updates. How many of their Updates? Well, 20 Updates are shown, but they are the 20 newest across all 65 of my friends. So do you grab all of the updates from all of my friends and then sort them? That is certainly the most naive approach. In that approach, we need one query to get my friends and then N queries to get their updates. We can have each of those N queries return its results sorted by time and then merge them, potentially stopping once we have the first 20. Still, each of those N queries could be returning a lot of data. We all follow @Scobleizer, right?
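The naive home-page query just described (one query for friends, N queries for their updates, then a merge that stops at 20) can be sketched in Python. This is an illustration under assumptions, not Twitter's code: `home_timeline`, `follows`, and `updates` are hypothetical in-memory stand-ins for the join table and the Updates table, and each friend's updates are assumed to come back sorted newest first.

```python
import heapq
from itertools import islice

# Hypothetical in-memory stand-ins for the two tables:
#   follows: follower_id -> list of followed user ids (the join table)
#   updates: author_id -> list of (timestamp, text), newest first
def home_timeline(user_id, follows, updates, limit=20):
    """Naive home page: 1 query for friends, N queries for their updates,
    then a newest-first k-way merge that stops after `limit` items."""
    friends = follows.get(user_id, [])                  # query 1: who do I follow?
    per_friend = [updates.get(f, []) for f in friends]  # queries 2..N+1: all fetched up front
    merged = heapq.merge(*per_friend, key=lambda u: u[0], reverse=True)
    return list(islice(merged, limit))                  # the merge itself stops early


follows = {1: [2, 3]}
updates = {2: [(5, "b"), (1, "a")], 3: [(4, "c"), (3, "d")]}
print(home_timeline(1, follows, updates, limit=3))  # [(5, 'b'), (4, 'c'), (3, 'd')]
```

Only the merge quits early; every friend's full result list is still fetched first. That is why a single prolific friend makes the page expensive for everyone who follows them.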

So even if we only needed "1 Database Server", things are non-trivial. What about splitting your database? Obviously you want to split the Updates table. Let's think about your home page. Ideally all of the Updates on that page would be in the same database, so all of your friends need to be on the same database server. But what about the other people following your friends? You see where this is going. The Kevin Bacon six degrees of separation problem is going to kick your ass.
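The chain reaction described above can be made concrete with a small union-find sketch in Python. The assumption, taken from the paragraph rather than from any real Twitter design, is that every reader's home page must be served from a single shard, so all the authors a reader follows must share one shard. Merging those sets shows which authors are forced together; `colocation_groups` is a hypothetical name.

```python
from collections import defaultdict

def colocation_groups(follows):
    """follows: reader_id -> iterable of followed ids.
    Returns groups of authors whose Updates must all share one shard
    if every reader's home page is served from a single shard."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    for friends in follows.values():
        friends = list(friends)
        for f in friends:
            find(f)  # register every followed author, even a lone one
        for other in friends[1:]:
            union(friends[0], other)  # this reader's friends must be co-located

    groups = defaultdict(set)
    for user in parent:
        groups[find(user)].add(user)
    return list(groups.values())


follows = {"alice": ["bob", "carol"], "dave": ["erin"]}
print(len(colocation_groups(follows)))  # 2 groups: {bob, carol} and {erin}
follows["frank"] = ["carol", "erin"]
print(len(colocation_groups(follows)))  # 1 group: all three on one shard
```

On a real follow graph, a handful of accounts that nearly everyone follows would link almost every group together, pushing the Updates table back toward a single database.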

The point is that even though Twitter might seem simple to the outsider, it is not. It is complex. Even if they didn't use Rails, it is non-trivial. Rails definitely does not help, at least in some regards, and that's the real story. It may be easier than ever to build something and, perhaps because of social networking, easier to get an audience. But all of the syntactic sugar in the world doesn't make it any easier to scale an application.


Unknown said...

I don't think I understand your argument. Rails is flawed because it doesn't ship with default features to do sharding, but neither does any other framework, and btw you can't easily shard the Twitter model. That doesn't seem to add up to a logical argument.

BTW, you can certainly scale the database a fair bit before you need to consider sharding. You can do master-master and you can do master-slave setups. If Twitter is still on a single database, it sounds like they haven't gone down that path yet.

When you do go for those setups, there is often database technology to solve the problem rather than framework technology. For MySQL, I've heard good things about mysql-proxy.

Finally, I call bullshit on "when you like beautiful code you can't write ugly code when you have to". At 37signals, we even dropped down to the ugliest of all code (C) when we felt that was a good way around a scaling issue.

You can still enjoy beautiful code for the 95% of the application that's unlikely to be the bottleneck, then get your work trousers on and do the last 5% in whatever way you need to make it fast enough. (For the majority of sites out there, you can even go 100% beauty as you're unlikely to be the next Twitter).

Unknown said...

Perhaps I rambled a bit too much in that post. I lost part of it because of a (scheduled) Blogger outage.

My point is that people think it is easier than ever to build a site. That may be true for toy sites, but all of the brilliant work of many technologists has not made it much easier to build a web scale site. I don't think many people realize this, and that includes both programmers and "technologists" like Michael Arrington.

I find it funny that you run away from the problem of federating your database. Master-master and master-slaves are just band-aids. Do you know how many databases we have at eBay? Or how many they have at Yahoo, Amazon, or Facebook? Now there are a lot more Basecamps in this world than there are Facebooks, granted. But if you are an entrepreneur, aren't you hoping to be a Facebook rather than a Basecamp? I'm pretty sure I know how the guys at Twitter would answer that question.

So sure, Rails is great for sites where 1 Database Server is all you will ever need. The problem is when you need more. It's not that hard to rip up the right parts of Java, PHP, or C# to get them to scale, but empirical evidence seems to indicate that it is not so easy with ActiveRecord.

Unknown said...

What empirical evidence? Have you talked to people who've tried to do sharding with Active Record and given up because it was too hard?

I know of a good number of startups that run one-database-per-customer setups (which is similar to sharding, albeit at a different tier).

It's entirely possible to run sharding setups with Rails. And it's definitely not something that I run away from. We're going to do sharding the day we have to with Basecamp.

For now, we don't need to, so it would amount to premature optimization to do so, but that doesn't mean there's anything inherent in Rails that makes it impossible. That's preposterous.

I fail to see any substance to this rambling (other than it's easier than ever to build big sites, but scaling big sites is always hard regardless of the technology -- I certainly agree with that).

Unknown said...

I am sure you are much more of a Rails expert than I am; at least I would hope so. Thus perhaps I am entirely mistaken. Please point me to a web scale site that is running Rails and has federated its database. I don't know of any, but perhaps such a site exists and I am just ignorant.

Oh, and you asked "Have you talked to people who've tried to do sharding with Active Record and given up because it was too hard?" Actually, my statements were based on my conversations with the Twitter engineering team last fall. Now, I hate to quote other people; I only mention it because you asked. They mentioned several Gems that had been created to address this, but said they had all fallen short. They also mentioned trying to do it themselves, but decided that it would be easier to build some messaging and caching infrastructure instead.

Unknown said...

From a quick question on IRC, I got that is running a sharded setup with Rails. I talked to a few other folks doing similar setups and one guy readying up a reusable plugin for this (who mentioned that the implementation for his particular approach was pretty trivial).

But that doesn't even matter. Most people don't shard because it's a pain in the ass (as a technique, not as an implementation) and increases the complexity of your setup considerably. Thus, you usually only take this road as a last resort when you can't go further with your replicated database setup.

Therefore, the fact that not more people are doing sharding is probably more easily explained by the fact that most people don't need sharding. That says absolutely zero about the capabilities of the tools these people use to do sharding.

So it seems we're down to the fact that your "empirical evidence" was one conversation with one team some time in the past who thought another technique looked easier to them. Now there's an iron-clad argument you can take to the bank :).

Unknown said...

Oh wow, thanks. Hmm let's see though... I'm not sure if I would qualify yfly as web scale. Still that's great that they have a federated DB and Rails working together. I'd love to read about it, heck I'd love to write about it.

I guess one possible explanation for why there are no large scale sites that use Rails and have a federated DB is that there are few large scale sites. Another possible explanation is that those sites choose not to use Rails for whatever reasons. Web scale is hard enough that it is not going to be done by a code generation script or by some method_missing cleverness. But it's not that hard. After all it's been done by numerous sites in all kinds of languages/frameworks: PHP, Java, Perl, ASP.NET, C++, Coldfusion, etc. It's a long list, just not a list that includes Rails. Twitter has the best (currently only?) shot of adding Rails to that list. I hope they make it, but I don't think I would bet on it.