Programming and politics: twitter

Monday, July 04, 2011

Google+ and Hitting the Reset Button

Reset your social network

So you might have heard this week that there's this new social networking thing (*yawn*) called Google+. That's right, it's from Google. So it's gonna suck, right? I was skeptical at first, as were many others. After all, nobody roots for The Big Company that clones popular products made by smaller companies, and Google has had a well-known poor track record in this space. But after a few days of using Google+, I'm a believer -- in its potential. Here's why.

Google+ is a chance to hit the reset button on social networking. For me, when I first started using Facebook it was mostly to catch up with old college classmates. Two big events happened later. Facebook was opened up to everybody, and Twitter showed up. When Facebook was opened up to everyone, I pretty much accepted friend requests from anyone I knew at all. I still didn't use Facebook that much. Meanwhile I really fell in love with Twitter when it showed up. On there I connected mostly with people in the same or related professions as me. Most of my tweets were about things that had to do with my job (programming, technology).

Meanwhile on Facebook, I had more and more family members join. Suddenly Facebook became a great place to share family-related things, especially pictures. Then I hooked up my Twitter account to Facebook. Now I could just post things to Twitter, and they would show up on Facebook. Then I would occasionally post pictures on Facebook as well. However, most of my tweets were geeky stuff. I did have some co-workers and college classmates who found such things interesting, but more and more of my friends on Facebook (lots of family and high school classmates) found this useless. Eventually I cut off the Twitter-Facebook connection.

My story here is certainly different from a lot of folks' stories, but I imagine there are a lot of similarities too. Google+ seems to offer the possibility to do things over and get it right this time. The key is its grouping feature, Circles. You have to put people in Circles, so you are motivated to organize your friends. This is important. Facebook and Twitter have both had similar features for a while, and they just aren't used. Twitter's lists aren't really comparable since you still send tweets to everyone. Facebook's groups are more comparable, so why aren't they used?

All your privacies are belong to us

First and foremost, I don't think Facebook really wants anyone to use groups. They have a pretty strong history of trying to decrease privacy on their network. Obviously Facebook benefits if everything posted on their network can be searched and viewed by others on Facebook. Groups seem like one of those features that they added because some users wanted it, but it did not benefit Facebook The Business. Within a couple of days of Google+'s debut, reports came out of a Facebook engineer easily hacking together the same interface to use with Facebook groups. So clearly it would have been pretty easy for Facebook to make groups a simple way for users to organize their friends, and to incorporate groups into content published on Facebook, but Facebook chose not to do this.

This raises the question: why the heck is Google+ doing it? If I had to guess, I doubt that Google really wants to do this either. However, this is one of many places where Google+ feels like something designed around the strengths and weaknesses of its competition, Facebook and Twitter. Privacy was an obvious weakness of Facebook and so Google+ takes advantage of that. It's the kind of thing you do to get market share, whereas Facebook has been doing just the opposite because they are trying to monetize existing users and content.

Resistance is futile

Privacy is not the only place where Google+ feels like a product that has been cleverly designed around its competition. In fact it reminds me a lot of "embrace, extend, extinguish"-era Microsoft. I think they have realized that they don't necessarily have to come up with something that Facebook and Twitter don't do at all; they can just do a lot of the same things and do them a little better. Some other examples of this are viewing pictures and allowing rich markup in status updates. So they make a slightly better product, then they play their own monopoly card by putting the G+ bar on top of all Google pages, including search results and GMail...

Anyways, going back to privacy... The creation of Circles is only half the battle. The other half is picking which of your Circles to publish to. G+ has made this easy to do, and it is a feature that I want to use. However, I don't know if others will do the same. Right now it still seems like most posts that I read are public. This may change as more people start to use G+, but maybe not.

If it doesn't change, then G+ seems like it will be more of a competitor to Twitter than to Facebook. It already has a lot of similarities, since it uses an asymmetric friendship model like Twitter. I have definitely noticed a drop in tweets by those I follow on Twitter since G+ came out. If people don't use the privacy features, then the most it could become is a better Twitter. There have been other better Twitters before, so I don't know if that is good enough. Features like hangouts (group video chat) and huddles (group messaging) seem like they could appeal to Facebook users, but it's hard to say right now. For me, the kind of folks who I use Facebook to communicate with, but would not use Twitter to communicate with, have not even heard of Google+. Yet.

Sunday, October 25, 2009

Social Technology Fail

This is the kind of posting that needs a disclaimer. I'm going to talk a little about recent changes at Facebook and Twitter, but strictly from a technology perspective. It goes without saying that I have no idea what I'm talking about. I am fortunate enough to be acquainted with several engineers at both companies, and I have a college classmate (and fellow Pageboy) who seems to be a pretty important dude at Facebook, but I have no more knowledge of these companies' technology than anybody else. So just to repeat: I have no idea what I'm talking about. You should really stop reading.

Since you are still reading, then I will assume that you too enjoy being an armchair architect. Since my day job is as an architect at eBay, I tell myself that exercises such as this make me better at my job. Heh heh. Let's start with Facebook.

For several months now, I've noticed an interesting phenomenon at Facebook. My news feed would often have big gaps in it. I have about 200 friends on Facebook, and I'd say that around 70% of these friends are active, and probably 20-25% are very active Facebook users. So at any time I could look at my feed, and there would be dozens of posts per hour. However, if I scrolled back around 3-4 hours, I would usually find a gap of say 4-6 hours of no posts. The first time I ever noticed this, it was in the morning. So I thought that this gap must have been normal -- people were asleep. Indeed, most of my friends are in the United States. However, I started noticing this more and more often, and not always in the morning. It could be the middle of the day or late at night, and I would still see the same thing: big gaps. So what was going on?

Well here's where the "I don't know what I'm talking about" becomes important. Facebook has been very happy to talk about their architecture, so that has given me speculation ammo. It is well known that Facebook has probably the biggest memcached installation in the world, with many terabytes of RAM dedicated to caching. Facebook has written about how they have even used memcached as a way to synchronize databases. It sure sounds a lot like memcached has evolved into something of a write-through cache. When you post something to Facebook, the web application that you interact with only sends your post to the cache.

Now obviously reads are coming from cache; that's usually the primary use case for memcached. I don't know if the web app can read from either memcached or a data store (either a MySQL DB, or maybe Cassandra?), or if Facebook has gone for transparency here too and augmented memcached to have read-through cache semantics as well. Here's where I am going to speculate wildly. If you sent all your writes to a cache, would you ever try to read from anything other than the cache? I mean, it would be nice to only be aware of the cache -- both from a code complexity perspective and from a performance perspective as well. It sure seems like this is the route that Facebook has taken. The problem is that not all of your data can fit in cache, even when your cache is multiple terabytes in size. Even if your cache held highly normalized data (which would be an interesting setup, to say the least), a huge site like Facebook is not going to squeeze all of its data into RAM. So if your "system of record" is something that cannot fit all of your data... inevitably some data will be effectively "lost." News feed gaps anyone?
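
To make the speculation concrete, here is a minimal sketch of what happens when a bounded cache is the only thing the app reads from and writes to. Everything in it is hypothetical (the names Post, CacheOnlyFeed and FeedGapDemo, the one-post-per-hour model, and the evict-the-oldest policy); it is not a description of Facebook's actual architecture, just an illustration of how data can silently drop out.

```scala
import scala.collection.mutable

// Hypothetical model: one post per hour, keyed by the hour it was written.
final case class Post(hour: Int, text: String)

// A bounded cache that is treated as the only store (no database behind it).
final class CacheOnlyFeed(capacity: Int) {
  // Insertion-ordered map, so the first entry is always the oldest one.
  private val cache = mutable.LinkedHashMap.empty[Int, Post]

  def write(post: Post): Unit = {
    cache.remove(post.hour)          // re-inserting moves the key to the end
    cache.put(post.hour, post)
    if (cache.size > capacity) {
      cache.remove(cache.head._1)    // evict the oldest entry; it is now gone for good
    }
  }

  def read(hour: Int): Option[Post] = cache.get(hour)
}

object FeedGapDemo {
  // Write one post per hour, then report which hours can no longer be read.
  def missingHours(capacity: Int, hours: Int): Seq[Int] = {
    val feed = new CacheOnlyFeed(capacity)
    (0 until hours).foreach(h => feed.write(Post(h, s"post at hour $h")))
    (0 until hours).filter(h => feed.read(h).isEmpty)
  }
}
```

With room for 3 posts and 10 hours of writes, FeedGapDemo.missingHours(3, 10) reports hours 0 through 6 as unreadable. A real eviction policy would be far more elaborate, so the shape of the gap would differ, but the basic effect is the same: nothing errors out, the posts are just not there.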

Maybe this would just be another useless musing -- an oddity that I noticed that maybe few other people would notice, along with a harebrained explanation. However, just this week Facebook got a lot of attention for their latest "redesign" of their home application. Now we have the News Feed vs. the Live Feed. The News Feed is supposed to be the most relevant posts, i.e. incomplete by design. Now again, if your app can only access cache, and you can't store all of your data in cache, what do you do? Try to put the most "relevant" data in cache, i.e. pick the best data to keep in there. Hence the new News Feed. The fact that a lot of users have complained about this isn't that big of a deal. When you have a very popular application, any changes you make are going to upset a lot of people. However, you have to wonder if this time they are making the change not because they think it improves their product and will benefit users overall, but because it is a consequence of technology decisions. Insert cart before horse reference here...

Facebook has a great (and well deserved) reputation in the technology world. I'm probably nuts for calling them out. A much easier target for criticism is Twitter. I was lucky enough to be part of their beta for lists. Now lists are a great idea, in my opinion. Lots of people have written about this. However, the implementation has been lacking to say the least. Here is a very typical attempt to use this feature, as seen through the eyes of Firebug:

It took me five attempts to add a user to a list. Like I said, this has been very typical in my experience. I've probably added 100+ users to lists, so I've got the data points to back up my statement. What the hell is going on? Let's look at one of these errors:

Ah, a 503 Service Unavailable response... So it's a temporary problem. In fact look at the response body:

I love the HTML tab in Firebug... So this is the classic fail whale response. However, I'm only getting this on list requests. Well, at the very least I'm only consistently getting this on list requests. If the main Twitter site was giving users the fail whale at an 80% clip... In this case, I can't say exactly what is going on. I could try to make something up (experiments with a non-relational database?)

However, this is much more disturbing to me than what's going on at Facebook. I don't get how you can release a feature, even in beta, that is this buggy. Since the feature's release, Twitter has reported a jump in errors. I will speculate and say that this is related to lists. It would not be surprising for a feature having this many errors to spill over and affect other features. If your app server is taking 6-10 seconds to send back (error) responses, then your app server is going to be able to handle far fewer requests overall. So not only is this feature buggy, but maybe it is making the whole site buggier.
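
The arithmetic behind that last point is simple enough to sketch. Assuming a fixed pool of workers that each handle one request at a time (the pool size of 100 and the 0.25 second "healthy" latency below are made-up numbers, not anything Twitter has published), the ceiling on throughput is just workers divided by latency:

```scala
object ThroughputSketch {
  // Upper bound on requests per second for a pool of blocking workers:
  // each worker can finish at most 1 / latencySeconds requests per second.
  def maxRequestsPerSecond(workers: Int, latencySeconds: Double): Double = {
    require(latencySeconds > 0, "latency must be positive")
    workers / latencySeconds
  }
}
```

With 100 hypothetical workers, a 0.25 second response allows at most 400 requests per second, while an 8 second response (the middle of the 6-10 second range) allows at most 12.5. Any worker stuck on a slow list request is a worker that is not serving the rest of the site.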

Now, I know what we (eBay) would do if this was happening: We'd wire-off the feature, i.e. disable it until we had fixed what was going wrong. Twitter on the other hand...

Huh? You've got a very buggy feature, so you're going to roll it out to more users? This just boggles my mind. I cannot come up with a rationale for something like this. I guess we can assume that Twitter has the problem figured out -- they just haven't been able to release the fix for whatever reason. Even if that was the case, shouldn't you roll out the fix and make sure that it works and nothing else pops up before increasing usage? Like I said, I just can't figure this one out...

Saturday, October 10, 2009

The End of The Aughties

There are 72 days left in the Aughties, y'know, the current decade: 2000 - 2009. I was looking back at the decade and thinking about what its most important events were. Here's my little list:

9/11 -- This is obvious. September 11, 2001 is clearly one of the most pivotal days in the history of the United States. In the previous century, there are probably only a handful of comparable events: the bombing of Pearl Harbor, V-E Day, the moon landing, the JFK assassination. For several generations of Americans, 9/11 will be the most historic day of their lives.

The Election of Barack Obama -- President Obama's election was historic in so many ways. Obviously it was historic that an African American was elected President. It also marked a transition to a new generation -- Obama is 15 years younger than Bush or Clinton (and let's not even mention McCain). Obama is not only a Democrat, but he also does not come from the more conservative Southern wing of the party, as Clinton and Jimmy Carter did.

Social Media -- Here's where maybe my perspective is skewed by living in Silicon Valley. Social media is not a single event, in fact it is a progression of events. To me, it really started with blogging and YouTube, and then exploded with MySpace, Facebook, and Twitter. It is a fundamental change in the Internet. Every user is a creator of content, as well as a consumer. It is the great democratizing effect of the Internet, and it is only getting started. Even now we are starting to see how businesses, celebrities, etc. realize that not only can they use social media as a channel to customers and fans, but that it is a two-way channel.

Hurricane Katrina -- What made Hurricane Katrina so pivotal is that it opened the eyes of Americans. It made people realize that many of their fellow Americans live in awful conditions. The divide between socioeconomic classes in America was never so obvious as during Katrina. When Kanye West went on TV and said that George Bush didn't care about black people, he wasn't just being a jackass, he was stating a sentiment shared by a lot of people.

The iPhone -- What did I say earlier about having a Silicon Valley perspective? Anyways... The iPhone has completely changed so many things for so many people. In the 90's, the Internet changed people's lives by bringing them information. Now the iPhone lets them carry it around in their pocket. Other phones were certainly moving in that direction, but the iPhone broke through by combining a large display with a highly usable touch-based interface. This revolution continued with the release of the App Store. Now don't get me wrong. A lot of other phones are following suit -- but that's exactly why the iPhone was so historic.

That's my short list. I know it's obviously biased from me being American and living in Silicon Valley. What did I miss? What doesn't belong?

Thursday, June 04, 2009

Liveblogging of Scala Actors Talk from JavaOne

Today I attended a talk on Scala Actors by Philipp Haller and Frank Sommers. Tyler asked me to live blog it. I thought that was a good idea, and that Twitter would be the way to do it. This did not work out, as Twitter shut me down for too many tweets in a given hour. So now I am posting the full live blog, both the tweets that went through and the ones that Twitter rejected. However, pulling out the text of all of the good tweets is not totally straightforward. I figured the easiest way to do this was to use some Scala with the Twitter API via the REPL:


scala> import java.net._
import java.net._

scala> import scala.xml._
import scala.xml._

scala> val url = new URL("http://twitter.com/statuses/user_timeline.xml?since_id=2035817713&max_id=2036208882&screen_name=michaelg&count=75")
url: java.net.URL = http://twitter.com/statuses/user_timeline.xml?since_id=2035817713&max_id=2036208882&screen_name=michaelg&count=75

scala> val conn = url.openConnection
conn: java.net.URLConnection = sun.net.www.protocol.http.HttpURLConnection:http://twitter.com/statuses/user_timeline.xml?since_id=2035817713&max_id=2036208882&screen_name=michaelg&count=75

scala> val tweets = XML.load(conn.getInputStream)
tweets: scala.xml.Elem = 


scala> (tweets\\"text").foreach{ node => println(node.text) }

The result of the last call produces the following:

This allows for many more actors than worker threads #javaone
Work stealing makes this even more efficient. Let worker threads steal work from other worker threads #javaone
So decouple from threads using thread pools from java.util.concurrent. #javaone
A naive impl of thread-per-actor has overhead from thread creation, context-switching and memory use. #javaone
How event-based actors decouples threads and actors. How the execution enviro is setup. How the DSL works #javaone
Now time to go under the hood of Scala Actors #javaone
Only use receive when you must return a value from an Actor #javaone
Use react whenever possible and only use receive when necessary #javaone
When to use receive vs. react? #javaone
Replace while(true) { ... } with loop { ... } #javaone
Go from 5,000 actors/JVM to 1,000,000/JVM using event based Actors #javaone
Replace receive with react, receiveWithin with reactWithin, and while(condition) with loopWhile (condition) #javaone
To scale to a large number of Actors, go for event based Actors #javaone
The sender of a message is still sent across the network #javaone
Everything else works exactly the same as it would for local actors #javaone
Use the select function to get a proxy to the remote Actor #javaone
To make an actor remote is very easy. Call the alive function and the register function #javaone
So far everything has been within a single VM, but Scala Actors scales using remote actors #javaone
To specify timeouts, use receiveWithin. This takes a timeout and sends a TIMEOUT message #javaone
All of this makes it easy to keep track of subscribers and send them messages, like new chat messages #javaone
You can also wait for futures using receive function #javaone
When you try to access the future, it will then block if the reply has not been received #javaone
For async with assurance that the message was received using double-bang !! Returns a future representing the reply #javaone
Synchronous messages are allowed too using !? This blocks until reply received #javaone
Bang is asyc, returns immediately and passes a reference to the sender #javaone
To subscribe a user to the chat room, just send a message using the "bang" operator (method): ! #javaone
There is another API shortcut for creating an actor inline using the "actor" function #javaone
Creating a subscription is just a matter of sending message stating as much. Lets recipient capture the state #javaone
Common pattern of "child actors", benefits from asych communication of actors #javaone
Make each user an Actor as well to maintain state, such as who sent a message to the chat room #javaone
Now going back to the chat application to demonstrate Actor subscriptions #javaone
Patterns are tried in order, and the first match wins. No fall-through. Looks like a factory method call. #javaone
This is all based on Scala's pattern matching, think Java's switch statement but on steroids #javaone
If no message in mailbox, the actor will suspend until there is a message. Syntax for matching messages is super simple #javaone
Use the receive function to process messages within the act method of your Actor. Receive gets a message from the Actor's mailbox #javaone
Defining messages is made easy by using Scala's case classes. These are pure data classes typically. #javaone
Creating an actor is very similar to creating a thread in Java. #javaone
Creating an Actor is easy, just do 3 things. Extend Actor. Implement act method. Start the actor. #javaone
Frank showing a chat application architecture that uses Actors "the quintessential Actor system" #javaone
Actor implemented entirely as a DSL, but part of Scala std. library. Used in big systems such as Twitter #javaone
Actors can be local or remote, no change to programming model. #javaone
Actors are event based in Scala, not tied to Java threads. So much more lightweight. A single JVM can have millions of actors #javaone
Actors use several Scala language features: pattern matching, closures, traits/multiple inheritance, and Java interop. #javaone
Actors in Scala are probably the closest implementation of Erlang model on the JVM #javaone
OTP: library of Erlang modules for building reliable distributed systems #javaone
Erlang has support for Actors built-in. Erlang VM is process-oriented, makes Actor model very scalable. #javaone
Best known example of Actors though is from Erlang, "a pure message passing language" #javaone
Actors are an established pattern. Research dates to 70s at MIT in Smalltalk and Simula #javaone
Mutable messages are also possible. Actors can also be local or remote -- same programing model #javaone
Actors share nothing, private state. Thus no synchronization. Actors send messages, but the message are immutable #javaone
Actors are active objects and has a mailbox. Actors consume messages, perform actions but the code is sequential within an actor #javaone
There is an existing concurrency model that fits with the described solutions: Actors. #javaone
The solution: DSL + sequential programs. Shared nothing. Message passing. Asynchronous communication #javaone
Why learn about Scala actors? Concurrency presents opportunity and problems for developers, especially large teams of developers #javaone
Frank and Phillip are writing a book on Scala actors #javaone
Phillip Haller and Frank Sommers talking about Scala actors now #javaone

So that's the good tweets. Here are the ones that did not go through...

The actor method takes a closure and creates an Actor
The react method saves pattern matching as a closure to be executed later
Uses exception for control flow, any code after react will not execute
Actors usually execute on different threads, but you can control what threads
This is useful for threadLocals and for Swing
A demo of SwingActor
Example of using link and trapExit, but not much detail of this shown
Scala Actors provide a simple programming model to write concurrent code on the JVM
Many projects already use actors: Twitter, Lift, Scala OTP (@jboner), partest
Future of Scala Actors in 2.8
Pluggable schedulers -- create actors with daemon-like behavior for example
Integrating exceptions and Erlang-style error handling. Get the best of both Java and Erlang error handling
Support for continuations. Currently requires optional compiler-plugin. Allows for more flexible control structures
Actor migrations (from machine to machine) could also be done with continuations
More static checking of messages, in particular an @immutable annotation
Actor isolation -- no accidentally sharing mutable state
This will allow for better passing mutable messages for performance benefits
No problem passing large methods, as they are passed by reference

Sunday, February 08, 2009

Added Twitter Social Graph APIs

Twitter recently created social graph APIs. You can get the friends or followers of a given user. This information was available in other ways previously, but the new social graph APIs are very convenient: they return an array of numeric IDs. So if you want to do things like visualize the social graph of people or use this to suggest friends, etc., now you've got an easier way to do it. Of course a great way to visualize such data is in Flash. So I added these APIs to TwitterScript. When the TwitterEvent returns, its data property will be the array of numeric IDs.
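
As an illustration of why plain arrays of numeric IDs are handy for something like suggesting friends, here is a rough sketch in Scala rather than ActionScript. It assumes you have already fetched the friend IDs for a user and for each of that user's friends; the friendsOf map stands in for those API responses, and FriendSuggestions and suggest are made-up names, not part of TwitterScript or the Twitter API.

```scala
object FriendSuggestions {
  // friendsOf maps a user ID to the set of IDs that user follows.
  // Returns (candidate ID, number of my friends who follow them),
  // most commonly followed first, ties broken by ID.
  def suggest(me: Long, friendsOf: Map[Long, Set[Long]]): Seq[(Long, Int)] = {
    val mine = friendsOf.getOrElse(me, Set.empty[Long])
    val candidates = for {
      friend <- mine.toSeq
      fof    <- friendsOf.getOrElse(friend, Set.empty[Long]).toSeq
      if fof != me && !mine.contains(fof) // skip myself and people I already follow
    } yield fof
    candidates
      .groupBy(identity)
      .map { case (id, hits) => (id, hits.size) }
      .toSeq
      .sortBy { case (id, count) => (-count, id) }
  }
}
```

For example, if user 1 follows 2 and 3, user 2 follows 3, 4 and 5, and user 3 follows 4 and 1, then suggest(1L, ...) ranks user 4 first (followed by two of my friends) and user 5 second.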

Thursday, August 28, 2008

Search Twitter from Flash

I have updated the Twitter ActionScript API. I added support for search. You are probably aware that search is provided by Summize, which was acquired by Twitter. It is pretty obvious that the APIs have not yet been merged!

Twitter's API is all based on Rails ActiveResource ... which is awesome. It turns any resource (often a database table) into a RESTful service. REST is often associated with XML, but Rails (and thus Twitter) supports JSON too (Twitter supports ATOM and RSS as well). For ActionScript, XML is great. Or I should say POX is great, and that is what Rails serves up.

The Twitter Search API is different. It supports two formats: ATOM and JSON. No POX. I went with the ATOM format. For JSON, I would have used Adobe's corelib. It works well, but I didn't want to add the weight. Plus, JSON tends to parse much slower in AS3 than XML. That is because AS3 implements E4X. To get E4X to work with ATOM, you have to be mindful of namespaces. For example, here is the code I used to iterate over the entries: for each (entryXml in xml.atom::entry). Here the xml variable is the parsed result from search.twitter.com and atom is a Namespace object. Not as pretty as just xml.entry, but oh well.
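
The namespace issue is not specific to E4X. As a rough comparison, here is the same kind of lookup sketched in Scala using the JDK's DOM parser. The AtomEntries object and titles function are made up for illustration, the feed is passed in as a string rather than fetched from search.twitter.com, and the sketch assumes every entry has a title element (which ATOM requires).

```scala
import java.io.ByteArrayInputStream
import javax.xml.parsers.DocumentBuilderFactory

object AtomEntries {
  // The standard ATOM namespace URI.
  val AtomNs = "http://www.w3.org/2005/Atom"

  // Returns the title of every ATOM entry in the given feed.
  def titles(atomXml: String): Seq[String] = {
    val factory = DocumentBuilderFactory.newInstance()
    factory.setNamespaceAware(true) // off by default; needed for the NS lookups below
    val doc = factory
      .newDocumentBuilder()
      .parse(new ByteArrayInputStream(atomXml.getBytes("UTF-8")))
    val entries = doc.getElementsByTagNameNS(AtomNs, "entry")
    (0 until entries.getLength).map { i =>
      val entry = entries.item(i).asInstanceOf[org.w3c.dom.Element]
      // Assumes the entry has at least one title element.
      entry.getElementsByTagNameNS(AtomNs, "title").item(0).getTextContent
    }
  }
}
```

Just like xml.atom::entry in AS3, the lookup has to name the namespace explicitly; asking for a bare "entry" is a different question as far as a namespace-aware parser is concerned.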

Tuesday, July 22, 2008

Get Lifted

Yeah, I'm a John Legend fan, but that's not the reference here. The reference is to Lift, the web application framework for Scala. Today IBM published an article I wrote about Lift. This was an article that I pitched to IBM and I was very excited about writing. Lift itself takes a different approach to web development than the typical MVC approaches. It is also in Scala, and takes great advantage of that language's features, especially its native support for XML.

One of the most surprising features of Lift was the ORM that it includes. Lift's creator, David Pollak, commented that he would use JPA for complex schemas and not Lift's ORM. I must admit, it was the part of Lift that took me the longest to get my head around. I consider myself quite the veteran of ORMs, but Lift's is definitely unique. I think it needs some tooling around it, as it felt like some boilerplate-ish code was present (like creating a model class and singleton factory for the model class.) However, I really liked its use of generics.

I didn't have room in the article to get into Comet with Lift and Scala's Actors. That is an awesome feature. Hmm, perhaps that should be another IBM article...

Finally, in Lift- and Scala-related news... check out Graceless Failure by some of the developers at Twitter. It seems to imply that Scala and maybe Lift are at least "in the mix" at Twitter. I wonder if it is replacing Ruby and/or Rails in some places. Certainly Actors would seem like an obvious way to handle new updates on Twitter.

Friday, May 23, 2008

Armchair Architects

Funny post by @al3x about Twitter architecture. Oh wait it wasn't supposed to be funny, oops. The fact is that he should totally expect more people to diss Twitter and pretend that they could easily solve all of the problems. I am not just being cynical about people, they actually have some good reasons to do so:

Twitter crashes a lot. If your site did not crash so much, then people would not think you are an idiot and that they could easily do a better job. The people may all be wrong, but that does not matter. Why do you think Microsoft has come to have such a bad reputation? People do not care about what MSFT did to Netscape, Sun, or Apple. They care about BSODs. People hate Vista because Microsoft did not make it backwards compatible with 3rd-party drivers that did lots of bad, hacky things. But now Vista crashes, so people complain about MSFT.
Twitter seems simple. You put a 140 character limit on updates and what do you expect? Part of Twitter's appeal is its simplicity, but that same simplicity creates expectations and makes people think they could do it themselves better. Maintenance is expected for things that seem complex, like cars or Photoshop, but not for (seemingly) simple things like iPods or Twitter. If you think hard about it, Twitter is much more complex than it seems, but who wants to think hard?
Ruby developers are obnoxious. Oh this is my favorite. Ruby developers are a small but very vocal group. They love rubbing it in your face that Ruby is so much more expressive or object-oriented or whatever than anything else on the planet. The Rails sub-cult is even worse about this. So when the most high-profile Rails site starts failing constantly, you must expect a lot of smug developers to wag their fingers. It is kind of a shame that Twitter is paying for DHH's bad karma ... but then again @blaine did make that infamous claim about how easy it was to scale Rails. Of course he's gone now, but there is still enough bad karma to go around. How many Ruby developers would admit how bad their software is? Think about that.

Saturday, May 03, 2008

Twitter Me This

No Twitter running off the Rails discussion tonight. One reason I write about Twitter is because I really value the service. It was particularly useful to me today.

I took my oldest son, Michael, Jr., to Maker Faire today. We left right after lunch. I set Twitter to deliver messages via IM, which for me means Google Talk. I have a Google Talk client on my Blackberry, so all updates went to my phone via IM. I did a "track #makerfaire". Just as I was about to hit the road I saw a tweet saying how bad traffic was on the 101 near San Mateo, where Maker Faire takes place. I also saw a tweet saying the best way to avoid the street traffic was to take the Hillsdale Blvd. exit to Saratoga Drive, where there is free parking. These were not tweets from people I follow, but from people going to Maker Faire, and they were right on. So I took 280 to 92 instead of 101, and used the Hillsdale Blvd tip to find free parking. I got to see a parking lot on the freeway near the San Mateo fairgrounds, as well as on Delaware Avenue (where most people got off the freeway) getting onto Saratoga Drive. I did not have to deal with any of that traffic. Thank you Twitter!

Not long after I got home from Maker Faire, I checked Twitter and saw the first mention of Microsoft withdrawing their bid for Yahoo. I had turned off my Blackberry setup, but immediately changed my settings back to IM and turned on iChat on my MacBook. I did this so I could track Yahoo and Microsoft on Twitter. All I can say is ... wow. It was amazing to watch the collective consciousness of ... well at least Silicon Valley ... react to such surprising news. Now I'm not going to exaggerate, most of the tweets were redundant and few had any particular insight. That is not the point. Crowd sourcing may be great for traffic info, but not business and technology analysis (just ask a communist survivor!) But it is fun to see how some people were relieved, while others were disappointed because they knew that YHOO stock was doomed to plummet on Monday.

Friday, May 02, 2008

As the Bird Turns

Another week, another Twitter outage ... and a new round of technology questions and rumors. TechCrunch now thinks that Twitter is abandoning Rails. This time out, Arrington attempts to be a little more fair-and-balanced than when he wrote about Blaine leaving. He points to other sites that claim to have scaled Rails. This was of particular interest to me, so let's take a look.

Scribd -- Slide 7 claims three databases! Uh oh, is DHH right and I'm going to have to eat crow? Well maybe, but not because of Scribd. They only use a master-slave setup, but cleverly offload expensive queries to the slaves. There is still only one place to write data. When (if) their data gets too big for a single database, they are going to be the ones singing "bring that beat back!"

Friends for Sale -- I could write a lot just about these guys and how ... umm ... interesting their setup is. I'll just quote them: "The most important thing we learned is that your scalability problems is pretty much always, always, always the database" but "on the database side we're still with a monolithic master and we're trying to push off sharding for as long as we can." They still have no problem claiming that "The whole 'but does Rails scale?' discussion sounds like a bunch of masturbation - the point is moot." You can't make up stuff like this!

Wednesday, April 23, 2008

Why Johnny Can't Scale

Today TechCrunch "reported" that Blaine Cook has left Twitter. I had the pleasure of interviewing Blaine and his fellow Twitter developer Alex Payne last year. I was working on a book about Ruby on Rails, so they made the perfect people to talk to. After all, they had written a Ruby on Rails application that was truly pushing the scalability limits of Rails. I was most interested in those limits and how they were attacking them. Let's get back to the present though.

Michael Arrington pins a lot of Twitter's notorious instability squarely on Blaine. He has a point. He points out that Blaine did a presentation at a Rails conference on how Twitter had scaled Rails. If you go out and say "here's how to scale Rails", your app is part of your proof. Facebook can go out and tout the scalability of PHP, MySQL, and memcache. You can dispute their rationale, but their results are hard to argue against. Blaine could have a great rationale behind how they scaled Rails, but the instability of Twitter discredits any argument.

Some of TechCrunch's readers object to this. They point out that Twitter only had three developers. Others say only ignorant non-programmers would blame Rails or Blaine. Maybe they are right. Here goes my explanation.

Take a look at that presentation that Blaine did. In particular, look at slide 3. I quote "180 Rails instances (Mongrel). Growing fast." That is essentially 180 web servers. That is 180 instances of the Twitter application serving requests made by Twitter users. Keep in mind that this was a year ago, so you can only imagine what the numbers are like now. Now take a look at the next line in that slide "1 Database Server". No mention of that growing. I won't claim to know the intricacies of Twitter's operations, but this is an obvious bottleneck.

Since then Blaine & co. did a lot to alleviate the pressure on their bottleneck. They created an innovative messaging system called Starling. They made heavier use of memcache. To my knowledge, they still have that 1 Database Server.

If you are familiar with Rails, then you know that this is a flaw in Rails (a flaw in ActiveRecord to be more precise.) RoR associates a class to a database connection. You can write code that alters this behavior, but it is very hard to do this efficiently. Now let's be fair. This is true of a lot of frameworks, maybe all of them. Java practitioners like me know that our favorite technology with similar functionality, Hibernate, has the same flaw. Google has kindly produced an extension, Shards, to address this. Certainly the general JavaEE specification does not address this.

So it's not just a Rails problem, and we shouldn't blame Rails, right? Not so fast. Rails is the poster child for rapid, high productivity development frameworks. Remember how TC readers pointed out that Twitter only had three developers? Rails is a big part of that story. In general, Rails is one of many technologies that seek to simplify web development as much as possible. This allows three developers to build such a popular site, but that's also the problem.

It's easier than ever to build a website. Social media makes it easier than ever to gain eyeballs. But it is just as hard to scale your site to handle massive traffic. Actually, maybe it is harder. When you rely heavily on frameworks and out-of-the-box goodness, it is that much harder to rip it out or reject it when you outgrow it. Rails magnifies this problem, because it is not just a technology, it is a philosophy. Rails developers pride themselves on the elegance and terseness of their code. If you are already writing ugly code, it's a lot easier to throw it away and write code that is an order of magnitude uglier but solves your scale problems. When you write beautiful code and don't know of a beautiful way to solve your scale problem, what do you do?

Finally, it is easy to trivialize Twitter's problems. "Oh they just need to use PHP" or whatever. Think about their application a little more before you say that. Let's think about a really naive approach to their application. You have a User who creates Updates. One User has many Updates. For example, I am a Twitter User and today I have made three Updates. I follow 65 other Users. The naive approach (also the one a DBA would tell you to use) is to have a join table to associate who you follow. So what has to happen when I load my home page? First, we need to figure out the 65 people I follow. Next we need to get their Updates. How many of their Updates? Well there are 20 Updates shown, but they are the 20 newest across all of my 65 friends. So do you grab all of the updates from all of my friends and then sort it? That is certainly the most naive approach. In that naive approach, we need one query to get my friends and then N queries to get all of their updates. We can put a sort on those N queries and then do a merge sort, quitting once we get the first 20 potentially. Still each of those N queries could be returning a lot of data. We all follow @Scobleizer, right?
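The naive approach described above can be sketched in a few lines of Python. This is only an illustration, not Twitter's actual code: the `UPDATES` dictionary stands in for the Updates table, `updates_for` plays the role of one of the N sorted queries, and all names and data are made up. The merge step uses the standard library's `heapq.merge`, which merges already-sorted inputs lazily, so we can quit once the page is full.

```python
import heapq
from itertools import islice

# Hypothetical in-memory stand-in for the Updates table:
# user -> list of (timestamp, text), newest first.
UPDATES = {
    "alice": [(105, "a3"), (90, "a2"), (10, "a1")],
    "bob": [(100, "b2"), (50, "b1")],
    "carol": [(95, "c1")],
}

def updates_for(user):
    """Stand-in for one of the N queries: a user's updates, newest first."""
    return UPDATES.get(user, [])

def home_timeline(following, limit=20):
    # One sorted result set per followed user (the "N queries").
    streams = [updates_for(user) for user in following]
    # Lazily merge the sorted streams; reverse=True because each
    # stream is ordered from newest to oldest.
    merged = heapq.merge(*streams, key=lambda update: update[0], reverse=True)
    # Quit as soon as the page is full.
    return list(islice(merged, limit))

timeline = home_timeline(["alice", "bob", "carol"], limit=4)
```

Even with the early exit, each of the N result sets still has to be fetched from the database first, which is where someone with a huge number of updates hurts.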

So even if we only needed "1 Database Server", things are non-trivial. What about splitting your database? Obviously you want to split the Updates table. Let's think about your home page. Ideally all of the Updates on that page would be in the same database, so all of your friends need to be on the same database server. But what about the other people following your friends? You see where this is going. The Kevin Bacon six degrees of separation problem is going to kick your ass.
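To make the splitting problem concrete, here is a small Python sketch. It assumes a made-up scheme where Updates are split across four database servers by user id modulo the shard count; `NUM_SHARDS`, `shard_for`, and the ids are all hypothetical. It simply shows which servers one home page would have to touch.

```python
from collections import defaultdict

NUM_SHARDS = 4  # hypothetical number of database servers

def shard_for(user_id):
    """Assumed placement rule: a user's Updates live on user_id % NUM_SHARDS."""
    return user_id % NUM_SHARDS

def shards_to_query(following_ids):
    """Group the people I follow by the shard holding their Updates."""
    by_shard = defaultdict(list)
    for user_id in following_ids:
        by_shard[shard_for(user_id)].append(user_id)
    return dict(by_shard)

# Six of the people I follow, with made-up ids.
following = [3, 8, 13, 21, 42, 65]
plan = shards_to_query(following)
```

With only six friends the page already hits every one of the four shards. Moving all of my friends onto one server would fix my page, but each of them is followed by other people whose friends live elsewhere.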

The point is that even though Twitter might seem simple to the outsider, it is not. It is complex. Even if they didn't use Rails, it is non-trivial. Rails definitely does not help, at least in some regards, and that's the real story. It may be easier than ever to build something and, perhaps because of social networking, easier to get an audience. But all of the syntactic sugar in the world doesn't make it any easier to scale an application.

Tuesday, April 08, 2008

Twitter API in ActionScript

I noticed that the Twitter ActionScript API was a bit out of date to say the least. I had to update it for a small project I was playing around with. I sent a note to al3x about this, and suggested open sourcing the API. He agreed and so here it is.
It needs a lot of work. I basically removed the old authentication scheme, since it relied on setting the Authorization HTTP header and that is no longer allowed starting in Flash Player 9.0.115. This also let me remove a dependency on a third party base 64 encoder. I guess a base64 encoded string that includes your username and password is supposed to be secure :-) I also fixed the loadFriends method, as this now returns users not status messages. I will get the API up to date, but if anyone wants to help out then just send me a note.
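As an aside on the base64 remark: assuming the old scheme was plain HTTP Basic authentication (username and password joined with a colon, base64 encoded, and sent in the Authorization header), a short Python sketch shows why the encoding offers no secrecy. The helper names and the credentials here are made up for illustration.

```python
import base64

def basic_auth_header(username, password):
    """Build an HTTP Basic Authorization header value."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8"))
    return "Basic " + token.decode("ascii")

def decode_basic_auth(header):
    """Recover the credentials from the header; no key or secret is needed."""
    _scheme, _, token = header.partition(" ")
    decoded = base64.b64decode(token).decode("utf-8")
    username, _, password = decoded.partition(":")
    return username, password

header = basic_auth_header("michaelg", "not-a-real-password")
recovered = decode_basic_auth(header)
```

Anyone who can see the header can reverse it, so the scheme is only as safe as the connection it travels over.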

Thursday, November 08, 2007

Flock 1.0

In case you missed the news, Flock finally went 1.0 late last month. I've been playing around with Flock for over a year. Their integration with social services has gotten better and better. I did a clean install of it and set up my Flickr, YouTube, and Blogger accounts very easily. I've given up on using it for RSS, as having a server-based solution (Google Reader) is too valuable to me. They've also added Facebook and Twitter integration, two favorite services of mine. Both are well done. And of course it's based off Firefox 2.0 now, which is great. Most Firefox plugins work great with it. I'm using its blog writer right now, as I'm planning on using it as my primary for a week or so and then figure out if I should go back to Firefox or not.

Blogged with Flock

Sunday, April 15, 2007

Twitter Slowness Explained

Now I know why Twitter is so slow. It was built using Ruby on Rails. That explains so much.

Thursday, March 29, 2007

Open Source Twitter

Dave Winer wonders about creating an open source Twitter. I've done some musings previously on the mechanics of Twitter. In short it would be interesting to create a Twitter. I'm not sure if Dave is thinking mostly of Twitter clients into the existing network, or sub-nets that may or may not converse with the existing network. Twitter is down again this morning. I'm not sure if that indicates the challenges in creating such a scalable network or in the current implementation of it.

Monday, March 19, 2007

More Twittering

If there's one way to describe Twitter so far it's slow. I guess that's a good sign for them. I'm going to assume they must have a lot of traffic making them so slow. Let's hope it's not light traffic making them slow.

If you think about it though, you could probably consider Twitter to be a "high transaction" environment. A lot of people compare Twitter to IM clients, but it's really a lot different from a technical perspective. All the "tweets" are sent to the server and persisted (presumably in a database.) Hence the "transactions".

The other fun side must be sending out the updates to followers. There's two obvious ways to architect such a thing: pull or push. For pull, each user logged in would have some list of people they are following. A proxy for that user would have to constantly poll the server to see if there are any new updates. In push, you keep track of the people following each user. So if a user sends in an update, it needs to be pushed to the user's followers.
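The two options can be sketched side by side in Python. This is a toy, in-memory illustration and not a claim about how Twitter is built; the `PullModel` and `PushModel` classes and their method names are invented for the example.

```python
from collections import defaultdict

class PullModel:
    """Each reader keeps a list of who they follow and polls for updates."""
    def __init__(self):
        self.following = defaultdict(set)  # user -> authors they follow
        self.updates = defaultdict(list)   # author -> that author's updates

    def follow(self, user, author):
        self.following[user].add(author)

    def post(self, author, text):
        self.updates[author].append(text)  # cheap write

    def poll(self, user):
        # Expensive read: visit every followed author on every poll.
        return [text
                for author in sorted(self.following[user])
                for text in self.updates[author]]

class PushModel:
    """Each author keeps a list of followers and updates are pushed to them."""
    def __init__(self):
        self.followers = defaultdict(set)  # author -> their followers
        self.inboxes = defaultdict(list)   # user -> updates delivered to them

    def follow(self, user, author):
        self.followers[author].add(user)

    def post(self, author, text):
        # Expensive write: one delivery per follower.
        for follower in self.followers[author]:
            self.inboxes[follower].append(text)

    def read(self, user):
        return list(self.inboxes[user])    # cheap read

pull = PullModel()
pull.follow("me", "dave")
pull.post("dave", "hello")
pull_view = pull.poll("me")

push = PushModel()
push.follow("me", "dave")
push.post("dave", "hello")
push_view = push.read("me")
```

Both produce the same view for the reader; the difference is whether the work happens on every poll or on every post.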

There are problems with either track, and typically an asynchronous pub-n-sub system would be used for something like this. Maybe Twitter does that. The trickiest part would be determining the subscribers. These would be people logged in on the website, obviously. But it would also be people with IM established and logged on to their IM (I guess they could try to send the IMs regardless if people are logged in or not.) Ditto for SMS. I guess if any of those criteria are met, then you could create a virtual subscriber for receiving updates and then relaying them appropriately. Of course you'll need a scheduler for creating these virtual clients, since one can turn off their SMS updates during certain hours of the day, etc.

Actually I think I would even turn the message persisting into a subscriber. That fits well with the AJAXy interface for it. You send a message and it goes into a queue. A subscriber representing your buddy picks it up and sends it to your buddy's phone. Another subscriber representing a persistent store picks it up and saves it to the tweet database. Seems pretty slick.
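Here is a minimal Python sketch of that idea, with persistence as just another subscriber. Everything in it is hypothetical: the `Broker` class is a toy, the "database" and the "phone" are plain lists, and `drain` runs synchronously as a stand-in for the background workers a real asynchronous system would use.

```python
import queue

class Broker:
    """Toy message queue: publishers enqueue, subscribers handle later."""
    def __init__(self):
        self.subscribers = []
        self.pending = queue.Queue()

    def subscribe(self, handler):
        self.subscribers.append(handler)

    def publish(self, message):
        # The sender only enqueues and returns immediately.
        self.pending.put(message)

    def drain(self):
        # Stand-in for a worker: deliver each queued message to every subscriber.
        while not self.pending.empty():
            message = self.pending.get()
            for handler in self.subscribers:
                handler(message)

tweet_db = []      # stand-in for the persistent store
buddy_phone = []   # stand-in for an SMS gateway

broker = Broker()
broker.subscribe(tweet_db.append)                             # persistence subscriber
broker.subscribe(lambda m: buddy_phone.append("SMS: " + m))   # buddy subscriber

broker.publish("trying out Twitter")
broker.drain()
```

The appeal of this design is that posting stays fast no matter how many subscribers there are, and adding a new delivery channel is just one more `subscribe` call.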

Yep, seems like it could be a fun system to put together. You've got to love a consumer site with lots of database transactions and asynchronous message queuing. Of course if I assume they did put the system together with lots of asynchronous parts, then it really better be bandwidth that's making them so unbearably slow. All the better reason to use IM or SMS for send/receive updates I suppose...

Saturday, March 17, 2007

Twitter

I've been very interested in Dave Winer's recent musings on Twitter. I had originally thought that Twitter seemed like something only MySpacers could get into, but Dave is no MySpacer. So I decided to sign up, and was pleased that michaelg was still available. I always check for mikeg (my email in college was mikeg@caltech.edu, how cool is that?) and then for michaelg. Anyways, I'm giving it a try. I'm really impressed with their SMS and IM integration. It's cool that I can just IM my twitter buddy on GTalk to post updates (why is there no Yahoo IM Twitter buddy???) I think I can SMS updates too, as well as receive updates. It's also cool that there's an RSS feed for my updates. I'm sure that's what got Mr. RSS to start playing with it. I also put a Twitter widget on my blog.