Tuesday, May 27, 2008

Bulk Upload to Amazon SimpleDB

This weekend I was helping a friend with loading some data to Amazon's SimpleDB. The problem was fairly simple. He had a flat file with 170K lines of data. Each line represented a video from YouTube along with some metadata about the video. He wanted to turn that file into a "table" on SimpleDB, where each line (video) from the file would become a "row" in the table.

I decided to use Java for the task and found a useful Java library for SimpleDB. Some users of the library don't like it, as it uses JAXB to turn Amazon's XML-based API directly into a Java-based API. That didn't bother me, so I used it.

I wrote a quick program to do the upload. I knew it would take a while to run, but didn't think too much about it. I had some other things to do, so I set it running. Some three hours later, it was still going. I felt pretty silly. I should have done some math on how long this was going to take. So I scrapped it and adjusted my program.

Amazon has no bulk API, and this is the source of the problem. So you literally have to add one item at a time to SimpleDB. The best I could do was to parallelize the upload, i.e. load multiple items simultaneously, one per thread. Java's concurrency APIs made this very easy. Here is the code that I wrote.

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

import com.amazonaws.sdb.AmazonSimpleDB;
import com.amazonaws.sdb.AmazonSimpleDBClient;
import com.amazonaws.sdb.AmazonSimpleDBException;
import com.amazonaws.sdb.model.CreateDomain;
import com.amazonaws.sdb.model.CreateDomainResponse;
import com.amazonaws.sdb.model.PutAttributes;
import com.amazonaws.sdb.model.ReplaceableAttribute;
import com.amazonaws.sdb.util.AmazonSimpleDBUtil;

public class Parser {

    private static final String DATA_FILE = "your file here";
    private static final String ACCESS_KEY_ID = "your key here";
    private static final String SECRET_ACCESS_KEY = "your key here";
    private static final String DOMAIN = "videos";
    private static final int THREAD_COUNT = 40;

    public static void main(String[] args) throws Exception {
        List<Video> videos = loadVideos();
        AmazonSimpleDB service =
            new AmazonSimpleDBClient(ACCESS_KEY_ID, SECRET_ACCESS_KEY);
        setupDomain(service);
        addVideos(videos, service);
    }

    private static List<Video> loadVideos() throws IOException {
        InputStream stream =
            Thread.currentThread().getContextClassLoader().getResourceAsStream(DATA_FILE);
        BufferedReader reader = new BufferedReader(new InputStreamReader(stream));
        List<Video> videos = new ArrayList<Video>();
        try {
            String line = reader.readLine();
            while (line != null) {
                videos.add(Video.parseVideo(line));
                line = reader.readLine();
            }
        } finally {
            reader.close();
        }
        return videos;
    }

    // This creates a "table" in SimpleDB
    private static void setupDomain(AmazonSimpleDB service) {
        CreateDomain request = new CreateDomain();
        request.setDomainName(DOMAIN);
        try {
            CreateDomainResponse response = service.createDomain(request);
            System.out.println(response);
        } catch (AmazonSimpleDBException e) {
            e.printStackTrace();
        }
    }

    // Adds all videos to SimpleDB, one thread-pool task per video
    private static void addVideos(List<Video> videos, final AmazonSimpleDB service)
            throws Exception {
        // create a thread pool
        ThreadPoolExecutor pool =
            new ThreadPoolExecutor(THREAD_COUNT, THREAD_COUNT, 10,
                TimeUnit.SECONDS,
                new ArrayBlockingQueue<Runnable>(videos.size()));
        // Create a task for each video, and give the collection to the thread pool
        for (final Video v : videos) {
            pool.execute(new Runnable() {
                public void run() {
                    addVideo(v, service);
                }
            });
        }
        // Without a shutdown, the core threads never die and the JVM never exits
        pool.shutdown();
        pool.awaitTermination(Long.MAX_VALUE, TimeUnit.SECONDS);
    }

    // This adds a single item to SimpleDB
    private static void addVideo(Video v, AmazonSimpleDB service) {
        PutAttributes request = new PutAttributes();
        request.setDomainName(DOMAIN);
        request.setItemName(v.getVideoId());
        request.setAttribute(videoToAttrs(v));
        try {
            service.putAttributes(request);
        } catch (AmazonSimpleDBException e) {
            e.printStackTrace();
        }
    }

    // Turns a video into a list of name-value pairs
    private static List<ReplaceableAttribute> videoToAttrs(Video v) {
        ReplaceableAttribute author = new ReplaceableAttribute();
        author.setName("author");
        author.setValue(v.getAuthor());
        ReplaceableAttribute date = new ReplaceableAttribute();
        date.setName("date");
        date.setValue(Long.toString(v.getDate().getTime()));
        // Zero-pad votes so lexicographic ordering matches numeric ordering
        ReplaceableAttribute votes = new ReplaceableAttribute();
        votes.setName("votes");
        votes.setValue(AmazonSimpleDBUtil.encodeZeroPadding(v.getVotes(), 4));
        return Arrays.asList(author, date, votes);
    }
}



And for completeness, here is the Video class:

import java.util.Date;

public class Video {

    private final String videoId;
    private final int votes;
    private final Date date;
    private final String author;

    private Video(String videoId, int votes, long date, String author) {
        this.videoId = videoId;
        this.votes = votes;
        this.date = new Date(date);
        this.author = author;
    }

    public String getVideoId() {
        return videoId;
    }

    public int getVotes() {
        return votes;
    }

    public Date getDate() {
        return date;
    }

    public String getAuthor() {
        return author;
    }

    public static Video parseVideo(String data) {
        String[] fields = data.split(" ");
        return new Video(fields[1], Integer.parseInt(fields[0]),
                1000 * Long.parseLong(fields[2]), fields[3]);
    }
}


Some interesting things... I played around with the number of threads to use. Everything seemed to max out at around 3-4 threads, regardless of whether I ran it on my two core laptop or four core workstation. Something seemed amiss. I opened up the Amazon Java client code. I was pleased to see it used a multi-threaded version of the Apache HttpClient, but it was hard-coding the maximum number of connections per host to ... 3. I switched to compiling against source so I could set the maximum number of connections to be the same as the number of threads I was using.

Now I was able to achieve much better throughput. I kept the number of threads and the maximum number of HTTP connections the same. On my two-core laptop, I got optimal throughput at 16 threads and connections. On my four-core workstation, I got optimal throughput at 40 threads and connections. I think I will refactor the Amazon Java API and offer it to the author as a patch. There is no reason to hard-code the number of connections to three; just make it configurable. The underlying HttpClient code is highly optimized to allow for this.
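For the curious, the patch amounts to only a few lines. Here is a sketch against the Apache HttpClient 3.x API that the Amazon client uses under the hood; the factory class and method name are my own invention, and the thread count parameter stands in for whatever pool size you run with.

```java
import org.apache.commons.httpclient.HttpClient;
import org.apache.commons.httpclient.MultiThreadedHttpConnectionManager;
import org.apache.commons.httpclient.params.HttpConnectionManagerParams;

public class ClientFactory {
    // Sketch: build an HttpClient whose per-host connection limit matches
    // the upload thread pool instead of the hard-coded default of 3.
    public static HttpClient newClient(int threadCount) {
        MultiThreadedHttpConnectionManager manager =
            new MultiThreadedHttpConnectionManager();
        HttpConnectionManagerParams params = manager.getParams();
        params.setDefaultMaxConnectionsPerHost(threadCount);
        params.setMaxTotalConnections(threadCount);
        return new HttpClient(manager);
    }
}
```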

Friday, May 23, 2008

PECS in Action

One funny bit at JavaOne was Josh Bloch introducing the mnemonic PECS. Well actually what was most funny about it was the picture of Arnold Schwarzenegger that accompanied it... Anyways, PECS stands for producer-extends, consumer-super. Instead of repeating what Josh said or plagiarizing Effective Java, I will give an example of how PECS helped me out yesterday.

Once there was an API, back in the olden times before Java 1.5. It looked like this:

void runInboundCycles(final Module[] modules)

There was also a runOutbound counterpart, but you get the picture. Module is an interface that has many implementations. This API got tweaked courtesy of Java 1.5:

void runInboundCycles(final List<Module> modules);

Before the change you could do this:

runInboundCycles(new Module[] { new MyModule() } );

A logical uprev would be:

runInboundCycles(Arrays.asList(new MyModule()));

Turns out that won't compile! The Arrays.asList call returns a List<MyModule>, and the Java compiler says that is not a List<Module>. So instead you have to do something annoying like:

List<Module> modules = new ArrayList<Module>(1);
modules.add(new MyModule());
runInboundCycles(modules);

This is particularly annoying if modules is actually a member variable, as now you cannot declare it to be final. Enter PECS.

My API uses the Modules it is given, so the parameter acts as a producer to the API. Remember producer-extends, so refactor the API like this:

void runInboundCycles(List<? extends Module> modules);

Now you can pass a List<MyModule> and the compiler won't complain.
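To make the difference concrete, here is a minimal, self-contained sketch; the Module interface, MyModule, and the method body are all made up for illustration:

```java
import java.util.Arrays;
import java.util.List;

public class PecsDemo {
    interface Module { String name(); }

    static class MyModule implements Module {
        public String name() { return "MyModule"; }
    }

    // producer-extends: the list produces Modules for the method to read
    static String runInboundCycles(List<? extends Module> modules) {
        StringBuilder names = new StringBuilder();
        for (Module m : modules) {
            names.append(m.name());
        }
        return names.toString();
    }

    public static void main(String[] args) {
        // With List<Module> as the parameter type, this line would not
        // compile: Arrays.asList(new MyModule()) is a List<MyModule>.
        System.out.println(runInboundCycles(Arrays.asList(new MyModule())));
    }
}
```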

There are a couple of things about this that bother me. When my crazy brain looks at the API, it thinks "the API consumes Modules." Maybe that's just me. The other thing that bothers me is writing ? extends Module when Module is an interface. Now granted, it would suck to have to write ? implements Module just because Module is an interface and ? extends Module just because Module was a class, so I am not advocating the alternative. It just feels weird to write extends in front of an interface type. Maybe I have been programming in Java too long.

Armchair Architects

Funny post by @al3x about Twitter's architecture. Oh wait, it wasn't supposed to be funny, oops. The fact is that he should totally expect more people to diss Twitter and pretend that they could easily solve all of its problems. I am not just being cynical about people; they actually have some good reasons to do so:

  • Twitter crashes a lot. If your site did not crash so much, then people would not think you are an idiot and that they could easily do a better job. The people may all be wrong, but that does not matter. Why do you think Microsoft has come to have such a bad reputation? People do not care about what MSFT did to Netscape, Sun, or Apple. They care about BSODs. People hate Vista because Microsoft did not make it backwards compatible with 3rd-party drivers that did lots of bad, hacky things. Now Vista crashes, so people complain about MSFT.
  • Twitter seems simple. You put a 140 character limit on updates and what do you expect? Part of Twitter's appeal is its simplicity, but that same simplicity creates expectations and makes people think they could do it themselves better. Maintenance is expected for things that seem complex, like cars or Photoshop, but not for (seemingly) simple things like iPods or Twitter. If you think hard about it, Twitter is much more complex than it seems, but who wants to think hard?
  • Ruby developers are obnoxious. Oh this is my favorite. Ruby developers are a small but very vocal group. They love rubbing it in your face that Ruby is so much more expressive or object-oriented or whatever than anything else on the planet. The Rails sub-cult is even worse about this. So when the most high-profile Rails site starts failing constantly, you must expect a lot of smug developers to wag their fingers. It is kind of a shame that Twitter is paying for DHH's bad karma ... but then again @blaine did make that infamous claim about how easy it was to scale Rails. Of course he's gone now, but there is still enough bad karma to go around. How many Ruby developers would admit how bad their software is? Think about that.

Wednesday, May 21, 2008

SVJUG: JPA 2.0

Last night I went to the monthly SVJUG meeting. Patrick Linskey from BEA, err Oracle, spoke about JPA 2.0. First off, Patrick is an excellent speaker. I believe he arrived at BEA via the SolarMetric acquisition, and now he's been acquired again by Oracle. If nothing else, he is a much better speaker than anyone I've ever heard from Oracle. There were two very interesting things I saw in JPA 2.0.

The most interesting was query introspection. To me this move really makes it possible for JPA to get "pushed down the stack" if you will. I think it will allow for more abstracted frameworks to use JPA and hide it from the programmer. Working recently with Grails made me realize why this is important. Grails adds many JPA EntityManager methods to the domain classes. In many cases, you don't even have to call these methods. A typical Grails crud action does something like def myDomainObject = MyDomainObject.get(params.id) and then myDomainObject.properties = params and that's it. There is no explicit call to save() or persist(), etc. One could argue that this is a very good thing, as code like entityManager.persist(myDomainObject) is clearly boilerplate.

Now it would be pretty hard to get the similar functionality in Java, as you cannot add methods to objects at compile-time. You could make your domain objects extend an abstract class, but we are all too in love with POJOs to allow for that. However, a container of some sort could do these things for you. Introspection APIs into the JPA are a key to such a thing working. I don't know if the ones being added in JPA 2.0 are sufficient, or if that is even what the JPA folks have in mind, it's just an interesting possibility I see.

The other interesting addition to JPA 2.0 is adding a Cache interface to the spec. This is a nod to the implementers (like BEA's Kodo) that all use some type of "second level" cache. The API simply allows you to evict things from the cache, which seems reasonable enough. Cache invalidation is one of the hardest things out there, so it is nice to have explicit APIs for doing this. 
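Based on the talk, usage would look roughly like the fragment below. The names come from the JPA 2.0 draft (javax.persistence.Cache) and may shift before the spec is final; Video and videoId are invented placeholders.

```java
// Sketch of the draft JPA 2.0 cache API (names may change before final)
Cache cache = entityManagerFactory.getCache();

if (cache.contains(Video.class, videoId)) {
    cache.evict(Video.class, videoId);  // evict one stale entity
}
cache.evict(Video.class);  // or evict every cached Video
cache.evictAll();          // or clear the whole second-level cache
```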

JPA is a topic close to my heart. I started working with Hibernate about six years ago. In some ways there are two philosophies out there when it comes to web-scale systems. One school of thought rejects relational databases. Take a look at Google's Big Table, Amazon's SimpleDB, or Ning's Content Store for examples of this. The other school embraces the relational database and says that they can be scaled. If you saw the eBay presentation at JavaOne, then you know what school eBay belongs to. I think that other RDBMS believers include Yahoo and Facebook. Historically ORMs get in the way of that scaling, and JPA is no exception. I'm not quite ready to give up on them yet.

American Idol

Yes, another season of American Idol will finally end tonight. I think the show is really showing its age, though I've thought the same thing for a couple of years now. Last night's "showdown" demonstrated the evil of the show. It is a show powered by the music industry, you know, the same guys who like to sue kids for downloading music. Thus there is evil at its core, and it really came out last night.

Before the show, my wife asked me who I thought would win. I told her that I thought David Cook had the edge. He was more original and a better performer. I thought that David Archuleta was probably the better singer, but was so immature and annoying. He is unable to sing anything but slow songs, and I thought people had started to catch on to this, including the judges. Ah, but I should have remembered the evil that lies in the hearts of men and known that these were the reasons David A. was assured of victory.

Amazingly David A. performed three ballads. I thought this was his best tactic, but that surely he would catch a lot of criticism for it. Nope. Instead he was praised for song choice! In other words, he was praised for embracing his limitation. Even worse, when given the choice of any song to perform he recycled a song from earlier in the season.

David A. played things as safe as he possibly could. He took no chances at all. That is fine, but how many times have we heard the judges blast contestants for this? Not this time. Instead he was praised. To make it even worse, his opponent was criticized for not doing the same thing: Simon Cowell told David C. that he screwed up by not recycling a song.

It was all complete hypocrisy that reeked of an agenda. The evil empire is clearly at work here. The actual performances did not matter. The judges completely contradicted everything they've ever said in the past in an effort to promote one contestant over the other. It will surely work as well.

Don't be too upset, though. The evil empire is falling apart. The internet put a stake in its heart a long time ago. Go listen to some NIN or Radiohead on your iPod and laugh at the music industry. One day MBA students will study it as a perfect example of how to ruin a business by being too conservative and afraid of (technological) change. It's the same short-sighted, greedy principles that worked against David C. last night that ruined a huge, billion-dollar industry.

Sunday, May 18, 2008

How do You Use Generics in Java?

Cedric has fired up some more flames in the never-ending dynamic vs. static language wars. I won't bother with all of the obviously flammable objects in that post, but instead with this quote that I found really surprising:
90% of the Java programmers (including myself) only ever use Generics for Collections.

Is this true? It's not for me, obviously or I wouldn't have been surprised by it. Generics are so useful beyond collections. One of the most commonly implemented interfaces I deal with looks like:

interface Service<R,S> {
   S processRequest(R request);
}

This is a very simple use of generics to add type safety to a common construct. What would this look like without generics? Either everything is an Object, or there are some wrapper interfaces with some kind of getData() method that returns ... an Object. Neither is type safe, and nothing else would be reusable.
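For instance, a trivial implementation (the class names here are invented for illustration) gets full type safety at both ends, with no casts anywhere:

```java
public class ServiceDemo {
    interface Service<R, S> {
        S processRequest(R request);
    }

    // Request and response are typed end to end: String in, Integer out
    static class WordCountService implements Service<String, Integer> {
        public Integer processRequest(String request) {
            String trimmed = request.trim();
            return trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
        }
    }

    public static void main(String[] args) {
        Service<String, Integer> svc = new WordCountService();
        System.out.println(svc.processRequest("count these words"));
    }
}
```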

A lot of people think that Java took a wicked turn with the 1.5 release, and generics are probably the biggest reason for this. To me, the key was if you had programmed in C++ before or not. If you had, then generics seemed natural. It was definitely not the same as C++ templates, but the concepts were similar and the syntax very similar. If you had not experienced C++ templates, then generics were terrifying. I would guess that camp probably would agree with Cedric's post, i.e. they only use them for collections.

Of course the next great scary Java feature is closures. The BGGA proposal leverages generics to specify checked Exceptions, such as below:

public static <T,throws E extends Exception>
T withLock(Lock lock, {=>T throws E} block) throws E { ... }

Even scarier? Maybe so... Anyways, I am genuinely curious how other folks use generics. Is it just for collections?

Tuesday, May 13, 2008

Travelers Insurance Rip-Off

Time for a quick rant...

A few months ago, my car was hit. It was parked at work. I actually got an email on my Blackberry from eBay Security grimly stating "contact us as soon as possible about your car." That could only be trouble, and it was. 

The gentleman who hit my car was kind enough to leave a note. He was at eBay on business, and the car was a rental paid by his company. His company had insurance through Travelers Insurance. They contacted me, sent out somebody to assess the damage, and told me to take the car in to the repair shop of my choice. They instructed me to have the repair shop deal with them directly, so no money would come out of my pocket. They also told me I was entitled to a rental car while my car was in the shop.

So I followed their directions. They actually suggested I just go to the local Volkswagen dealership. The local dealership did not do body work, but recommended a body shop to me, Michael J's. So that is what I did.

A couple of weeks later, I get a call from the rental car company that I used while my car was in the shop, Enterprise. They said that Travelers was refusing to pay for the rental car. I contacted Travelers and they claimed that the repairs took longer than they thought they should, so they would not pay for the rental car. 

So to recap: my car gets hit in the parking lot, and I wind up out $$$ thanks to Travelers Insurance. If they were my insurance company, I would simply drop them. But they aren't. So at the very least, I'm ranting about it on my blog. That's what blogs are for, right?

New Music 2008

There has been some pretty good music to come out this year. Here is what I have been listening to.

Attack & Release by The Black Keys : This album was an immediate hit for me. It is hard not to draw comparisons to The White Stripes. White Stripes + Black Keys = piano keyboard? This is a great blues/rock album. Favorite tracks "I Got Mine" and "Strange Times."

The Seldom Seen Kid by Elbow : I really did not like this album at first. It really took several listens for it to grow on me. I think part of the problem is that I really do not like the first song "Starlings." The album is really quite good besides that song. In particular I like "Grounds for Divorce."

Accelerate by R.E.M. : Wow, R.E.M. is still going! I bought every R.E.M. album when they were released, starting with Document in 1987 and ending with Reveal in 2001. That's fourteen years and eight albums. Reveal was not good. I did not buy Around the Sun in 2004. I was quite skeptical when I heard about the release of Accelerate, but it is a good album. I really like "Living Well is the Best Revenge" and "Accelerate".

Consolers of the Lonely by The Raconteurs : I love the White Stripes, but I really did not like The Raconteurs' first album. It was underwhelming. This one is much better. I still sometimes wish Jack White would just take over and turn it into a White Stripes album, but it's usually a "this is good but ohh, how it could be better" kind of thing. Favorite songs are the title track and "Old Enough."

Vampire Weekend by Vampire Weekend : This album took a while to grow on me, too. It was one of those albums that sounded much better in my car than on headphones. Pitchfork says they are pop, not rock. I don't agree. I think they are a little too clever for pop, and maybe too catchy for "alternative"? Who knows. It is a fun and original album. Favorite songs "Campus" and "Oxford Comma."

The Slip by Nine Inch Nails : Oh, NIN released an album? Well actually they released Ghosts as well, but I must admit that I haven't listened to that much since I downloaded it. The Slip is amazingly good. The only bad thing is that by releasing so much music, Trent Reznor is making it seem too easy. For me, this album really clearly defined the post-Fragile NIN sound. It obviously started with With Teeth, but The Slip has made it crystal clear. Best tracks "1,000,000", "Discipline", and "The Four of Us are Dying."

Friday, May 09, 2008

Flex at JavaOne

Earlier today I attended a talk that introduced Flex and showed how it could be used with Java. It didn't work all that well, mostly because there was too much information for people to digest.

Currently I'm watching a talk where they are building an app with Flex, JSF, and WPF (Silverlight?). The speaker is developing live on stage. This seems dangerous, but it is going quite well. It is barely tapping the potential of Flex, but is very convincing. An interesting comment was that Flash is limited because it cannot do any 3D. Of course, this comment came from an NVIDIA engineer, but it is a valid point.

It is very nice to have so many non-Java related talks at JavaOne. The content of JavaOne has really aligned well with the interests of the attendees. I think next year it will be even more so. I predict that next year the dynamic languages will take over and command the biggest audiences. 

JavaOne 2008

I've been at JavaOne all week, but not blogging. That is partially because of the horrible Wi-Fi at JavaOne this year. Luckily it is good today, so I am blogging before Josh Bloch's Effective Java talk. I also just picked up the new edition of Effective Java, as if I could somehow read the whole thing before the talk ... Probably should have waited to buy it at the end of the day since I won't be reading it until the train ride home. Anyways...

So what's been good at JavaOne this year? Well ... no big news, really. We saw more meat on JavaFX, and I guess that was the lead story at the opening keynote. It seems like JFX is about where Silverlight was a year ago. Java SE 6 Update 10 does rock, though. You can do so much more with Java than with Flash or Silverlight, regardless of JFX, so it will be interesting to see if people leverage that.

Along those lines, the most interesting session I saw this week was by the guys from Ajaxian, Dion Almaer and Ben Galbraith. Their talk was titled "What's new in Ajax." The subject matter was interesting, but that's not what made their talk so awesome. They are outstanding speakers, and they seem to have a great rapport between them. Their talk seemed very conversational. It also definitely stood out for NOT having to follow the Sun presentation template. It was a Keynote presentation full of cool graphics, animations, and even some short video interviews with some of the top dogs in the world of Ajax frameworks. 

They also did two cool demos, one with Fluid. I feel like such a novice for not having seen this before. It is awesome. They also did an awesome demo where they "threw" a dart at a dart board. The dart board was an Ajax app running in the browser. The dart was a Wiimote! They used the Wiimote's Bluetooth connection to send data to the PC, and then bridged the PC's Bluetooth stack to the browser using (drum roll please) a Java applet. The applet was just for communicating with the OS; all of the graphics were in HTML and the interactivity was JavaScript. Very freakin' cool.

The other good session I went to was Alex Miller's talk on design patterns. Most of this was stuff I was familiar with, like problems with the Singleton and Visitor patterns. What was interesting to me was that Alex showed how closures could dramatically change the implementation of some of these patterns. He showed this for a template pattern. Both template and strategy patterns are kind of obvious targets for refactoring with closures. What surprised me was how a visitor pattern could be refactored using closures. I was inspired to start playing with design patterns in Scala, since that is the future of Java (in my opinion.) I tweeted this to Alex, and he pointed me to some writings on that exact topic. I still plan on doing my own exposition, as I think it will be fun.

Tuesday, May 06, 2008

Dynamic Language Performance

A couple of days ago, I read Charlie's post explaining the performance boost seen in Groovy 1.6. Reading stuff like this always leaves me with a great feeling. Not only do you learn something, but it makes other things make more sense. It brings order to chaos, or something like that. Around the same time, I was working on a new article about Grails, so the Groovy angle was particularly interesting. I love benchmarks, so it was time to have some fun.

I wrote a Groovy version of the same Ruby code I had used to benchmark JRuby. This was an extremely straightforward port. I was amazed at just how similar Groovy's syntax is to Ruby. Here is the code:


def expo(n, p){
    def r = n % p
    def exp = 0
    def div = p
    while (r == 0){
        exp += 1
        div *= p
        r = n % div
    }
    return exp
}

def factor(n){
    def factors = new java.util.HashMap<Integer,Integer>()
    def s = n * 0.5
    def p = (2..s).toArray()
    p.each{
        if (it) {
            def r = expo(n, it)
            if (r){
                factors[it] = r
            }
            def val = it * 2
            while (val <= s){
                p[val - 2] = null
                val += it
            }
        }
    }
    return factors
}

def numDivisors(n){
    def total = 1
    factor(n).values().each{
        total *= (it + 1)
    }
    return total
}

def n = 2
def num = 1
def max = Integer.parseInt(this.args[0])
def Integer triangle = 0
while (num <= max){
    triangle = n * (n + 1) * 0.5
    num = numDivisors(triangle)
    n += 1
}
println(triangle)


Anyways, here is the chart.

There is definitely a performance boost for long running processes where JIT'ing can happen more easily in 1.6. It was not as dramatic as I thought it might be, but it is there. Of course this is just one silly benchmark that is heavy in integer math, so take that for what it's worth. 

I also compared Groovy and JRuby. This was also surprising: 



Pretty close! Groovy seems to start up a little slower, but it pulled ahead slightly on bigger tasks. Perhaps the apprentice has overtaken the master.

Also, just for kicks, I tried out Scala. Here is the code:


import scala.collection.mutable._

object Euler12 {

  def expo(n: int, p: int): int = {
    var r = n % p
    var exp = 0
    var div = p
    while (r == 0) {
      exp = exp + 1
      r = n % div
      div = div * p
    }
    if (exp == 0) 0 else (exp - 1)
  }

  def factor(n: int): Map[int, int] = {
    var factors = new HashMap[int, int]()
    var s: int = (n / 2) + 1
    val p = (2 until s).toArray
    p.foreach((num) => {
      if (num > 1) {
        val r = expo(n, num)
        if (r > 0) {
          factors.put(num, r)
        }
        var i = num * 2
        while ((i - 2) < p.length) {
          p(i - 2) = 0
          i = i + num
        }
      }
    })
    return factors
  }

  def numDivisors(n: int): int =
    factor(n).values.foldLeft(1)((total, exp) => total * (exp + 1))

  def main(args: Array[String]): Unit = {
    var n = 1
    var num = 1
    val max = Integer.parseInt(args(0))
    var triangle = 3
    while (num <= max) {
      triangle = n * (n + 1) / 2
      num = numDivisors(triangle)
      n = n + 1
    }
    println(triangle)
  }
}


This turned out to not be fair. Scala's performance is exactly on par with Java and thus blows away JRuby and Groovy. 


I guess that is what happens when you have a language written by a guy who once wrote javac... Actually I would guess this is mostly a function of the static typing in Scala. It certainly bodes well for initiatives to bring features of Scala, like (BGGA-style) closures and type inference, to Java. It seems possible to implement all of this with no impact on performance, even on a JVM that has not been made to support such features. 

Saturday, May 03, 2008

Twitter Me This

No Twitter running off the Rails discussion tonight. One reason I write about Twitter is because I really value the service.  It was particularly useful to me today.

I took my oldest son, Michael, Jr. to Maker Faire today. We left right after lunch. I set Twitter to deliver messages via IM, which for me means Google Talk. I have a Google Talk client on my Blackberry, so all updates went to my phone via IM. I did a "track #makerfaire". Just as I was about to hit the road, I saw a tweet saying how bad traffic was on the 101 near San Mateo, where Maker Faire takes place. I also saw a tweet saying the best way to avoid the street traffic was to take the Hillsdale Blvd. exit to Saratoga Drive, where there is free parking. These were not tweets from people I follow, but from people going to Maker Faire, and they were right on. So I took 280 to 92 instead of 101, and used the Hillsdale Blvd. tip to find free parking. I got to see the parking lot that the freeway near the San Mateo fairgrounds had become, as well as the backup on Delaware Avenue (where most people got off the freeway) onto Saratoga Drive. I did not have to deal with any of that traffic. Thank you, Twitter!

Not long after I got home from Maker Faire, I checked Twitter and saw the first mention of Microsoft withdrawing its bid for Yahoo. I had turned off my Blackberry setup, but I immediately changed my settings back to IM and turned on iChat on my MacBook so I could track Yahoo and Microsoft on Twitter. All I can say is ... wow. It was amazing to watch the collective consciousness of ... well, at least Silicon Valley ... react to such surprising news. I'm not going to exaggerate: most of the tweets were redundant and few had any particular insight. That is not the point. Crowdsourcing may be great for traffic info, but not for business and technology analysis (just ask a communist survivor!). But it was fun to see how some people were relieved, while others were disappointed because they knew YHOO stock was doomed to plummet on Monday.

EclipseDay

I will be speaking at EclipseDay next month at Google. I will be talking about how we use Eclipse at eBay. I am going to try to demo and show off eBay's highly customized Eclipse-based development environment. Of course anytime you do a demo, you are at the mercy of the demo-gods! Hopefully they will be merciful.

Friday, May 02, 2008

As the Bird Turns



Another week, another Twitter outage ... and a new round of technology questions and rumors. TechCrunch now thinks that Twitter is abandoning Rails. This time out, Arrington attempts to be a little more fair-and-balanced than when he wrote about Blaine leaving. He points to other sites that claim to have scaled Rails. This was of particular interest to me, so let's take a look.

Scribd -- Slide 7 claims three databases! Uh oh, is DHH right? Am I going to have to eat crow? Well, maybe, but not because of Scribd. They only use a master-slave setup, cleverly offloading expensive queries to the slaves. There is still only one place to write data. When (if) their data gets too big for a single database, they will be the ones singing "bring that beat back!"

Friends for Sale -- I could write a lot just about these guys and how ... umm ... interesting their setup is. I'll just quote them: "The most important thing we learned is that your scalability problems is pretty much always, always, always the database" but "on the database side we're still with a monolithic master and we're trying to push off sharding for as long as we can." They still have no problem claiming that "The whole 'but does Rails scale?' discussion sounds like a bunch of masturbation - the point is moot." You can't make up stuff like this!