Tuesday, October 27, 2009

Minimalism? Programming? Huh?

Last week I was at the Strange Loop conference, and it was a great one. It inspired one blog post that managed to piss some Clojure people off. So here is another. This time I'm going after the closing keynote by Alex Payne. Here are his slides, including detailed notes.
So where to begin... Even though I am going to harsh on this, I will give it very high marks. Why? Because it was thought provoking. If something makes you think long and hard about how and why you do your job, that is a good thing. I worry about people who do not question their assumptions on a regular basis.
Ok, so now on to what you've been waiting for -- the harshness. I hate software engineering analogies. Well, not all, but most. I hate when people try to romanticize what they do by making a far-fetched analogy to something more glamorous. If you're a programmer, you're not a rock star. You're not an artist. You're not a musician. You're not even a scientist. Sorry. Despite having the word "architect" in my official job title, I would add that you're not an architect either.


You're a programmer. At best, you're an engineer, and that is really stretching it at times. So obviously if I find it ridiculous to compare programming to things like creating art or making music, it seems even more ridiculous to compare the output of programming to art or music.

That being said, I enjoy programming. It is not just a job. I also take pride in my work. I like to show off my code as much as the next guy.

From this perspective, I can definitely appreciate Alex's thoughts on minimalism. Or perhaps more generally, associating some subjective qualities with programming. If an analogy to art or construction or whatever helps one express those subjective qualities, then fine. Just don't forget who you are, and maybe even try to find some pride in just that.

Now I should really end this post, on kind of a feel-good note. But I won't. Instead, I want to touch on some of the *ahem* deeper thoughts that Alex's talk provoked. I was reminded of a paper I read several years ago that tried to explain two obviously overly general (and probably not politically correct) things: why Americans are good at software but not cars, while the Japanese are good at cars but not software. I tried to find a link to this paper, and was quite sure that I had saved it in Delicious. However, I could not find it, and I'm convinced that Delicious lost it. Maybe it fell out of cache...

Anyways, the crux of the argument is that Americans are good at getting something done quickly, even though the quality may be poor. So we are good innovators, but terrible craftsmen. This is a good fit for software, but obviously not cars.

If you accept this idea, and it has its merits, then in a world of low quality, rapidly written code, is there any room for subjective qualities? Why should you take any pride in your work at all if its value is directly proportional to how quickly you can produce it, throw it away, and move on to something else? To some degree, doesn't the software world's affection for agile development codify these almost nihilistic ideals? Perhaps the propensity to compare programming to art is simply an act of desperation, caused by the realization that you'd be better off giving up on good design and investing in duct tape instead.

Anyways, this is a close-to-home topic for me. My day job often involves telling developers the "right" way to do something, and the more unsavory flip-side of this, telling developers when they've done things the "wrong" way. You don't make a lot of friends, but I'm an INTJ so it works for me. Every once in a while, I work with a programmer who writes beautiful code. Yeah, I said it. Any such programmer knows that they write good code, even if they don't like to talk about it. When you write good code, you constantly see bad code around you, and you can't help but get a big ego. I always compliment the good code, and it always surprises the arrogant pricks. They think I'm blown away and have never seen such good code. In reality, I just feel sorry for them.

Sunday, October 25, 2009

Social Technology Fail

This is the kind of posting that needs a disclaimer. I'm going to talk a little about recent changes at Facebook and Twitter, but strictly from a technology perspective. It goes without saying that I have no idea what I'm talking about. I am fortunate enough to be acquaintances with several engineers at both companies, and I have a college classmate (and fellow Pageboy) who seems to be a pretty important dude at Facebook, but I have no more knowledge of these companies' technology than anybody else. So just to repeat: I have no idea what I'm talking about. You should really stop reading.

Since you are still reading, I will assume that you too enjoy being an armchair architect. Since my day job is as an architect at eBay, I tell myself that exercises such as this make me better at my job. Heh heh. Let's start with Facebook.

For several months now, I've noticed an interesting phenomenon at Facebook. My news feed would often have big gaps in it. I have about 200 friends on Facebook, and I'd say that around 70% of these friends are active, and probably 20-25% are very active Facebook users. So at any time I could look at my feed, and there would be dozens of posts per hour. However, if I scrolled back around 3-4 hours, I would usually find a gap of say 4-6 hours of no posts. The first time I ever noticed this, it was in the morning. So I thought that this gap must have been normal -- people were asleep. Indeed, most of my friends are in the United States. However, I started noticing this more and more often, and not always in the morning. It could be the middle of the day or late at night, and I would still see the same thing: big gaps. So what was going on?

Well here's where the "I don't know what I'm talking about" becomes important. Facebook has been very happy to talk about their architecture, so that has given me speculation ammo. It is well known that Facebook has probably the biggest memcached installation in the world, with many terabytes of RAM dedicated to caching. Facebook has written about how they have even used memcached as a way to synchronize databases. It sure sounds a lot like memcached has evolved into something of a write-through cache. When you post something to Facebook, the web application that you interact with only sends your post to the cache.

Now obviously reads are coming from cache; that's usually the primary use case for memcached. What I don't know is whether the web app can read from either memcached or a data store (either a MySQL DB, or maybe Cassandra?) or if Facebook has gone for transparency here too, and augmented memcached to have read-through cache semantics as well. Here's where I am going to speculate wildly. If you sent all your writes to a cache, would you ever try to read from anything other than the cache? I mean, it would be nice to only be aware of the cache -- both from a code complexity perspective and from a performance perspective as well. It sure seems like this is the route that Facebook has taken. The problem is that not all of your data can fit in cache, even when your cache is multiple terabytes in size. Even if your cache held nothing but highly normalized data (which would be an interesting setup, to say the least), a huge site like Facebook is not going to squeeze all of its data into RAM. So if your "system of record" is something that cannot fit all of your data... inevitably some data will be effectively "lost." News feed gaps, anyone?
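
Just to make the speculation concrete, here is a minimal sketch of what a "the app only ever talks to the cache" data layer could look like. This is pure conjecture written in Scala with made-up names -- it has nothing to do with Facebook's actual code -- but it shows how data quietly disappears when the cache is both the write target and the only read path:

import scala.collection.mutable

// Purely hypothetical sketch -- these names have nothing to do with Facebook's
// actual code. The cache is the only thing the web app ever talks to.
class CacheOnlyStore[K, V](maxEntries: Int) {
  private val cache = new mutable.LinkedHashMap[K, V]()

  // Writes go to the cache; persisting to a real database is assumed to happen
  // asynchronously, somewhere the web app never waits on.
  def put(key: K, value: V) {
    if (cache.size >= maxEntries) {
      cache -= cache.head._1  // evict the oldest entry to stay within the RAM budget
    }
    cache += (key -> value)
  }

  // Reads come only from the cache. An evicted entry simply looks like it never
  // existed, even if it is sitting safely in a database somewhere.
  def get(key: K): Option[V] = cache.get(key)
}

val feed = new CacheOnlyStore[Int, String](2)
feed.put(1, "post from 9am")
feed.put(2, "post from noon")
feed.put(3, "post from 6pm")  // evicts the 9am post
feed.get(1)                   // None -- hello, news feed gap
feed.get(3)                   // Some(post from 6pm)

Obviously the real system is vastly more sophisticated than this toy, but if the read path really does end at the cache, the effect is the same: whatever got evicted is invisible to the app.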

Maybe this would just be another useless musing -- an oddity that I noticed that maybe few other people would notice, along with a harebrained explanation. However, just this week Facebook got a lot of attention for their latest "redesign" of their home application. Now we have the News Feed vs. the Live Feed. The News Feed is supposed to be the most relevant posts, i.e. incomplete by design. Now again, if your app can only access cache, and you can't store all of your data in cache, what do you do? Try to put the most "relevant" data in cache, i.e. pick the best data to keep in there. Hence the new News Feed. The fact that a lot of users have complained about this isn't that big of a deal. When you have a very popular application, any changes you make are going to upset a lot of people. However, you have to wonder if this time they are making a change not because they think it improves their product and will benefit users overall, but because it is a consequence of technology decisions. Insert cart before horse reference here...

Facebook has a great (and well deserved) reputation in the technology world. I'm probably nuts for calling them out. A much easier target for criticism is Twitter. I was lucky enough to be part of their beta for lists. Now lists are a great idea, in my opinion. Lots of people have written about this. However, the implementation has been lacking to say the least. Here is a very typical attempt to use this feature, as seen through the eyes of Firebug:

It took me five attempts to add a user to a list. Like I said, this has been very typical in my experience. I've probably added 100+ users to lists, so I've got the data points to back up my statement. What the hell is going on? Let's look at one of these errors:

Ah, a 503 Service Unavailable response... So it's a temporary problem. In fact look at the response body:

I love the HTML tab in Firebug... So this is the classic fail whale response. However, I'm only getting this on list requests. Well, at the very least I'm only consistently getting this on list requests. If the main Twitter site was giving users the fail whale at an 80% clip... In this case, I can't say exactly what is going on. I could try to make something up (experiments with a non-relational database?)
However, this is much more disturbing to me than what's going on at Facebook. I don't get how you can release a feature, even in beta, that is this buggy. Since its release, Twitter has reported a jump in errors. I will speculate and say that this is related to lists. It would not be surprising for a feature having this many errors to spill over and affect other features. If your app server is taking 6-10 seconds to send back (error) responses, then your app server is going to be able to handle a lot fewer requests overall. So not only is this feature buggy, but maybe it is making the whole site buggier.
Now, I know what we (eBay) would do if this was happening: We'd wire-off the feature, i.e. disable it until we had fixed what was going wrong. Twitter on the other hand...

Huh? You've got a very buggy feature, so you're going to roll it out to more users? This just boggles my mind. I cannot come up with a rationale for something like this. I guess we can assume that Twitter has the problem figured out -- they just haven't been able to release the fix for whatever reason. Even if that was the case, shouldn't you roll out the fix and make sure that it works and nothing else pops up before increasing usage? Like I said, I just can't figure this one out...

Thursday, October 22, 2009

Concurrency Patterns: Java, Scala, and Clojure

Today I was at The Strange Loop conference. The first talk I attended was by Dean Wampler on functional programming in Ruby. Dean brought up the Actor concurrency model and how it could be done in Ruby. Of course I am quite familiar with this model in Scala, though it is copied from Erlang (which copied it from some other source I'm sure.) The next talk I went to was on software transactional memory and how it is implemented in Clojure. I had read about Clojure's STM in Stuart Halloway's book, but I must admit that it didn't completely sink in at the time. I only understood the basic idea (it's like a database transaction, but with retries instead of rollbacks!) and the syntax. As I've become more comfortable with Clojure in general, the idea has made more and more sense to me.

Now of course, no one size fits all. There are advantages and drawbacks to any concurrency model. A picture of these pros and cons kind of formed in my head today during the STM talk, so I turned it into a few graphs. Here is the first one:

This is meant to be a measure of how easy it is to write correct concurrent code in Java, Scala, and Clojure. It is also meant to measure how easy it is to understand somebody else's code. I think these two things are highly correlated. I chose to use these various languages, though obviously this is somewhat unfair. It would be more accurate to say "locking/semaphores" instead of Java, the Actor model instead of Scala, and software transactional memory instead of Clojure -- but you get the point. So what does this graph mean?
Well obviously, I think Java is the most difficult language in which to write correct concurrent code. What may be surprising to some people is that I think Clojure is only a little simpler. To write correct code in Clojure, you have to figure out what things need to be protected by a dosync macro, and make sure those things are declared as refs. I think that would be an easy thing to screw up. It's still easier than Java, where you have to basically figure out the same things, but you must also worry about multiple lock objects, lock sequencing, etc. In Clojure you have to figure out what has to be protected, but you don't have to figure out how to protect it -- the language features take care of that.
So Clojure and Java are similar in difficulty, but what about Scala and the Actor model? I think this is much easier to understand. There are no locks or transactions. The only hard part is making sure that you don't send the same mutable object to different actors. This is somewhat similar to figuring out what to protect in Clojure, but it's simpler. You usually use immutable case classes for the messages sent between actors, but these are used all over the place in Scala. It's not some special language feature that is only used for concurrency. Ok, enough about easy to write/understand code, there are other important factors, such as efficiency:

Perhaps this should really be described as memory efficiency. In this case Java and the locking model is the most efficient. There is only one copy of anything in such a system, as that master copy is always appropriately protected by locks. Scala, on the other hand, is far less efficient. If you send around messages between actors, they need to be immutable, which means a lot of copies of data. Clojure does some clever things around copies of data, making it more efficient than Scala. Some of this is lost by the overhead of STM, but it still has a definite advantage over Scala. Like in many systems, there is a tradeoff between memory and speed:

Scala is the clear king of speed. Actors are more lightweight than threads, and a shared nothing approach means no locking, so concurrency can be maximized. The Java vs. Clojure comparison is not as clear. Under high write contention, Clojure is definitely slower than Java. The more concurrency there is, the more retries that are going on. However, there is no locking, and this makes a big difference if there are a lot more reads than writes, which is a common characteristic of concurrent systems. So I could definitely imagine scenarios where the higher concurrency of Clojure makes it faster than Java. Finally, let's look at reusability.

By reusability, I mean how reusable (composable) is a piece of concurrent code that you write with each of these languages/paradigms? In the case of Java, it is almost never reusable unless it is completely encapsulated. In other words, if your component/system has any state that it will share with another component, then it will not be reusable. You will have to manually reorder locks, extend synchronization blocks, etc. Clojure is the clear winner in this arena. The absence of locks and automatic optimistic locking really shine here. Scala is a mixed bag. On one hand, the Actor model is very reusable. Any new component can just send the Actor a message. Of course you don't know if it will respond to the message or not, and that is a problem. The bigger problem is the lack of atomicity. If one Actor needs to send messages to two other Actors, there are no easy ways to guarantee correctness.
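To make those last two points a little more concrete -- case class messages with no locks, and the lack of atomicity once more than one actor is involved -- here's a tiny, hedged sketch using the 2.8-era scala.actors library. The Account actor and its messages are hypothetical, purely for illustration:

import scala.actors.Actor
import scala.actors.Actor._

// Messages are plain immutable case classes -- the same construct you use
// everywhere else in Scala, not a special concurrency-only feature.
case class Deposit(amount: Int)
case object GetBalance

// A hypothetical account actor: all mutable state lives inside the actor,
// so there is nothing to lock and nothing to wrap in a transaction.
class Account extends Actor {
  private var balance = 0
  def act() {
    loop {
      react {
        case Deposit(amount) => balance += amount
        case GetBalance      => reply(balance)
      }
    }
  }
}

val checking = new Account
val savings = new Account
checking.start()
savings.start()
checking ! Deposit(100)  // fire-and-forget sends, no locks anywhere
savings ! Deposit(50)

// The composability problem: each !? request is handled atomically by one
// actor, but nothing makes the *pair* of reads atomic. A Deposit could land
// in between, so this "total" may reflect a state that never actually existed.
val total = (checking !? GetBalance).asInstanceOf[Int] +
            (savings !? GetBalance).asInstanceOf[Int]

Writing the single-actor case is easy; it's composing the two reads at the bottom that the model gives you no help with.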
Back in June, I heard Martin Odersky say that he wants to add STM to Scala. I really think that this will be interesting. While I don't think STM is always the right solution, I think the bigger obstacle for adoption (on the Java platform) is the Clojure language itself. It's a big leap to ask people to give up on objects, and that is exactly what Clojure requires you to do. I think a Scala STM could be very attractive...

Tuesday, October 20, 2009

Don't Dream It's Over



Sometimes you need some 80's music. While you listen to that song, I want you to think about something: the Opera browser. As an OG geek, I used to use Opera -- I even paid for it. It was so much better than IE, and that was back in the day when there was just the Mozilla Suite Monster, no Firefox. Sure, there were sites that didn't work well in it, or that actively discriminated against it (including my current employer...) That was ok. It was so much faster than anything else out there that it didn't matter.

As the world has turned over the years, history has proved Opera right. How fast your browser is does matter. How secure your browser is does matter. There is plenty of room for innovation in the browser space: tabs, download managers, speed dial, magic wand, etc. So many of Opera's ideas have been taken by the Firefoxes, Safaris, and most lately, the Chromes of the world, and touted as innovations -- that were then copied by IE. Meanwhile, what's happened to Opera? It doesn't always pay to be right.

I got news for you, Opera is still kicking butt. There has been a renewed focus on browser technology, and in particular what is collectively known as HTML 5. Apple, Google, Mozilla, and even Microsoft all like to talk about how awesomely they implement the HTML 5 specifications. Turns out they are still way behind Opera. Don't believe me? Take a look at @ppk's HTML 5 browser comparison. Or perhaps you are more visual...

Code (courtesy of a colleague):
<form> 
    <datalist id="mylist"> 
        <option label="Mr" value="Mr"> 
        <option label="Ms" value="Ms"> 
        <option label="Professor"value="Prof"> 
    </datalist> 
    <div class="entry"> 
        <label for="form-1">Name (required) </label> 
        <input id="form-1" name="name" type="text" autofocus required> 
        <!-- autofocus here --> </div> 
    <div class="entry"> 
        <label for="form-2">Title</label> 
        <input id="form-2" name="title" list="mylist" type="text"> 
    </div> 
    <div class="entry"> 
        <label for="form-4">Age</label> 
        <input id="form-4" name="age" type="number" min="18" max="25"> 
    </div> 
    <div class="entry"> 
        <label for="form-5">Email (required)</label> 
        <input id="form-5" name="email" type="email" required> 
    </div> 
    <div class="entry"> 
        <label for="form-6">Blogs</label> 
        <input id="form-6" name="url" type="url"> 
    </div> 
    <div class="entry"> 
        <label for="form-7">Date of Birth</label> 
        <input id="form-7" name="dob" type="date"> 
    </div> 
    <div class="entry"> 
        <label for="form-8">Attractiveness </label> 
        <input id="form-8" name="a" type="range" step="0.5" min="1" max="10" value="5"> 
        <output name="result" onforminput="value=a.value">5</output> 
    </div> 
    <div class="button"> 
        <button type=submit>Submit</button> 
    </div> 
</form>
Opera:


Latest Chromium Nightly:

As you can see, Opera does a great job of implementing much of the new markup in HTML 5, while Chrome .... not so much. Big deal, right? You can do all of these things with JavaScript you say. Yeah, but not only will that be a lot of heavy JS, but it will lose semantics. Don't think that matters? Well, not only do those semantics matter to folks who use screen readers, but it also matters to the most important user of the Internet: googlebot.

Anyways, the point is that once again Opera is leading the way, not the folks who talk the loudest about pushing browser technology.

Note: This blog post created with Opera.

Monday, October 19, 2009

HTML 5 Features on Mobile Browsers

Earlier today I looked for some help from the smarty folks that I follow on Twitter:


Both @sophistifunk and my colleague @rragan responded that I should look at @ppk's HTML 5 compatibility table and WebKit comparison table. Now these are great resources, and I already had an appreciation for them. However, even put together, they do not, in my opinion, accurately describe the state of mobile web browser capabilities.

Let me take a step back. First it should be said that I hate web standards. Seriously. As someone who has spent much of his adult life developing web applications, I understand why people like standards. Having to write code that is specific for each browser will drive you insane. I don't want to do that any more than anyone else. However, standards are ex post facto, so to speak. It is great to take things and create standards around them, but it is a losing battle to look to standards for innovation. If we had done that, there would be no such thing as Ajax -- or it could only be implemented using hidden IFrames...

With that in mind, I have never put much faith in HTML 5 as a standards-led revolution. This is one of the flaws in the compatibility/comparison tables. Let me give you a couple of examples. The WebKit comparison table shows that neither Android 1.0 nor 1.5 supports geolocation (I know, I know, geolocation is a separate specification, but practically it is lumped in with HTML 5). Anybody with an Android phone knows that this is patently false. The Android browser defaults to Google's mobile home page, which uses the geolocation API to echo your location to you -- if your browser supports geolocation. So did @ppk just get this wrong? No, not really. The catch is that the Android browser is not just WebKit, it is also Gears, which implements the HTML 5 geolocation completely. So anybody using an Android phone (and there's going to be a lot of such folks very soon) has a geolocation-enabled browser. But you would not know this from reading the WebKit table.

Another example is app cache. This is a specification meant to enable offline applications. Again the WebKit table says that it is not supported in any Android browser. However, this functionality is once again implemented using Gears. In this case, I don't think it follows the specification exactly. Oh well.

Oh, and one last nitpick... The HTML 5 comparison does not even mention database APIs. This is supported by the latest versions of the browsers in the iPhone and Android (and it has been for 1yr+ in both I think.) Yeah I know, @ppk can only run so many tests, and that will probably be listed in the future...

I'm really not trying to diss @ppk and the data provided on quirksmode. It is tricky to assess mobile browser capabilities. That's why I was trying to be lazy in the first place and hope that somebody had done all of the hard work for me!

Sunday, October 18, 2009

The IntelliJ IDEA Bomb

In case you missed the announcement from JetBrains yesterday, the popular IntelliJ IDEA has gone free and open source. Sort of. There is now a community edition that is FOSS, and an enterprise edition that you must pay for. Why is this a big deal? Read on.

IntelliJ really revolutionized Java development. It wasn't the first IDE to allow for code completion, or even the first Java IDE to do this. However, it was definitely a pioneer in code refactoring. And this was huge. It took many of the code improvement ideas catalogued by Martin Fowler, and turned them into simple commands that any programmer could use. This also allowed for things like code navigation, where you could go from the usage of a class or a method to the implementation (or at least the declaration, if the usage only referred to the interface.) This was only the beginning. IntelliJ was a pioneer of bringing in the Java ecosystem. I remember how much easier IntelliJ made it to use Struts, for example.

This reminds me of my own personal use of IntelliJ over the years. When I first started working in enterprise Java development, I was working for a startup that built EJB applications targeted for WebLogic and WebSphere. For WebSphere, we used Visual Age for Java. For WebLogic, we just used Kawa (a bare bones editor.) I hated Visual Age with a passion. It scarred me badly. For years afterwards, I stuck to simple text editors.

Several years later (2003), I joined a fresh startup, KeepMedia (later renamed to MyWire). Prior to KeepMedia, I had a short stint with a .NET startup, Iteration Software (later renamed to Istante, and sold to Oracle.) There I used Visual Studio quite a bit, and appreciated the productivity gains it provided. So at KeepMedia, I took a look at various Java IDEs: JBuilder, JDeveloper, and IntelliJ. I was one of three programmers at KeepMedia, and one of my colleagues was really in love with Struts. He worked on the front end of KeepMedia, but I worked mostly on our back end system that integrated magazine content from publishers. I didn't have to use Struts for that obviously, but I did occasionally help with the front end (we had three programmers after all.) I hated Struts, but IntelliJ made it more tolerable. The only negative about it was that it was flaky on Linux, but pretty much everything was flaky on Linux back then.

After I worked at KeepMedia, I worked at a consulting company, LavaStorm Engineering (2004). I had been using IntelliJ for a long time, and was convinced that everything else was crap. One of my colleagues introduced me to Eclipse. I was horrified. Most of Eclipse's codebase came from an all-Java rewrite of Visual Age (the version I had used was written in Smalltalk.) So even though it shared no code with the beast that I had hated, it shared a lot of the look-and-feel. For example, take a look at the outline view in Eclipse (any version, even the most recent.) This was completely taken from Visual Age. Even the icons are exactly the same. There was no way I was going to give up my IntelliJ for that thing...

A couple of years later, I was at another startup, Sharefare (which later changed its name to Ludi Labs.) Being a startup, there was no standardization around tools. However, everyone used Eclipse. I had warmed up to Eclipse (though I still preferred IntelliJ), and I had started writing Eclipse related articles for IBM. So I went with Eclipse there, as there were advantages to everyone using the same IDE (being able to share .project/.classpath, plugins, etc.)

After Ludi Labs went down in early 2007, I joined eBay. To say we're a major Eclipse shop would be putting it mildly...
We're not the only major Java shop that has invested heavily in Eclipse plugins. If you're doing GWT, App Engine, or Android development, then you're probably using Eclipse too. I know I am. I didn't really start using IntelliJ again until the past year. The reason was simple: Scala. IntelliJ is currently the best IDE for doing Scala development. Actually, IntelliJ's Scala offering versus the competition is similar to how it compares on anything else: it doesn't necessarily have a lot more features, but it has similar ones, and most importantly, they are of higher quality. Here's a great visualization of this key difference:

This is why the open sourcing of IntelliJ is important. It's not that we didn't already have great open source IDEs for Java. No, it's more about what this means for IntelliJ. Will the attention to detail go downhill? Will the stable plugin ecosystem be disrupted?

Of course, for most current and would-be users of IntelliJ, the availability of a free IntelliJ is really not big news. The free version is really quite limited. You're not going to build web apps, or apps that connect to databases, use web services, etc. Not with the community version, unless in-the-wild plugins become available that enable this. IntelliJ has awesome support for all of these things, but those parts of IntelliJ remain behind a price wall.

There is one notable exception here: Scala. Right now, IntelliJ has the best Scala support. It has more features and fewer bugs than NetBeans, and it is much more stable than Eclipse. Scala support is one of the features included in the community edition. That means a lot of newbie Scala developers can just use IntelliJ. This is great news. I would not give NetBeans or Eclipse to Scala newbies, for the simple reason that both of them report syntax errors inconsistently. In other words, both are guilty of either not identifying an error, or identifying a false error (or both.) That's anathema for somebody learning a language. I'm able to put up with it, simply because I know just enough Scala to say "you're wrong, IDE" at times. I've rarely seen this happen with IntelliJ.

Tuesday, October 13, 2009

Scala Manifests FTW

I came across a piece of code written by a colleague. It was a flexible XML/JSON parser. It would turn an XML or JSON structure into a map. The keys were strings. The values were either strings, lists, or maps. The lists could be lists of strings, lists, or maps. The maps had strings as keys and values as (wait for it) strings, lists, or maps. We had run across a bug recently. Usually a particular web service returned data that looked something like:
{ "details" : { "a" : "x", "b" : "y" } }
So we had code that looked like :
val response = // code that called the parser
val foo = response("details").asInstanceOf[Map[String,String]]("a")
However, one day we got some bad data:
{ "details" : "" }
So of course the earlier code blew up. I wanted to have something like this:
trait SafeMapTrait {
    def getString(key:String):String
    def getList(key:String):List[AnyRef]
    def getMap(key:String):Map[String, AnyRef]
}
Now this can be accomplished pretty easily:
class EasySafeMap(val map:Map[String, AnyRef]){
  def getString(key:String):String = {
    if (map.contains(key)){
      if (map(key).isInstanceOf[String]) map(key).asInstanceOf[String] else null
    } else null
  }
  // etc.    
}
There would be similar methods for lists and maps. I didn't like this, and thought I should be able to do better. Looking at the final solution, I'm not sure that I did. But I did learn some things about Scala Manifests... Before we get there, let's look at my first naive attempt to do better:
class NotSoSafeMap(val map:Map[String,AnyRef]){

  def getString(key:String):String = getType(key)
  def getList(key:String):List[AnyRef] = getType(key)
  def getMap(key:String):Map[String,AnyRef] = getType(key)

  private def getType[T](key:String):T  = {
    val value = map.getOrElse(key, null)
    if (value != null && value.isInstanceOf[T]) value.asInstanceOf[T] else null
  }

}
That would have been nice, huh? I really wanted to use a parameterized method for the extraction, comparison, and casting. The problem with this is that there is no way to know the type T. You could explicitly add the parameter, i.e. getType[String](key), but it doesn't help because of erasure. I tried this instead:
class NotSoSafeMap(val map:Map[String,AnyRef]){

  def getString(key:String):String = getType(key,null)
  def getList(key:String):List[AnyRef] = getType(key,null)
  def getMap(key:String):Map[String,AnyRef] = getType(key,null)

  private def getType[T](key:String, default:T):T  = {
    val value = map.getOrElse(key, default)
    if (value.isInstanceOf[T]) value.asInstanceOf[T] else default
  }

}
I thought that this might be better because of the type information being given in the default value. This didn't work. Using the null default seemed dumb, but even adding defaults like the empty string, an empty list, etc. did not help. Erasure was once again kicking my ass. So it was time to learn about Manifests.
I had heard Jorge Ortiz talk about manifests previously. He has also written an excellent blog post about them. He told me that these were still "experimental" (i.e. undocumented) in Scala 2.7.x, but were officially part of the upcoming 2.8 release. Sounded good to me. Here is the solution I came up with:
class SafeMap(val map:Map[String,AnyRef]){
  import scala.reflect.Manifest

  def getString(key:String):String = getType[String](key) match {
    case Some(s:String) => s
    case _ => null
  }

  def getMap(key:String):Map[String, AnyRef] = getType[Map[String,AnyRef]](key) match {
    case Some(m:Map[String, AnyRef]) => m
    case _ => null
  }

  def getList(key:String):List[AnyRef] = getType[List[AnyRef]](key) match {
    case Some(list:List[AnyRef]) => list
    case _ => null
  }

  private def getType[T](key:String)(implicit m:Manifest[T]):Option[T] = {
    map.getOrElse(key, null) match {
      case a:AnyRef => if (m >:>  Manifest.classType(a.getClass)) Some(a.asInstanceOf[T]) else None
      case null => None
    }
  }
}
Ok, a few things to note here. First, the local import of scala.reflect.Manifest. Again, it's not a documented class, but it's in there. Now my getType method. Notice that it uses the curried, multiple parameter list syntax: methodName(param:Type)(param:Type). Also notice the implicit Manifest parameter. The callers don't add this, the compiler adds it for you. Next notice that it returns an Option. I wanted it to just return T. However, I could not have a case where it returned null if T was the declared return type of the method. So I went with Option.

Finally, notice the Manifest magic. That's the m >:> Manifest.classType(a.getClass). The right hand side of the call uses a factory method in the Manifest singleton object to create a Manifest for the (class of the) value coming back from the map. The >:> operator checks to see if the right hand side represents a subclass of the left hand side. This is important. For the getMap method, the manifest will represent the Map trait (actually a Java interface in this case.) The call to a.getClass gives you the runtime class of a. Of course this runtime class implements the Map trait, but you can't do an equality comparison. Hence the >:> operator.

One last thing: notice that the getString method uses the explicit getType[String]. You would think that the compiler could infer this, since the left hand side is explicitly declared as a String. It doesn't. When I tried it without the explicit type parameter, my manifest would always be Manifest[Nothing].
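
And to close the loop on the original bug, here's roughly how SafeMap behaves -- a hypothetical usage sketch, with literal maps standing in for whatever the parser would actually hand back as a Map[String, AnyRef]:

// Hypothetical usage -- these maps stand in for what the parser returns.
val good = new SafeMap(Map("details" -> Map("a" -> "x", "b" -> "y")))
val bad = new SafeMap(Map("details" -> ""))

good.getMap("details")       // Map(a -> x, b -> y)
good.getMap("details")("a")  // "x" -- the happy path from the original code
bad.getMap("details")        // null instead of a ClassCastException
bad.getString("details")     // "" -- the value really is a String this time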

Saturday, October 10, 2009

The End of The Aughties

There are 72 days left in the Aughties, y'know, the current decade: 2000 - 2009. I was looking back at the decade and thinking about its most important events. Here's my little list:

9/11 -- This is obvious. September 11, 2001 is clearly one of the most pivotal days in the history of the United States. In the previous century, there are probably only a handful of comparable events: the bombing of Pearl Harbor, V-E day, the moon landing, the JFK assassination. For several generations of Americans, 9/11 will be the most historical day of their lives.

The Election of Barack Obama -- President Obama's election was historical in so many ways. Obviously it was historic that an African American was elected President. It also marked a transition to a new generation -- Obama is 15 years younger than Bush or Clinton (and let's not even mention McCain.) Obama is not only a Democrat, but one who does not come from the more conservative, Southern Democratic tradition of Clinton and Jimmy Carter.

Social Media -- Here's where maybe my perspective is skewed by living in Silicon Valley. Social media is not a single event, in fact it is a progression of events. To me, it really started with blogging and YouTube, and then exploded with MySpace, Facebook, and Twitter. It is a fundamental change in the Internet. Every user is a creator of content, as well as a consumer. It is the great democratizing effect of the Internet, and it is only getting started. Even now we are starting to see how businesses, celebrities, etc. realize that not only can they use social media as a channel to customers and fans, but that it is a two-way channel.

Hurricane Katrina -- What made Hurricane Katrina so pivotal is that it opened the eyes of Americans. It made people realize that many of their fellow Americans live in awful conditions. The divide between socioeconomic classes in America was never so obvious as during Katrina. When Kanye West went on TV and said that George Bush didn't care about black people, he wasn't just being a jackass, he was stating a sentiment shared by a lot of people.

The iPhone -- What did I say earlier about having a Silicon Valley perspective? Anyways... The iPhone has completely changed so many things for so many people. In the 90's, the Internet changed people's lives by bringing them information. Now the iPhone lets them carry it around in their pocket. Other phones were certainly moving in that direction, but the iPhone broke through by combining a large display with a highly usable, touch-based interface. This revolution continued with the release of the App Store. Now don't get me wrong. A lot of other phones are following suit -- but that's exactly why the iPhone was so historical.

That's my short list. I know it's obviously biased by my being American and living in Silicon Valley. What did I miss? What doesn't belong?

Thursday, October 08, 2009

San Francisco Giants 2009

The Giants had a pretty good season. They were in contention for a playoff spot until the final couple of weeks. But they didn't make it. Folks around here are busy talking about what the Giants need to do to take the next step. The answer is at once obvious: they need better hitting. However, the problem is that the Giants don't actually know how to evaluate hitting talent. So they really don't have any chance at getting good hitting without paying a huge price for it. Let's take a look at some numbers to understand this.

One easy way to evaluate the Giants is by looking at the players who have come up through their farm system. Of their homegrown players that had at least 200 plate appearances, on average they saw 3.62 pitches per plate appearance and walked in 7.2% of their plate appearances. To be fair, Fred Lewis has very good plate discipline (and his plate appearances went way down this year.) If you take him out, then the numbers are 3.52 pitches per plate appearance and a 6.8% walk rate. For comparison's sake, if you look at the top seven teams (in terms of runs scored), they average a walk in 9.8% of their plate appearances.

So the Giants farm system sucks. Maybe they can sign good hitters? Nope. If you look at their top hitters that they signed from other teams, they see 3.58 pitches per plate appearance and a 5.9% walk rate. That's right, they see a few more pitches, but they walk even less. The Giants even made a "big" trade at midseason, for Freddy Sanchez. Everyone is concerned that Freddy might have suddenly become injury prone. What they should really be worrying about is that Freddy sucks. He sees 3.81 pitches per plate appearance, which is not too bad. However, he only walks 4.5% of the time. Further, his 0.417 career slugging percentage is just awful, even for a second baseman (which is generally a strong offensive position in modern baseball.) So he doesn't swing at everything (he also has a low strikeout rate), and yet he still does not try to get a good pitch to hit and winds up playing for a single. This is your blockbuster trade material? This is what you get in exchange for the second best pitcher in your farm system?

Yeah, so clearly the Giants front office has no idea what makes a good hitter. People like to say that the Giants home park is a great pitchers park, as it is a tough place to hit home runs. That does not make it a tough place to take a bad pitch or take a walk. In fact, you would think that in such a park, they would put an even higher premium on hitters who can get on base any way they can. It's funny, you often hear that the reason the A's are no longer a good offensive team is that other teams figured out what they were doing. If you look at the Yankees and Red Sox, or for that matter the Rays and the Rockies, you can see evidence of this. However, clearly there is one team that has not figured things out, and that is the Giants.