Programming and politics: 02/01/2009

Friday, February 27, 2009

A Great Day ... and A Day of Infamy

Today is a great day for America, and a day of infamy. Do you want the good news or the bad news first?

The good news is that today is the beginning of the end of the Iraq War. This war has been going on for so long, and has mostly faded into the back of people's minds. This is especially true these days as people are more concerned with the economy than anything else. But let's not forget how this war started.

The Iraq War became a certainty on 9/11/2001. On that day, just hours after terrorists from Saudi Arabia killed 2,974 Americans, the US government began preparing to attack Iraq. There was only one problem. Iraq was not involved in 9/11. The US is an ally of Saudi Arabia, where the terrorists had come from, so nothing could be done there. All that could be done was to go after the leader of al-Qaeda, Osama bin Laden, and he was in Afghanistan. Luckily plans for attacking Afghanistan had been drawn up before 9/11. Those plans did not involve any US troops, just air support and special forces. It only took a few months to topple the Afghan government, and install drug traffickers to run Afghanistan. Given such a plan, it was not surprising that bin Laden was not found in Afghanistan.

With that little detail out of the way, it was time to move on to Iraq. But how to justify the war? We all got to learn a new term: Weapons of Mass Destruction or, in militaryspeak: WMDs. The US government produced all kinds of propaganda about Iraq and WMDs. Everything from yellowcake uranium to aluminum tubes was used to convince Americans that Iraq had The Bomb and was going to give it to terrorists who would use it on America.

And Americans believed it. Why? Are we all that stupid? Was the propaganda that good? Well maybe, but the real reason is that we wanted to believe it. We had malice in our hearts and wanted vengeance. The "victory" in Afghanistan had not satisfied this bloodlust. Maybe if we had caught bin Laden and let the NYPD beat him to death with nightsticks on primetime television (tape delayed for the west coast of course) then we would have been satiated and a little more likely to call BS when the "facts" about Iraq were presented. Who knows.

So the war began... It was definitely a more satisfying war for television viewers. We got Shock and Awe. We got to see huge numbers of troops marching through the desert. We got to see the statue of Saddam Hussein toppled and the American flag raised. We got to see our leader declare victory on an aircraft carrier. Good stuff. Great television.

Meanwhile there were massive casualties, but they were not the kind we cared about. They were not American casualties. They were not even Iraqi military casualties. No, the Iraqi military was virtually non-existent after years of economic sanctions against Iraq. The casualties were Iraqi civilians. Most agree that there were tens of thousands, with some estimating hundreds of thousands of Iraqi citizens killed as a result of the war. It does not matter what the exact number is. It could have been millions of Iraqis killed, and it was still not newsworthy in the US.

However, not too long after victory was declared, the US started suffering casualties. There was a civil war going on, created by the vacuum of power left behind when the US overthrew the Iraqi government. US troops were prime targets, as Iraqi militants knew that the best way to get the US to leave their country was to kill US troops. Soldiers being killed by roadside bombs is newsworthy in the US as it turns out.

The situation only got worse in Iraq, until finally in February of 2007, the US increased troop levels to support policing of Iraqi streets. Many folks, including John McCain, had been saying this was needed and should have been done before victory had been declared. After more than six months of this, the violence had only increased. In August of 2007, religious bloodshed caused the leader of one of the chief combatants, Muqtada al-Sadr, to call for a ceasefire. In September of 2007, the US government claimed that violence was down by 50% and took full credit for this.

In 2008, as part of his campaign for President, Barack Obama promised to withdraw from Iraq. This was viewed as a weakness in his campaign. During debates analysts would claim that Obama was a little weak on foreign policy, but strong on the economy. The economy won out, and Obama is President. Today he announced a plan to withdraw the bulk of US forces by 2010. This is pragmatically about as fast as is possible. It's not easy to move 100,000+ troops and all of their supporting infrastructure and equipment.

This post ran long, so you'll have to wait for the bad news...

Thursday, February 26, 2009

Lift 1.0

The Lift web framework hit its 1.0 release today. Major congratulations go to the Lift team. I have been lucky enough to get to know Lift founder David Pollak and Lift committer Jorge Ortiz. They are both very bright engineers and the high quality of Lift reflects this. I was very happy to give a quote about Lift for the 1.0 launch. In my quote I talked about how innovative Lift is. I would like to expand on this.

First, let me say that I realize that Lift has borrowed some ideas from other frameworks. Heck, David gives credit where it's due and on the front page of Lift lists Seaside, Ruby on Rails, Django, and Wicket as sources of inspiration. Given that, here are the things that I find innovative about Lift.

View-first design. Everybody loves MVC, but it is no panacea. I have seen a lot of apps that used Struts or Spring MVC that wound up putting a lot of code in the actions or controllers. If it wasn't there, then it often leaked out into the view code. Even if the view code did not contain enormous scriptlets, there would be tons of control flow tags (if, switch, for-loops, etc.). The view-first approach is to go to the view first, and pull in logic bits (snippets) as needed. If a snippet needs some data, it gets it. If it creates a form, then it handles the submission... sometimes with Ajax. Snippets use Scala's XML support for creating view code (XHTML). Some people think this mixes view code into "back-end" code, but that is a bit of a robotic response. What is really great is that it is all statically typed. You can't reference a variable that doesn't exist or a non-existent method/property (like you could in JSP, etc.). You would get a compile error. Plus your XHTML has to be well-formed -- or it won't compile. As tooling improves around Scala, you will be able to do static analysis and refactoring on snippet code.

Record/Mapper (ORM). Lift's ORM may not be quite as sophisticated as JPA, but its syntax is beautiful. Now some of this is thanks to Scala, but Lift really makes good use of Scala's type system. By using parameterized traits, you get classes that look very declarative, i.e. like they are just defining metadata about the mapping between class and database table, but there is actually a lot of concrete code being reused. It is as elegant as the metaprogramming that you see in Rails or Grails (and thus more elegant than JPA) but its more concrete nature not only makes it less mysterious, but also makes it easier to debug. I would guess that it also helps with performance. Of course it would be hard to measure this, since Scala outperforms Ruby and Groovy.

Comet. One of the things that Rails has had going for it is that it seemed designed for Ajax from the get-go. This is definitely true of Lift, too. I would say that Lift is even more designed for Ajax, but that is a little subjective. What is more objective is that Lift is designed for Comet. Once again it leverages Scala brilliantly, by using Scala's Actors.

That's just a few things, but hopefully you see the trend here. Not only does Lift improve on existing ideas, but in some cases it really breaks new ground.

Wednesday, February 25, 2009

Spring is Here

Well not quite, but spring training is here. My local teams, the Giants and A's, both play their first spring training game today. Tim Lincecum is even pitching today. Very exciting!

Of course it's been an ugly time lately for Major League Baseball. Everyone is upset about Alex Rodriguez and his use of steroids in the past. I have mixed feelings on this. First, I don't care that A-Rod used steroids. I don't care who used them. I don't find any difference between a guy using steroids or using "legal" supplements or sophisticated weight training or getting a cadaver's ligament sewn to their knee when they tear their ACL. These are all examples of technology allowing athletes to be better athletes than has ever been possible before. They are bigger, stronger, faster, and they are able to play at a high level for a longer amount of time. Does this mean that as technology progresses, the sports may change so much that it turns off fans? Yeah, maybe. Perhaps it has already started.

All of that being said, I hate dishonesty even when it is required. I don't blame A-Rod for lying about using steroids until evidence surfaced that contradicted him. I just can't stand to hear the PR crap that he recites to Peter Gammons or at press conferences. How bad for A-Rod would it be if instead of saying "I was young and stupid" he said "I wanted to hit as many home runs as possible because the more I hit, the more money I made, so I took steroids." I know he's got to say that he's not using now and that he has to minimize the amount of time that he did use them, so I can live with these lies. But wouldn't we all be better off if people just said "I took it so that I could make more money, be more famous, see myself on SportsCenter more often."

Thursday, February 19, 2009

What’s the Scala way for this type of read loop?

Dave Briccetti asked:

What’s the Scala way for this type of read loop?

My answer:


import java.io._
import scala.io._
import scala.collection.mutable._
object PropsParser{
  val state = HashMap.empty[String,String]

  // Run both ways, should print the same thing
  def main(args:Array[String]){
    load
    println(state)
    state.clear
    load2
    println(state)
  }

  // A more Scala-ish way? Could be memory hog
  def load2{
    val src = Source.fromFile(getFile)
    try {
      src.getLines.map(_.split("=")).foreach((kv) => state += kv(0) -> kv(1).trim)
    } finally {
      src.close() // release the file handle, as load does with in.close
    }
  }

  // Dave's original way
  def load{
    val in = new BufferedReader(new FileReader(getFile))
    var line = in.readLine
    while (line != null){
      val kv = line.split("=")
      state += kv(0) -> kv(1)
      line = in.readLine
    }
    in.close
  }

  def getFile= new File("Some/path/to/your/props/file")

}

Tuesday, February 17, 2009

IntelliJ Fail

Upgraded IntelliJ to 8.1 this morning. Launched it and got this:

That lovely message box is 2891 pixels across. So wide, that you cannot even see the control buttons in the bottom right. It also has a bunch of random HTML at the top of the box, as if it was meant to be displayed in a web page instead of this lovely dialog box.

Imagine if Microsoft or Adobe had software like this! Imagine the amount of ridicule. When you pay for software, you expect something a little more ... polished? Professional? Debugged? I don't know, but this is not it. Anyways, I've heard good things about 8.1 and the updated Scala plugin for it, so I hope this is just an aberration.

Tuesday, February 10, 2009

Named Parameters and Builders in Scala

Tonight was the monthly BASE meeting. Jorge did a great talk on Scala actors. Before the talk, Dick was talking about looking for a good builder implementation in Scala. This seemed to be an area where Scala did not offer much over Java. Even using some of Scala's more sophisticated syntactic sugar, the resulting builder is not satisfactory. I asked Dick whether it would be good enough if Scala had named parameters.

So I did some playing around with simulating named parameters in Scala. Let's say we have a class like this:


class Beast (val x:Double, val y:Double, val z:Double){
// other stuff in here
}

Now suppose that x and y are required, but z can have a default value of 0. My attempt at simulating named parameters involved creating some classes corresponding to the variables.


class X(val x:Double)
class Y(val y:Double)
class Z(val z:Double)
object X{
def ->(d:Double)= new X(d)
}
object Y{
def ->(d:Double)= new Y(d)
}
object Z{
def ->(d:Double) = new Z(d)
}

Do you see where this is going? Next we need a companion object for Beast:


object Beast{
def apply(xyz:Tuple3[X,Y,Z]) = new Beast(xyz._1.x, xyz._2.y, xyz._3.z)
}

Now we can do something like this:


val c = Beast(X->3, Y->4, Z->5)

So X->3 calls the -> method on the X object. This returns a new instance of the X class with value 3. The same thing happens for Y->4 and Z->5. Putting all three inside the parentheses gives us a Tuple3. This is passed in to the apply method on the Beast object which in turn creates a new instance of Beast with the given values. So far so good?

Now we just need a way to make z optional and give it a default value if it is not supplied. To do this, we need some Evil.


object Evil{
 implicit def missingZ(xy:Tuple2[X,Y]):Tuple3[X,Y,Z]=(xy._1,xy._2, new Z(0))
}

Now it is possible to get the optional value behavior:


object BeastMaster{
  import Evil._
  def main(args:Array[String]){
      val b = Beast(X->1, Y->2)
      println(b)
      val c = Beast(X->3, Y->4, Z->5)
      println(c)
  }
}

The implicit def missingZ is used to "invisibly" convert a Tuple2[X,Y] into a Tuple3[X,Y,Z].

Unfortunately this is where the coolness ends. You can't switch around the order of the variables, i.e. Beast(Y->2, X->1) or even Beast(Z->5, X->3, Y->4). You can't just add more implicit defs either. Like if you try:


object Evil{
   implicit def missingZ(xy:Tuple2[X,Y]):Tuple3[X,Y,Z]=(xy._1,xy._2, new Z(0))
   implicit def missingZ2(yx:Tuple2[Y,X]):Tuple3[X,Y,Z] = (yx._2, yx._1, new Z(0))
}

This will cause Beast(X->1,Y->2) to fail to compile. You will get the following error:

error: wrong number of arguments for method apply: ((builder.X, builder.Y, builder.Z))builder.Beast in object Beast
val b = Beast(X->3, Y->5)

This is not the most obvious error. The problem (I think) is that the compiler can't determine which implicit def to use. The culprit is type erasure. There is no way to tell the difference between a Tuple2[X,Y] and Tuple2[Y,X] at runtime. At compile time there is, so you would think that it would be possible to figure out which implicit to use... Or perhaps it is possible to merge the two implicits together by using an implicit manifest?
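For comparison, here is roughly what the plain builder that started the conversation looks like on the Java side. This is only a sketch: Beast mirrors the Scala class above, while BeastBuilder and its method names are hypothetical. It handles any argument order and the default for z, but required fields are only checked at runtime.

```java
// A conventional Java builder for comparison. Beast mirrors the Scala
// class in the post; BeastBuilder and its method names are hypothetical.
public class BeastBuilder {
    // Immutable target type with the same three fields as the Scala Beast.
    public static final class Beast {
        public final double x, y, z;
        Beast(double x, double y, double z) { this.x = x; this.y = y; this.z = z; }
        @Override public String toString() { return "Beast(" + x + ", " + y + ", " + z + ")"; }
    }

    private Double x;      // required, so null means "not supplied"
    private Double y;      // required
    private double z = 0;  // optional, defaults to 0

    public BeastBuilder x(double v) { this.x = v; return this; }
    public BeastBuilder y(double v) { this.y = v; return this; }
    public BeastBuilder z(double v) { this.z = v; return this; }

    public Beast build() {
        // Unlike the tuple trick, a missing required field is a runtime error
        if (x == null || y == null) {
            throw new IllegalStateException("x and y are required");
        }
        return new Beast(x, y, z);
    }

    public static void main(String[] args) {
        // Order does not matter, and z can be left out
        System.out.println(new BeastBuilder().y(2).x(1).build());
        System.out.println(new BeastBuilder().z(5).x(3).y(4).build());
    }
}
```

The trade-off is the usual one: the builder is flexible about order, but the compiler no longer tells you when x or y is missing.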

Monday, February 09, 2009

Give me the Cache

Web scale development is cache development. Oversimplification? Yes, but it is still true. Figuring out what to cache, how to cache it, and (most importantly) how to manage that cache is one of the more difficult things about web application development on a large scale. Most other things have been solved before, but chances are that your caching needs are quite unique to your application. You are probably not going to be able to use Google to answer this kind of design question. Unless of course your application is extremely similar to the one I am about to describe.

I had an application that performs a lot of reads, in the neighborhood of 1000 per second. The application's data needed to be modified very infrequently, less than 100 times per day. Ah, perfect for caching! Indeed. In fact the cache could even be very simple: a list of value objects. Not only that, but it's a very small list. It gets even better: the application has a pretty high tolerance for cache staleness. So it's ok if it takes a few minutes for an update to show up. A few different caching strategies emerged out of this too-good-to-be-true scenario. So it was time to write some code and do some testing on these strategies. First, here is an interface for the cache.


public interface EntityCache {
List<Entity> getData();
void setData(Collection<? extends Entity> data);
}

Again, the Entity class is just a value object (Java bean) with a bunch of fields. So first, let's look at the most naive approach to this cache, a class I called BuggyCache.


public class BuggyCache implements EntityCache{
private List<Entity> data;

public List<Entity> getData() {
  return data;
}

public void setData(Collection<? extends Entity> data) {
  this.data = new ArrayList<Entity>(data);
}
}

So why did I call this buggy? It's not thread safe. Just imagine what happens if Thread A calls getData(), and then is in the middle of iterating over the data, and Thread B calls setData. Nothing synchronizes the two threads, so there is no guarantee about when, or in what state, Thread A sees the new list. In this application, I could guarantee that the readers and writers would be from different threads, so it was just a matter of time before the above scenario would happen. Hello race condition. Ok, so here is a much nicer thread safe approach.


import java.util.*;
import java.util.concurrent.locks.*;

public class SafeCache implements EntityCache{
private List<Entity> data = new ArrayList<Entity>();
private final ReadWriteLock lock = new ReentrantReadWriteLock();

public List<Entity> getData() {
   lock.readLock().lock();
   try {
       // Copy under the read lock so callers never iterate the shared list
       return new ArrayList<Entity>(data);
   } finally {
       // Always release, even if the copy throws
       lock.readLock().unlock();
   }
}

public void setData(Collection<? extends Entity> data) {
   lock.writeLock().lock();
   try {
       this.data = new ArrayList<Entity>(data);
   } finally {
       lock.writeLock().unlock();
   }
}
}

This makes use of Java 5's ReadWriteLock. In fact, it's a perfect use case for it. Multiple readers can get the read lock, with no contention. However, once a writer gets the write lock, everybody is blocked until the writer is done. Next I wrote a little class to compare the performance of the Safe and Buggy caches.


public class Racer {
static volatile boolean GO = true; // volatile so the worker threads see the stop signal
static int NUM_THREADS = 1000;
static int DATA_SET_SIZE = 20;
static long THREE_MINUTES = 3*60*1000L;
static long THIRTY_SECONDS = 30*1000L;
static long TEN_SECONDS = 10*1000L;
static long HALF_SECOND = 500L;

public static void main(String[] args) throws InterruptedException {
    final AtomicInteger updateCount = new AtomicInteger(0);
    final AtomicInteger readCount = new AtomicInteger(0);
    final EntityCache cache = new SafeCache();
    long startTime = System.currentTimeMillis();
    long stopTime = startTime + THREE_MINUTES;
    Thread updater = new Thread(new Runnable(){
        public void run() {
            while (GO){
                int batchNum = updateCount.getAndIncrement();
                List<Entity> data = new ArrayList<Entity>(DATA_SET_SIZE);
                for (int i=0;i<DATA_SET_SIZE;i++){
                    Entity e = Entity.random();
                    e.setId(batchNum);
                    data.add(e);
                }
                cache.setData(data);
                try {
                    Thread.sleep(THIRTY_SECONDS);
                } catch (InterruptedException e) {
                    e.printStackTrace();
                }
            }
        }
    });
    updater.start();
    Thread.sleep(TEN_SECONDS);


    List<Thread> readers = new ArrayList<Thread>(NUM_THREADS);
    for (int i=0;i< NUM_THREADS;i++){
        Thread reader = new Thread(new Runnable(){
            public void run() {
                while (GO){
                    int readNum = readCount.getAndIncrement();
                    List<Entity> data = cache.getData();
                    assert(data.size() == DATA_SET_SIZE);
                    for (Entity e : data){
                        System.out.println(Thread.currentThread().getName() +
                                "Read #" + readNum + " data=" + e.toString());
                    }
                    try {
                        Thread.sleep(HALF_SECOND);
                    } catch (InterruptedException e) {
                        e.printStackTrace();
                    }
                }
            }
        });
        readers.add(reader);
        reader.start();
    }

    while (System.currentTimeMillis() < stopTime){
        Thread.sleep(TEN_SECONDS);
    }
    GO = false;
    for (Thread t : readers){
        t.join();
    }
    long duration = System.currentTimeMillis() - startTime;
    updater.join();
    int updates = updateCount.get();
    int reads = readCount.get();
    System.out.println("Duration=" + duration);
    System.out.println("Number of updates=" + updates);
    System.out.println("Number of reads=" + reads);
    System.exit(0);
}
}

This class creates a writer thread. This thread updates the cache every 30 seconds. It then creates 1000 reader threads. These read the cache, iterate and print its data and pause for half a second. This goes on for some period of time (3 minutes above), and the number of reads and writes is recorded.

Testing the BuggyCache vs. the SafeCache, I saw a 3-7% drop in throughput from using the SafeCache. Actually this was somewhat proportional to the size of the data (the DATA_SET_SIZE variable.) If you made it bigger, you saw a bigger hit as the reading/writing took longer and there was more contention.

So this seemed pretty acceptable to me. In this situation, even a 7% performance hit for the sake of correctness was worth it. However, another approach to this problem came to mind. I like to call it a watermark pattern, but I called the cache LeakyCache. Take a look to see why.


public class LeakyCache implements EntityCache{
  private final List<Entity> data = new CopyOnWriteArrayList<Entity>();
  private final AtomicInteger index = new AtomicInteger(0);

  public List<Entity> getData() {
      List<Entity> current =  data.subList(index.get(), data.size());
      return current;
  }

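  public void setData(Collection<? extends Entity> newData) {
      int oldSize = this.data.size();
      this.data.addAll(newData);
      // The new batch starts where the old list ended, so that is the watermark.
      // (Setting it to oldSize + newData.size() would leave readers an empty list.)
      index.set(oldSize);
  }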
}

The idea here is to keep one ever-growing list for the cache and an index (or watermark) to know where the current cache starts. Every time you "replace" the cache, you simply add to the end of the list and adjust the watermark. When you read from the cache, you simply take the sublist from the watermark on. I used an AtomicInteger for the index. I probably did not need to, and a volatile int would have been good enough. I used a CopyOnWriteArrayList for the cache's list. You definitely need this. Without it you will wind up with ConcurrentModificationExceptions when you start mutating the cache with one thread, while another thread is iterating over it.
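Stripped of the concurrency, the watermark bookkeeping looks something like the sketch below. It is single-threaded on purpose, uses a plain ArrayList and String entries instead of Entity, and WatermarkDemo is a hypothetical name, so treat it as an illustration of the idea and not as a drop-in cache.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Single-threaded illustration of the watermark bookkeeping only.
// WatermarkDemo and the String entries are hypothetical stand-ins for
// LeakyCache and Entity; no thread safety is attempted here.
public class WatermarkDemo {
    private final List<String> data = new ArrayList<String>();
    private int watermark = 0; // index where the current batch starts

    // "Replace" the cache: append the new batch, then move the watermark
    // to the first element of that batch.
    public void setData(List<String> newData) {
        int oldSize = data.size();
        data.addAll(newData);
        watermark = oldSize;
    }

    // Read everything from the watermark to the end of the list.
    public List<String> getData() {
        return new ArrayList<String>(data.subList(watermark, data.size()));
    }

    // Total number of entries retained, including stale batches.
    public int retained() {
        return data.size();
    }

    public static void main(String[] args) {
        WatermarkDemo cache = new WatermarkDemo();
        cache.setData(Arrays.asList("a1", "a2"));
        cache.setData(Arrays.asList("b1", "b2"));
        System.out.println(cache.getData());  // prints [b1, b2]
        System.out.println(cache.retained()); // prints 4
    }
}
```

Readers only ever see the latest batch, but the backing list still holds every batch that came before it.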

So you probably see why this is called LeakyCache. That internal list will grow forever. Well at least until it eats all of your heap. So that's bad. It also seems a bit more complicated than the other caches. However, it is thread safe and its performance is fantastic. How good? Even better than the BuggyCache, actually 3x as good as the BuggyCache. That deserves some qualification. Its throughput was consistently more than 3x the throughput of the other caches, but I didn't run any long tests on it. It would eventually suffer from more frequent garbage collection as it leaks memory. However, if your updating is not too frequent, the entities are relatively small, and you've got lots of memory, then maybe you don't care and can just recycle your app once a month or so?

Maybe you aren't satisfied with that last statement. You're going to force me to fix that memory leak, I knew it. Here is a modified version that does just that.


public class LeakyCache implements EntityCache{
   private final List<Entity> data = new CopyOnWriteArrayList<Entity>();
   private final AtomicInteger index = new AtomicInteger(0);
   private final ReadWriteLock lock = new ReentrantReadWriteLock();

   public LeakyCache(){
       Thread cleaner = new Thread(new Runnable(){
           public void run() {
               while(true){
                   try {
                       Thread.sleep(60*60*1000L);
                   } catch (InterruptedException e) {
                       e.printStackTrace();
                   }
                   lock.writeLock().lock();
                   try {
                       if (data.size() > 500000){
                           List<Entity> current = new ArrayList<Entity>(getData());
                           index.set(0);
                           data.clear();
                           data.addAll(current);
                       }
                   } finally {
                       // Release even if the cleanup throws, or every caller hangs
                       lock.writeLock().unlock();
                   }
               }
           }
       });
       cleaner.start();
   }
   public List<Entity> getData() {
       lock.readLock().lock();
       try {
           return data.subList(index.get(), data.size());
       } finally {
           lock.readLock().unlock();
       }
   }

   public void setData(Collection<? extends Entity> newData) {
       // Read lock on purpose: it only has to exclude the cleaner
       lock.readLock().lock();
       try {
           int oldSize = this.data.size();
           this.data.addAll(newData);
           // The new batch starts where the old list ended
           index.set(oldSize);
       } finally {
           lock.readLock().unlock();
       }
   }
}

So what does this do? In the constructor we spawn Yet Another Thread. This one periodically (once an hour in the example) checks to see how big the cache is. If it is over some limit, it gets the current data, clears the cache, adds the current data back to it, and resets the watermark to 0. It also Stops The World to do this by once again using a ReentrantReadWriteLock. Notice how I have abused the lock by using the read lock for both getting and setting the cache. Why use it for setting? The cleaner thread gets exclusive access through the write lock. It uses it when it is cleaning up. By having the setData method use the read lock, it will be blocked if the cleaner thread is in the middle of a cleanup.

Adding this extra bit of complexity fixes the memory leak, while maintaining thread safety. What about performance? Well the performance is highly configurable depending on how often the cleaner thread runs (well how long it sleeps really) and how big you are willing to let the cache grow before cleaning it up. I put it on some very aggressive settings, and it caused about a 15% hit to the leaky version. The performance is still much better than any of the other versions of the cache.
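The read-lock trick relies on one property of ReentrantReadWriteLock: read locks are shared with each other, but nobody else gets one while another thread holds the write lock. Here is a small sketch that checks this with tryLock. LockDemo and readLockAvailable are hypothetical names used only for this illustration.

```java
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Shows why setData can hide behind the read lock: read locks are shared,
// but they are refused while another thread holds the write lock.
// LockDemo is a hypothetical name used only for this sketch.
public class LockDemo {
    // Try to take the read lock from a fresh thread without blocking,
    // and report whether it was granted.
    static boolean readLockAvailable(final ReadWriteLock lock) {
        final boolean[] granted = new boolean[1];
        Thread t = new Thread(new Runnable() {
            public void run() {
                granted[0] = lock.readLock().tryLock();
                if (granted[0]) {
                    lock.readLock().unlock();
                }
            }
        });
        t.start();
        try {
            t.join(); // join also makes the write to granted[0] visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return granted[0];
    }

    public static void main(String[] args) {
        ReadWriteLock lock = new ReentrantReadWriteLock();

        // A getData or setData call in progress does not shut out another one
        lock.readLock().lock();
        boolean whileReading = readLockAvailable(lock);
        lock.readLock().unlock();

        // The cleaner's write lock shuts everyone else out
        lock.writeLock().lock();
        boolean whileCleaning = readLockAvailable(lock);
        lock.writeLock().unlock();

        System.out.println("available while a read lock is held: " + whileReading);   // true
        System.out.println("available while the write lock is held: " + whileCleaning); // false
    }
}
```

So getData and setData never wait on each other, only on the cleaner, which is the whole point of using the read lock for both.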

Next up ... write the cache in Scala using Actors.

Sunday, February 08, 2009

Added Twitter Social Graph APIs

Twitter recently created social graph APIs. You can get the friends or followers of a given user. This information was available in other ways previously, but the new social graph APIs are very convenient: they return an array of numeric IDs. So if you want to do things like visualizing the social graph of people or use this to suggest friends, etc. now you've got an easier way to do it. Of course a great way to visualize such data is in Flash. So I added these APIs to TwitterScript. When the TwitterEvent returns, its data property will be the array of numeric IDs.

Arrest Steve Kerr

Do you remember the Phoenix Suns? Do you remember the team that played like it was 1979? Do you remember Steve Nash dishing out 20 assists a game? Do you remember Amare Stoudemire scoring 40 points a game all on dunks? Do you remember barrages of 3pters coming from Quentin Richardson, Joe Johnson, Raja Bell, Leo Barbosa, and yes Shawn Marion? Man, those were the days.

Seems so long ago, but it was just a year ago (ok Richardson and Johnson left a couple of years ago.) Even if you had no interest in the Suns, if you like basketball, then you had to love those Suns. But you know how it goes, it's hard to keep great teams together, and the Suns are no exception. So what happened to them? Salary cap? Player egos? Injuries? Nope. Steve Kerr.

Steve Kerr has singlehandedly destroyed this team. First, he traded away Shawn Marion for Shaq. He's been lucky with this move. Shaq has played as well as you could possibly hope. Most importantly, he has stayed healthy. Nonetheless, this was a terrible mistake. Marion was a perfect fit. He played great defense, ran the floor, and could shoot from the perimeter. You are forced into a slower tempo game with Shaq.

As bad as that move was, Kerr was far from done. Next he fired Mike D'Antoni. This is the man who brought in the fast paced style of the Suns. This is the man who turned the franchise around and won a coach of the year award along the way. Now he's doing the same thing, but for the Knicks. The Knicks have admittedly been jettisoning players so they can make a move for LeBron James next year. Yet, they have already won almost as many games this year as they did last year. Back to the Suns. Kerr got rid of the man who turned the franchise around, and instead put in a yes-man in the form of Terry Porter.

But it just gets better... Next Kerr makes the classic mistake of trading for an overpriced "star" in the form of Jason Richardson. He got rid of two key players: Raja Bell and Boris Diaw. Bell was a great shooter, and Diaw was a big man who could pass. Richardson is a fine player, but he needs to be the focus of an offense. He doesn't have the outside shooting to complement the other players on Phoenix.

So what does all of this lead to? The Suns are on pace to win 46 games. That would be 8 games fewer than any full season under D'Antoni. They are also on pace to miss the playoffs, something that never happened under D'Antoni. Way to go Kerr.

Guess what, he's not done. Now he's looking to gut the team to save money. After all, this season is clearly a loss. In particular he wants to trade Amare Stoudemire. Yeah that makes sense. He's hurt Stoudemire by bringing in Shaq and changing the way the team played, so he responds by trading him away. This is going to be one of those historically bad moves, like drafting Sam Bowie over Michael Jordan or the Minnesota Vikings trading for Herschel Walker.

Friday, February 06, 2009

Stupid Cavs Fans

Just read about how pissed Cavs fans are that Mo Williams is not replacing Jameer Nelson in the All-Star Game. Seems most Cavs fans thought that Mo deserved to go over Nelson in the first place, and are just plain insulted that Ray Allen was picked to replace Nelson. Do the Cavs fans have a legit beef? No.

First off, all three players are on teams with very good records. Boston, Cleveland, and Orlando are all winning their divisions by huge margins and all three have taken turns as the team with the best record in the NBA. With that out of the way, we can concentrate on stats. If you only consider scoring, then you'd rank them Allen (18.2), Williams (17.1), and Nelson (16.7). Nelson leads the group in FG% at 50.3%, then Allen with 49.5%, and finally Williams at 46.4%. The FG% of Williams is pretty good for a guard, but Nelson and Allen's numbers are huge. Usually you only see those kinds of numbers from guards who do not shoot 3pters, a la Tony Parker. However Nelson and Allen are both averaging more than two 3pters made per game, and both shooting over 40% on 3pters. Based on their ability to shoot and score, it's very hard to argue Mo Williams over either Allen or Nelson.

Basketball is more than just shooting, though. Williams averages 4.2 assists per game. That is better than Allen's 2.8 assists per game, but less than Nelson's 5.4 assists per game. However Williams is the worst of the group in turnovers, at 2.3 per game. Nelson only has 2.0 TOs/game and Allen 1.8 TOs/game. So based on ball handling, maybe you could argue Williams over Allen.

What about defense? The only stat we can go on here is steals. Again Nelson is the best with 1.2 steals/game, then Allen at 1.0 and Williams at 0.8. So it is crazy to argue Williams over Nelson. Nelson is a better scorer, ball handler, and defender. To argue Williams over Allen, you can only consider ballhandling. Allen is better in scoring and defense (rebounding too, but these are guards.)

The final argument I've heard for Williams has nothing to do with Williams or the people chosen over him. It goes "how can Cleveland have only one All-Star when they have such a good record?" Perhaps the better question is "how can Cleveland have such a good record when they only have one All-Star?"

Thursday, February 05, 2009

Gas Math

I am lucky to live close to where I work. It's not totally luck. I used to live very far from where I worked, and all changes (changing where I live or where I work) since then have reduced this distance. I am also lucky to drive a fuel efficient vehicle, a 2006 Volkswagen Passat. There are two ways for me to get from home to work. The first is to take surface streets (Meridian Avenue to Hamilton, in case you want to stalk.) This is a five mile drive, with lots of red lights along the way. My car gets an average of 21 mpg for this trip. The other way I can go is to take freeways (CA-85 and CA-17 for the stalkers.) This is a seven mile trip, but my car gets around 29 mpg. It also takes about 5 minutes less. So which way do I go?

The surface route consumes about 0.238 gallons of gasoline (5 miles at 21 mpg). The freeway route consumes about 0.241 gallons (7 miles at 29 mpg). That's a difference of roughly 0.003 gallons per trip. If I drive this twice a day, 250 days a year, that is about 1.6 gallons of gasoline. So anywhere between $3 and $7 difference, depending on the ever volatile price of gas. It is also an extra 1,000 miles on my car (2 extra miles, 500 times). At the IRS standard mileage rate of 55 cents per mile for 2009, that is about $550. So let's say it costs me $555/year to take the freeway. But at 5 minutes each way, that saves 41+ hours of my time. So is my time worth $13/hour? Yeah, I'll take the freeway.
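
For anyone who wants to plug in their own commute, here is a rough Python sketch of this kind of arithmetic. The distances, mileages, and five minutes saved are the ones from my commute above. The $3.00 gas price and the 55 cents per mile (the IRS standard mileage rate for 2009) are assumptions, and the function names are just made up for illustration.

```python
# Rough cost-versus-time comparison for two commute routes.
# The gas price and per-mile rate passed in below are assumptions;
# replace them with your own numbers.

TRIPS_PER_YEAR = 2 * 250  # two trips a day, 250 working days


def gallons_used(miles, mpg):
    """Fuel burned on one trip."""
    return float(miles) / mpg


def yearly_extra_cost(short_route, long_route, gas_price, rate_per_mile):
    """Extra dollars per year for driving the longer route.

    Each route is a (miles, mpg) tuple.
    """
    extra_gallons = gallons_used(*long_route) - gallons_used(*short_route)
    extra_miles = long_route[0] - short_route[0]
    per_trip = extra_gallons * gas_price + extra_miles * rate_per_mile
    return per_trip * TRIPS_PER_YEAR


def break_even_hourly_rate(extra_cost, minutes_saved_per_trip):
    """Dollars per hour at which the time saved pays for the extra cost."""
    hours_saved = minutes_saved_per_trip * TRIPS_PER_YEAR / 60.0
    return extra_cost / hours_saved


if __name__ == "__main__":
    surface = (5, 21)  # miles, mpg
    freeway = (7, 29)
    cost = yearly_extra_cost(surface, freeway,
                             gas_price=3.00,       # assumed $/gallon
                             rate_per_mile=0.55)   # assumed $/mile
    rate = break_even_hourly_rate(cost, minutes_saved_per_trip=5)
    print("extra cost per year: $%.2f" % cost)
    print("break-even value of time: $%.2f/hour" % rate)
```

If the break-even figure comes out below what an hour is worth to you, the longer but faster route wins.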

Tuesday, February 03, 2009

Function Templates in Objective-C

If you're lucky enough to experience my TwitStream lately, then you might have noticed that I've been doing some iPhone development. That means I'm working in Objective-C. Its reflection capabilities are interesting, to say the least. Low overhead reflection does allow for some code templates. Take a look at this.


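// Averages one numeric property over every StatData in the list.
// The selector names the property getter; the getter is assumed to return
// an object that responds to doubleValue (e.g. an NSNumber).
double MeanTime(SEL selector){
 NSArray *dataList = GetDataList();
 if ([dataList count] == 0){
  return 0.0;
 }
 double cnt = (double) [dataList count];
 double sum = 0.0;
 for (StatData *data in dataList){
  // Skip any object that does not implement the getter.
  if ([data respondsToSelector:selector]){
   sum += [[data performSelector:selector] doubleValue];
  }
 }
 return sum/cnt;
}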
double MeanDownloadTime(void){
 return MeanTime(@selector(downloadTime));
}
double MeanProcessingTime(void){
 return MeanTime(@selector(processingTime));
}
double MeanTotalTime(void){
 return MeanTime(@selector(totalTime));
}

Yeah, I know, what's up with the C-style functions? This was inherited code where I was only changing the implementation of those functions, and I didn't feel like wrapping them in a nice pretty class just for the blog. The StatData class has several synthesized properties, including downloadTime, processingTime, and totalTime. I needed three functions that iterated over a list of StatData and computed the average of these properties. Objective-C's selectors provided a nice way to do this. I have to admit that I over-thought this at first, and tried using @selector(getDownloadTime:) instead of @selector(downloadTime). I thought that this would be the method name that the compiler would synthesize from the property. It isn't: the synthesized getter is simply named after the property. So that didn't even work, but the more obvious, straightforward usage did.

Monday, February 02, 2009

Scala Puzzler and IntelliJ

Today I received a serial number for the highly regarded IntelliJ. The license was courtesy of the San Fran JUG, as a thank you for speaking there last month. I used to use IntelliJ many years ago, when I worked in consulting. Back then its refactoring was much more powerful than Eclipse's, and NetBeans was a joke. I haven't used it too much since then, as I became quite the Eclipse convert. Every once in a while I will take a look at it. Today I thought I should give it more of a try, since I had this license for it. So during my lunch, I set it up, got its Scala plugin, and solved a Facebook puzzle. Actually I shouldn't say that I solved the puzzle, since my solution is naive and inefficient. Anyways, here's the code:


import java.io._
import scala.collection.mutable._
import scala.io._

object PeakTraffic {
    def main(args:Array[String]){
        val users = HashSet.empty[User]
        val stream = if (args.length == 1) new FileInputStream(args(0))
            else Thread.currentThread.getContextClassLoader.getResourceAsStream("traffic.txt")
        val src = Source.fromInputStream(stream)
        src.getLines.foreach((str) => {
            val emails = str.split(" ").filter(_.contains("@")).map(_.trim)
            // vars, because each may be swapped for the instance already stored in users
            var sendingUser = User(emails(0))
            var receivingUser = User(emails(1))
            users(sendingUser) match {
                case true => sendingUser = users.find(_ == sendingUser).get
                case false => users += sendingUser
            }
            users(receivingUser) match {
                case true => receivingUser = users.find(_ == receivingUser).get
                case false => users += receivingUser
            }
            sendingUser <==> receivingUser
        })
        makeClusters(users).foreach(println(_))
    }

    def makeClusters(users:Collection[User])={
        val clusters = new ArrayBuffer[List[User]]
        users.foreach(_.friends.foreach((friend) =>{
            clusters += friend :: friend.friends.filter(friend.isFriend(_))
        }))
        clusters.filter(_.size >= 3).map(_.sort(_ < _)).toList.removeDuplicates.map(_.mkString(", ")).sort(_ < _)
    }
}
case class User(email:String) extends Ordered[User]{
    private val friendSet = HashSet.empty[User]
    private val sentSet   = HashSet.empty[User]
    private val receivedSet = HashSet.empty[User]

    def compare(that: User) = email compareTo that.email
    override def toString = email
    def friends = friendSet.toList
    def isFriend(friend:User) = friendSet(friend)
    def <==> (user:User) = {
      this >> user
      user << this
    }
    private
    def >>(user:User) = {
        sentSet(user) match {
            case false => {
                receivedSet(user) match {
                    case true => {
                        friendSet += user
                        receivedSet -= user
                    }
                    case false => sentSet += user
                }
            }
            case true => // already recorded as sent; nothing to do
        }
    }
    def <<(user:User) = {
        receivedSet(user) match {
            case false => {
                sentSet(user) match {
                    case true => {
                        friendSet += user
                        sentSet -= user
                    }
                    case false => receivedSet += user
                }
            }
            case true => // already recorded as received; nothing to do
        }
    }
}

A couple of things about IntelliJ... Code completion seemed much slower than on NetBeans or Eclipse. Similarly, most errors weren't flagged until the code was compiled (maybe it's this way with Java, too?) Debugging was good. Some refactoring is supported. I think IntelliJ is unique in this regard. However, I couldn't rename a method to an operator symbol (>> in this case.) Also, method extraction did not seem to work, and that is probably the refactoring technique I use the most when I code. The indentation was also flaky. Like I could create a class, hit return (indented), create a val, hit return (indented even more), etc. I think the lack of semicolons might have been causing it problems, but that's just a guess. Overall, I think the NetBeans plugin is better, but at least my experience wasn't as bad as David's.