
Saturday, March 03, 2012

Diversions in C Minor

Last year we saw the passing of Dennis Ritchie. I was one of the lucky students who used the K&R book as a text book in college. Actually it's pretty amazing that the text books for CS first years were K&R and The Mythical Man-Month. There was pretty much no excuse for me to F up after being taught from those two books. The untimely passing of Ritchie made me recollect that I never really finished K&R. My professor used many of the chapters as reference material, and of course at some point I figured that I already knew everything so well that I didn't need no stinkin' book. So I decided then that I would re-read the book including all of its problems.

Of course I didn't start re-reading K&R right then as I should have. But I did get around to it recently. The very first problem in chapter two asked the reader to calculate the minimums of signed chars, shorts, ints, and longs, and the maximums of those plus their unsigned cousins. These are all very straightforward things to calculate for anyone with an understanding of modular arithmetic. The book then mentions a more challenging problem: calculating the minimum and maximum floats. No problemo, I thought. Obviously that doesn't work, or this would be a pretty boring blog post. Here's where the awesome comes in. On my MacBook Air, the little program sketched below prints 1.175494e-38 for FLT_MIN, but then prints 1.401298e-45 for the "calculated" minimum float. That's right, it calculates a value that is smaller than FLT_MIN.
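The gist of the experiment is a loop that keeps cutting a float in half until it hits zero. Here is a rough sketch of that idea (in Java rather than C, since that's what most of the code on this blog is in; Java's float is the same IEEE 754 single precision type, with Float.MIN_NORMAL playing the role of FLT_MIN, so the numbers come out the same):

public class FloatMin {
    public static void main(String[] args) {
        // keep halving until the next halving would underflow to zero
        float min = 1.0f;
        while (min / 2.0f > 0.0f) {
            min = min / 2.0f;
        }
        System.out.printf("FLT_MIN        = %e%n", Float.MIN_NORMAL); // 1.175494e-38
        System.out.printf("calculated min = %e%n", min);              // 1.401298e-45
    }
}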

Experienced C developers will surely recognize what is going on here, but I was flummoxed. It was only upon looking at some of the other constants defined in float.h that I started to figure out what was going on. In particular I noticed that on my MBA, FLT_MIN * FLT_EPSILON = 1.401298e-45. FLT_EPSILON is "float epsilon" and is defined as the smallest x for which 1.0 + x != 1.0. It is sort of the standard relative error on the system, i.e. any calculation could be off by a factor of about FLT_EPSILON. I decided to calculate this in my program and then to also use it to calculate FLT_MAX (that part of the sketch is below). The FLT_MAX calculation takes a long time, since it uses such a small increment. I'm sure a bigger increment could be used to speed it up.
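Continuing the sketch from above, the epsilon and max calculations look roughly like this (the 1.0f + eps != 1.0f loop recovers FLT_EPSILON, and multiplying by 1 + eps until the result overflows to infinity lands you at, or just shy of, FLT_MAX -- again, this is a sketch of the approach, not my exact program):

// smallest power of two for which 1 + eps is still distinguishable from 1
float eps = 1.0f;
while (1.0f + eps / 2.0f != 1.0f) {
    eps = eps / 2.0f;
}
System.out.printf("epsilon        = %e%n", eps); // 1.192093e-07, i.e. FLT_EPSILON

// grow by a factor of (1 + eps) until the next step would overflow
// this is the slow part -- hundreds of millions of iterations
float max = 1.0f;
while (!Float.isInfinite(max * (1.0f + eps))) {
    max = max * (1.0f + eps);
}
System.out.printf("calculated max = %e%n", max); // approximately 3.402823e+38, i.e. FLT_MAX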

The JavaScript assembler

Earlier this week, the third part in a series of articles I wrote was published on developerWorks. The series is on CoffeeScript, a language that I first learned of about a year ago. Last summer I read CoffeeScript by Trevor Burnham over the course of a week while on a cruise ship. Later that summer I was at the Strange Loop conference, and serendipitously wound up at a dinner table with several of the guys who develop Clojure. A lot of the conversation that evening was on ClojureScript.

As a longtime web developer, I am not surprised by the growing interest in programming languages that compile into JavaScript. The browser has never been more important, and thus JavaScript has never been more important. Most programmers that I know find that JavaScript leaves a lot to be desired. This is not just a matter of taste: JS is fucking weird. In fact for years, a lot of the more "serious" programmers I knew would turn up their noses at JS. They were too good for that trash, so they kept to the back-end. It's become tougher and tougher to avoid JS, so ... use a different programming language that compiles to JS. Brilliant!

This idea is hardly new. Personally, I can't help but think of Google Web Toolkit. For years GWT has let you write in Java and compile to JS. So why bother with CoffeeScript, ClojureScript, and probably some other awesomes that I'm leaving out? One obvious thing is that Java is an existing language and is statically typed. So it has its own baggage (which includes excellent tooling, but still) and of course many developers do not like statically typed languages. In fact, many argue that web development with a statically typed language is an impedance mismatch of the highest order. Personally I think GWT was also hurt by initially defaulting to producing highly obfuscated "production ready" JS. I always thought that was developer unfriendly. I think that may have changed over the years, but sometimes it's hard to overcome an initial impression. I think one of CoffeeScript's great strengths is that it produces very developer friendly JS by default. I've never built any huge apps in either language, but in my limited experience I found that debugging was easier in CoffeeScript. I could usually figure out what I had done wrong just by looking at the error message.

Any discussion like this would be incomplete without mentioning Dart. It gives you dynamic and static aspects, in a way that reminds me of ActionScript. Personally I think optional typing works best when you use variable:type style declarations, like ActionScript and Scala, but that's a minor nit. Of course when you consider that "native Dart" is supported in a version of the world's most popular browser, the possibilities start to get really interesting. I attended a Dart meetup recently where the Chrome developer tools allowed breakpoints and the like directly in the Dart script that was running. The SourceMap proposal would allow this kind of debugging of other higher level languages like CoffeeScript and ClojureScript.

The future seems bright for languages targeting a JavaScript runtime. I haven't even mentioned node.js and how that makes these languages viable for both client and server applications. To me CoffeeScript, Dart, and ClojureScript are all promising. Each seems to have a large target audience. If you like dynamic languages like Python and Ruby, CoffeeScript fits right in. If you prefer statically typed languages like Java and C++ and all of the nice tools that come with them, then Dart is attractive. If you like functional languages like Haskell and Lisp, then ClojureScript is a fun way to get your lambda calculus kicks in the browser. Of course there is a lot of anti-Lisp bias out there, so maybe there's room for another functional language that targets a JS runtime, something like an F#-script or ScalaScript.

Friday, May 28, 2010

Phone Number Fixer for Android

One of the curses of working with computers is that you often become The IT person for friends and family. In fact, I should get a commission from Apple for all of the business I've sent their way by getting family members to switch to Mac computers. Ditto for the Firefox browser. I can't make people switch to Mac; after all, that's often an expensive proposition. But it's long been a condition of my help that you never even think about using IE. For some of my less computer savvy relatives, I've learned to make IE disappear. This just made my life a lot easier (fewer things to fix) especially back in the heady days of weekly IE6/ActiveX exploits. Anyways... These days most folks, including friends and family, mostly use web applications and since I've had them on Firefox for a long time now, there aren't too many problems. However, recently I got an unusual request for help from a high school friend, Maria.

She had just switched to a Nexus One -- a phone that she seemed to be extremely happy with. She had imported several contacts from her SIM card that had been in her old phone. The only problem was that for these contacts, their phone number was classified as "Other." That would not be a big deal, but when she would try to send a text to a number, Android's auto-complete would not include these "Other" phone numbers as part of its search domain. So she would have to completely type out the number to send the text to -- which kind of defeats the purpose of having an address book on your phone. She could manually change each of these phone numbers to be of type "Mobile", and this would solve the problem for that number. However as you might guess, she had a lot of numbers and changing each manually would be painful to say the least. And that's where I come in...

This sounded like an easy enough problem to solve. Find all of the numbers that were of type "Other" and change them to "Mobile." That might not work for everyone -- there might be some people who really want to classify some phone numbers as "Other" -- but it certainly worked for Maria (and I would guess most people, too.) So I cooked up a quick little app. First, I wanted to display all of the numbers that this was going to affect:
ContentResolver resolver = getContentResolver();
String[] fields = new String[]{People._ID, People.NAME};
Cursor cursor = resolver.query(People.CONTENT_URI, fields, null, null, People.NAME);
LinkedHashMap<Integer, String> people = new LinkedHashMap<Integer,String>();
int id = cursor.getColumnIndex(People._ID);
int nameCol = cursor.getColumnIndex(People.NAME);
if (cursor.moveToFirst()){
 do{
  people.put(cursor.getInt(id), cursor.getString(nameCol));
 }while (cursor.moveToNext());
}
cursor.close();
This gives you a LinkedHashMap whose keys are the IDs of each contact, and whose values are the names of the contacts. Why did I bother with these two bits of info? Well, we need the contacts to query the phones, and I wanted something friendly to display to Maria so she knew which numbers were about to get "fixed". Anyways, now it was easy to query the phones:
ArrayList<String> data = new ArrayList<String>();
StringBuilder sb = new StringBuilder();
String[] projection = new String[]{Phones.DISPLAY_NAME, Phones.NUMBER, Phones._ID};
Uri personUri = null;
Uri phonesUri = null;
int displayName = 0;
int number = 1;
int phoneIdCol = 2;
int phoneId = -1;
int cnt = 0;
for (int personId : people.keySet()){
 personUri = ContentUris.withAppendedId(People.CONTENT_URI, personId);
 phonesUri = Uri.withAppendedPath(personUri, 
            People.Phones.CONTENT_DIRECTORY);
 cursor = resolver.query(phonesUri, projection, 
            Phones.TYPE + "=" + Phones.TYPE_OTHER, null, null);
 displayName = cursor.getColumnIndex(Phones.DISPLAY_NAME);
 number = cursor.getColumnIndex(Phones.NUMBER);
 phoneIdCol = cursor.getColumnIndex(Phones._ID);
 if (cursor.moveToFirst()){
  do {
   sb.setLength(0);
   sb.append(people.get(personId));
   sb.append(": ");
   sb.append(cursor.getString(displayName));
   sb.append('|');
   sb.append(cursor.getString(number));
   data.add(sb.toString());
   phoneId = cursor.getInt(phoneIdCol);
   cnt += updateContact(phoneId);
  } while (cursor.moveToNext());
 }
 cursor.close();
}
ArrayAdapter<String> adapter = new ArrayAdapter<String>(this, R.layout.phone, data);
setListAdapter(adapter);
Toast.makeText(this, cnt + " phone numbers updated", Toast.LENGTH_LONG).show();
This code just loops over the contacts we got from the first query. For each of those contacts it queries to see if the contact has any phones that are of type "Other" (Phones.TYPE_OTHER). If it does, it creates a string that shows the contact's name, the phone's display name, and the phone number. This string is added to an ArrayList. Once all of the queries complete, an ArrayAdapter is created using the ArrayList of contact name/phone strings and used to populate a ListView.
You might have also noticed that there is a counter variable being incremented, and an updateContact method. This is actually the method that fixes the phone number. I probably should have just kept track of the phoneIds and then given the user a button to initiate the fixes, but I was lazy. Here is the code for updateContact.
private int updateContact(int phoneId){
 ContentValues values = new ContentValues();
 values.put(Phones.TYPE, Phones.TYPE_MOBILE);
 ContentResolver resolver = getContentResolver();
 Uri phoneUri = ContentUris.withAppendedId(Phones.CONTENT_URI, phoneId);
 return resolver.update(phoneUri, values, null, null);
}
Too easy. Contacts are an example of a Content Provider in Android. Many people (including my Android in Practice co-author and author of Unlocking Android, Charlie Collins) have criticized Android for exposing too much of the database backing of Content Providers. As you can see from the examples above, you have to deal with cursors (moving, closing), queries, updates, and very thinly abstracted select and where-clauses. Maybe it would have been better to drop down directly to the SQL, or provide a more object oriented API. Now you can create your own Content Provider, and you don't have to use a SQL database to back it -- but then implementing the Content Provider contract can be quite awkward.
Anyways, I found that the easiest way to get this little app to Maria was to simply put it on the Market. It's still on there if you happen to have a similar problem (maybe many people switching to Android might experience this?) If you are on your Android phone you can just select this link. If you are on your computer you can scan this QR code.

Friday, March 26, 2010

Programming and fallacies

Today I read a blog post by @joestump where he attempted to refute claims that Digg had been "doing it wrong" when they could not scale their database and went with a NoSQL attack instead. I'm not going to get into that flame war. I will say that folks at Digg should have expected exactly this kind of scrutiny when they decided to start bragging about their technology decisions. If a company has a technology blog, that is all they are trying to do -- brag. Anyways, back to the blog post... I was immediately disappointed to see that it was riddled with logical fallacies. I see logical fallacies all the time, and sometimes I forget that many folks are actually not very familiar with them. In particular, I would say that programmers are not very familiar with the "formal" concept, despite the fact that programmers tend to have very strong logical reasoning. Here is my attempt to do something about that.

First off, I am not going to answer what is a logical fallacy. You can follow that link, or find many other descriptions and enumerations of fallacies. Instead I want to talk about why programmers may not be aware of fallacies, and why it is important for them to learn about them. I think the reason that programmers may not know much about fallacies is that they are generally taught in classes on writing, public speaking, or debating. In other words, even though these are concepts tied to logical reasoning, including induction and deduction, they are not included in classes on mathematical logic. You cannot generally express a fallacy using symbols and computational expressions. This might lead some to believe that the fallacies are subjective, but this is not really true.

So why are fallacies important if they are not easily expressed mathematically, and thus difficult to represent programmatically? I suppose that if all you do is program on your own, then perhaps they are not relevant to you. If instead you work on a team with other programmers (or non-programmers for that matter,) then you will have ideas that are argued/debated. In this case, you need to understand fallacies -- so that you can express your arguments without committing a fallacy and so that you can recognize when people arguing against you are committing fallacies.

If you have an argument, and you find yourself committing a fallacy, then of course you will want to "fix" your argument to remove the error. This will inevitably draw out the assumptions in your argument, and strip out that which is not essential and true. Sometimes this will leave you with nothing, and you will be forced to question your own argument. Perhaps it was false or more likely mostly based on subjective statements, not fact. That can be a bummer, but wouldn't you rather realize that you are wrong than have somebody else point it out? Or worse, have nobody point it out...

You need to recognize when others are committing fallacies. At the very least you need to strike this from your own mental record of their argument. Does their argument make sense without the fallacy, or is it essential? Of course if you start noticing these kind of fallacies, then you may be tempted to point them out and use that to assert that their argument is false and thus your argument must be right. Oops, you just poisoned the well and committed a false dilemma.

Once you start noticing fallacies, you might notice that people commit them a lot. Sometimes this may seem to be true irrespective of the perceived intelligence of the committers. It is easy to make these mistakes when you are unaware of them. They are also much more common when people are "thinking on their feet." In fact, I had a professor who once joked that those who were good at thinking on their feet, were simply good at synthesizing fallacies. I'm not sure if that is a fair generalization, but you get the point. I think people are also much more likely to make these kind of errors when emotion has entered into the argument. I think that was the most likely case in the blog post that inspired this post.

Saturday, November 21, 2009

Passing on the Duby




Last week I had a fun conversation with Charles Nutter. It was about "java.next" and in particular, Charlie's requirements for a true replacement for Java. He stated his requirements slightly before that conversation, so let me recap:

1.) Pluggable compiler
2.) No runtime library
3.) Statically typed, but with inference
4.) Clean syntax
5.) Closures

I love this list. It is a short list, but just imagine if Java 7 was even close to this. Anyways, while I like the list as a whole, I strongly objected to #2 -- No runtime library. Now, I can certainly understand Charlie's objection to a runtime library. It is a lot of baggage to drag around. All of the JVM langs have this issue. JRuby for example, has an 8 MB jar. Groovy has a 4 MB jar. Clojure has two jars totaling 4.5 MB. Scala is also around 4 MB. These become a real issue if you want to use any of these languages for something like Android development. I've written about this issue as well.

However, there is a major problem with this requirement. The building blocks of any programming language include primitives and built-in types. Those built-in types are part of the runtime library of the language. So if your JVM based language cannot have a runtime library, then you will have to make do with primitives and Java's built-in types. Why is this so bad? Java's types (mostly) make sense in the context of Java, its design principles and syntax. I don't think they would make sense in the java.next hypothetical language described above. The easiest example of this is collection classes. Don't you want a list that can make use of closures (#5 requirement above) for things like map, reduce, filter, etc.? Similarly, if you have static typing, you probably have generics, and wouldn't you like some of your collections to be covariant? You can go even further and start talking immutability and persistent data structures, and all of the benefits these things bring to concurrent programming. This is just collections (though obviously collections are quite fundamental to any language,) but similar arguments apply to things like IO, threading, XML processing, even graphics (I'd like my buttons to work with closures for event handling, thank you very much.)
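To make that concrete, here is a rough sketch of the kind of closure-friendly list I am talking about. None of this is from any real proposal -- Fn and RichList are made-up names, purely for illustration -- but it shows the flavor of what you want baked into the built-in types rather than bolted on:

import java.util.ArrayList;

// a minimal "closure" type: a one-argument function
interface Fn<A, B> {
    B apply(A a);
}

// a list that knows how to use closures
class RichList<T> extends ArrayList<T> {
    // build a new list by applying f to every element
    public <R> RichList<R> map(Fn<T, R> f) {
        RichList<R> result = new RichList<R>();
        for (T t : this) {
            result.add(f.apply(t));
        }
        return result;
    }

    // keep only the elements the predicate accepts
    public RichList<T> filter(Fn<T, Boolean> pred) {
        RichList<T> result = new RichList<T>();
        for (T t : this) {
            if (pred.apply(t)) {
                result.add(t);
            }
        }
        return result;
    }
}

And the usage, which also shows why requirement #5 matters -- without real closures the call site is a pile of anonymous classes:

RichList<Integer> ints = new RichList<Integer>();
ints.add(1); ints.add(2); ints.add(3);
RichList<Integer> doubled = ints.map(new Fn<Integer, Integer>() {
    public Integer apply(Integer i) { return i * 2; }
});

And that is really the point: map and filter need somewhere to live. Either they are part of the language's runtime library, or every library author invents their own Fn and RichList and none of them interoperate.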

One argument against this is that you can just include the runtime library of your choice. Pick your own list, hash map, thread pool, and file handler implementations. This is what I'd like to call the C++ solution -- only worse. At least in C++ you can generally count on the STL being available. The thing that is so bad about this is that it really limits the higher-order libraries that can be built. Java, Ruby, and Python all have a plethora of higher order libraries that have been built and widely used. These libraries make extensive use of the built-in types of those languages. Imagine building ORMs like Hibernate or ActiveRecord if you did not have built-in types (especially collection classes) that were consistent with the language. You could not do it. If all you could rely on was the Java built-in types, then at best your libraries would have a very Java-ish feel to them, and doesn't that defeat the purpose?

Charlie gave an alternative to this -- leverage the compiler plugins. Now it is certainly true that with a pluggable compiler, you could do things like add a map method to the java.util.List interface, and to all of the JDK classes that implement this interface. It can be done. However, if you are going to build higher order libraries on top of this language, then you need to be able to count on these enhancements being present. In other words, the compiler plugin needs to be a required part of the compiler. Fine. Now what is it that we were trying to avoid? Extra runtime baggage, right? Well if you have a compiler that is going to enhance a large number of the classes in the JDK, hello baggage. Maybe it won't be as much baggage as a 4 MB runtime library, but it might not be a lot less either. Also, it raises the potential problem of interoperability with existing Java libraries. If I import a Hibernate jar that gives me a java.util.List, that list is not going to be enhanced by my compiler -- because it wasn't compiled with my javac.next, it was compiled with javac.old. What's going to happen here?

Now there is an obvious happy way to deal with the runtime library question: rt.jar. If java.next was Java, and the super revamped collections, IO, threading, etc. classes were part of the JDK, then there is no need for a runtime library since now it has been included with the JDK. However, given the incredibly uncertain state of Java 7, not to mention the long term prospects of Java, does this seem remotely possible to anyone? I don't think so. I think the heir apparent to Java cannot be Java, and I think because of that, it's gotta have a runtime library of its own.

Tuesday, October 27, 2009

Minimalism? Programming? Huh?

Last week, I was at the Strange Loop conference, and it was a great conference. It inspired one blog post that managed to piss some Clojure people off. So here is another. This time I'm going after the closing keynote, Alex Payne. Here are his slides, including detailed notes.
So where to begin... Even though I am going to harsh on this, I will give it very high marks. Why? Because it was thought provoking. If something makes you think long and hard about how and why you do your job, that is a good thing. I worry about people who do not question their assumptions on a regular basis.
Ok, so now on to what you've been waiting for -- the harshness. I hate software engineering analogies. Well, not all, but most. I hate when people try to romanticize about what they do by making a far-fetched analogy to something more glamorous. If you're a programmer, you're not a rock star. You're not an artist. You're not a musician. You're not even a scientist. Sorry. Despite having the word "architect" in my official job title, I would add that you're not an architect either.


You're a programmer. At best, you're an engineer, and that is really stretching it at times. So obviously if I find it ridiculous to compare programming to things like creating art or making music, it seems even more ridiculous to compare the output of programming to art or music.

That being said, I enjoy programming. It is not just a job. I also take pride in my work. I like to show off my code as much as the next guy.

From this perspective, I can definitely appreciate Alex's thoughts on minimalism. Or perhaps more generally, associating some subjective qualities with programming. If an analogy to art or construction or whatever helps one express those subjective qualities, then fine. Just don't forget who you are, and maybe even try to find some pride in just that.

I should really end this post now, on kind of a feel good note. But I won't. Instead, I want to touch on some of the *ahem* deeper thoughts that Alex's talk provoked. I was reminded of a paper I read several years ago that tried to explain two obviously overly general (and probably not politically correct) things: why Americans are good at software but not cars, while the Japanese are good at cars, but not software. I tried to find a link to this paper, and was quite sure that I had saved it in Delicious. However I could not find it, and I'm convinced that Delicious lost it. Maybe it fell out of cache...

Anyways, the crux of the argument is that Americans are good at getting something done quickly, even though the quality may be poor. So we are good innovators, but terrible craftsmen. This is a good fit for software, but obviously not cars.

If you accept this idea, and it has its merits, then in a world of low quality, rapidly written code, is there any room for subjective qualities? Why should you take any pride in your work at all if its value is directly proportional to how quickly you can produce it, throw it away, and move on to something else? To some degree, doesn't the software world's affection for agile development codify these almost nihilistic ideals? Perhaps the propensity to compare programming to art is simply an act of desperation, caused by the realization that you'd be better off giving up on good design and investing in duct tape instead.

Anyways, this is a close-to-home topic for me. My day job often involves telling developers the "right" way to do something, and the more unsavory flip-side of this, telling developers when they've done things the "wrong" way. You don't make a lot of friends, but I'm an INTJ so it works for me. Every once in awhile, I work with a programmer who writes beautiful code. Yeah, I said it. Any such programmer knows that they write good code, even if they don't like to talk about it. When you write good code, you constantly see bad code around you, and you can't help but get a big ego. I always compliment the good code, and it always surprises the arrogant pricks. They think I'm blown away and have never seen such good code. In reality, I just feel sorry for them.

Sunday, September 13, 2009

Some Tasty Scala Sugar

I have finally been getting around to working some more on Scala+Android. A lot needs to be done, especially since I am talking about it next month at the Silicon Valley Code Camp. Scala advocates (and I guess that includes me) often say that it is very easy for application developers to use Scala, i.e. that "most" developers are never exposed to the more complex aspects of the language. The folks who "suffer" are API developers. I've mostly fallen into the first camp (app developers), but now that I'm trying to create a Scala API on top of the Android SDK, I get exposed to more aspects of Scala. Here are a couple of things that I found interesting.

Tuesday, September 01, 2009

Metaprogramming in Groovy and Scala

This morning I read an excellent pair of articles by Scott Davis on metaprogramming in Groovy. It reminded me of some of the different ways to approach this kind of problem. The second article in particular detailed a new feature in Groovy 1.6, the delegate annotation. Here is an example of using this feature:

public class Monkey {
    def eatBananas(){
        println("gobble gulp")
    }
}

public class RoboMonkey{
    @Delegate final Monkey monkey
    public RoboMonkey(Monkey m){
        this.monkey = m
    }
    public RoboMonkey(){
        this.monkey = new Monkey()
    }
    def crushCars(){
        println("smash")
    }
}

This allows for the following usage:

def monkey = new RoboMonkey()
monkey.eatBananas()
monkey.crushCars()

What is interesting here is that you cannot just create a Monkey and call crushCars on it. RoboMonkey is meant to be a way to add functionality to the Monkey class (in this case the crushCars method), while still being able to treat the RoboMonkey as if it was a Monkey. I point this out because of how this would be done in Scala:

class Monkey{
    def eatBananas = println("gobble gulp")
}

class RoboMonkey(val monkey:Monkey){
    def this() = this(new Monkey)
    def crushCars = println("smash")
}

object Monkey{
    implicit def makeRobo(monkey:Monkey):RoboMonkey = new RoboMonkey(monkey)
    implicit def makeNormal(robo:RoboMonkey):Monkey = robo.monkey
}

Now arguably this is more complicated, because you have to create the Monkey object in addition to the Monkey class. What Scala requires is that you get those implicit functions (makeRobo, makeNormal) in scope. This is just one way to do that. The added benefit is the following usage:

val monkey = new Monkey
monkey.eatBananas
monkey.crushCars

I guess the Groovy way to do this is to use the meta class:

Monkey.metaClass.crushCars = {-> println("smash")}
def monkey = new Monkey()
monkey.eatBananas()
monkey.crushCars()

In both languages you can accomplish similar things, though the implementations are very different. Metaprogramming in Groovy is very powerful, and there are things you can do in Groovy that you cannot do in Scala -- see methodMissing and propertyMissing. Of course you lose some of the benefits of static typing, but software is all about tradeoffs.

Saturday, August 15, 2009

A Tipping Point for Scala

This past week's BASE meeting was all about IDE support for Scala. You can read my notes, posted to the Scala tools mailing list. I was very surprised by this meeting. Not by the findings, if you will, as I have used all three IDEs at various times in the last few months. What I was surprised by was the feedback from the group, and the logical conclusion of this discussion: Scala is near a tipping point, but IDE support is holding it back.

First off, there was a large turnout for the BASE meeting. I would say it was the second largest BASE meeting, only bested by the June meeting where Martin Odersky spoke. It is funny, because I think our esteemed organizer, Dick Wall, had been intending this topic to be like "well if we have nothing else to talk about, we'll talk about IDEs." If there had been an alternative topic brought up, I don't think people would have objected. After all, developers' attitudes towards IDEs are contradictory. Most developers I know would tell you that IDE support for a language is very important, but they would also act indifferent about IDEs when it comes to them personally. It's like "all of those other developers really need IDEs, but I would be ok without them." We all know our APIs so well that we don't need code completion, right? And we don't write bugs, so a debugger is of limited use, right? However, I am sure that if the meeting had not been about IDEs, then there would have been fewer people in attendance.

So why so much interest? Like it or not, Scala's primary audience right now is Java developers. Yes, I know Scala appeals to some dynamic language folks, and to some functional programming folks, and that its .NET implementation is being updated, but you could sum up all of the Scala developers from those disciplines and it would be dwarfed by the Java contingent. Scala has a lot of appeal on its own merits, but it is always going to be framed against Java. Scala's most (only?) likely path to mass appeal is as "the long term replacement for Java."

So when you talk about developers choosing to use Scala, you are really talking about Java developers choosing to use Scala instead of Java. This is not the only use case, but not only is it the most common use case, it is arguably the only use case that matters. Without this use case, Scala will at most be a marginal language, a la Haskell, or OCaml, or Groovy for that matter.

Back to my point... Java developers need great IDEs. This is not because they "need" help from their IDE because of some lack of skill. No, it's because they have had great IDEs for a long time now, and thus it has become a requirement. I remember when I joined Ludi Labs (it was still called Sharefare at the time) several years ago, we had a Java programming quiz. Candidates were given a clean install of Eclipse to use for writing their programs. We could have given them Vi or Emacs and a command line, but that would have been asinine and foolish. IDEs are an integral part of Java development.

I knew all of the above before the BASE meeting, but what I did not know was how many development organizations were at a critical juncture when it comes to Scala. For many folks, Scala, the language, has won the arguments. Whatever perceived extra complexity it has, has been judged as worth it. Whatever challenges there may be in hiring people to develop in Scala can be mitigated. Legacy code is not even a factor, as integration with existing Java code is trivial. Maybe it's the bleak future of Java, or maybe it's the high profile use of Scala at Twitter. Who knows, but Scala is poised to take a big piece of the Java pie.

Thus the missing piece is IDE support. Development orgs can't switch to Scala without IDE support, and the support is not there yet. That's the bad news. The good news is that Scala is ready to explode once the IDE support is there. There are a lot of folks out there ready to adopt Scala simply as a "better Java." They just need an IDE that is on par with Java IDEs. That is the standard.

All of that being said, there is a lot of concern around the IDEs. Many people expressed to me that they are worried that IDE progress is being coupled to the release of Scala 2.8. That seems reasonable at first, but what happens if 2.8 is not released until 2010 sometime? Will Scala lose its momentum and window of opportunity?

Monday, August 03, 2009

The Strange Loop

In October, I am speaking at the inaugural Strange Loop conference in St. Louis. This is not your run of the mill conference. It is organized by Alex Miller, who you might have seen speak the last couple of years at JavaOne. The speaker list is sweet: Alex Payne and Bob Lee are doing the keynotes, with sessions by Charles Nutter, Dean Wampler, Stefan Schmidt, Guillaume Laforge, Jeff Brown, and Alex Buckley. I am doing a talk on iPhone/Android development. It will probably be pretty boring compared to the other sessions. I may try to spice it up with some shameless plugging of Scala+Android balanced by some Fake Steve Jobs quotes about Android.

Thursday, April 30, 2009

Mobile OS Wars: Upgrades

Right now Apple and Google are in the midst of major upgrades to the OS on their devices, the iPhone OS and Android. If you develop for these platforms, you need to make sure your applications have no issues on the new versions. The simplest first step is to upgrade your SDK and development tools, and re-compile. Neither OS update removes (any?) APIs; they mostly add to them. So it is unlikely that your application will fail to compile under the new SDK. You can also launch your application in the emulator provided with each SDK. Now there is a little more chance of a bug popping up.

Still, a lot of upgrade bugs are only found when you run your app on the new OS on a device. Why is this? A lot of the bugs that come from these upgrades are memory management issues, multi-threading issues, or networking issues. When you run on an emulator, many of these issues just never happen (memory/networking) or behave differently because of the difference in processor speed (multi-threading.) So you need to upgrade the OS on your test devices (you do have dedicated testing devices, not just your personal phone, right?)

Upgrading your iPhone OS is really quite easy. You download the new OS. You right-click on the "restore" button in iTunes, or you can use the Organizer in Xcode. Either way, the device is wiped, the new OS is installed, and then your data is reloaded from a backup (iTunes backs everything up automatically.) The only gotcha is that apps that weren't installed via iTunes, i.e. apps you developed and are testing, do not get backed up, so they disappear. This should not be a problem since you have the source code for such apps. However, in order to re-install from source code via Xcode, your SDK must also be upgraded to the same version as the OS. Again, you probably upgraded the SDK first anyways, so no big deal.

Upgrading your Android device is not quite as straightforward. There are a lot of steps, so I won't repeat them all here. The upgrade is fundamentally different in that you replace parts of the system in place. Your device is not wiped. This is necessary because there is no automatic backup process on Android phones. On the iPhone, this is handled by iTunes. There is no analogous desktop software companion to Android. This is viewed by many as a pro, but in this case I consider it a con. Anyways, there are a lot of steps, but it is very smooth and very fast. In fact, it is much faster than the iPhone process because there is no delete/restore going on.

All of this points to some of the core differences between these platforms. Others have pointed to the Mac vs. PC scenario of the 80's as being similar to the iPhone vs. Android scenario of today. Apple went with a closed platform where they controlled everything, vs. the "open" combination of Windows plus Intel. Any manufacturer could slap an Intel chip in their machine and install Windows on there, with full freedom on what else went in to the machine. Same thing is going on here. Apple controls the OS and hardware of the iPhone. Android can go on many different types of hardware. Right now this is not making a big difference, since there are not yet a lot of Android phones. However, it seems likely that at some point you will see Android phones that are free with a contract or maybe $100 from Metro PCS or even pay-as-you-go carriers.

Ok so that was a tangent, but what does it have to do with this issue? Well, by having a closed system, Apple can provide a lot of advantages. Upgrading is simple. For consumers, you get the upgrade from Apple directly. You might notice the Android link above is from HTC, the maker of the first Android phone. Most consumers will get the upgrade from T-Mobile. Notice that Google is not even in this conversation, even though not only do they make the OS, but it is open source. In addition, many developers I know are dreading the proliferation of Android. Currently development for Android is nice, in fact a lot nicer than for iPhone. The development/emulator scenarios are equivalent, but debugging on devices is much simpler. However, when more variants of Android appear, and those variants are more specific to the devices they run on, then write once / run anywhere fades away. Again, notice that the OS upgrade came from the handset manufacturer already. That is very significant.

Friday, March 20, 2009

iPhone OS/SDK 3.0: Remote Notifications

If you are an iPhone developer, you were probably excited about Apple revealing a beta of OS 3.0 and the corresponding beta of SDK 3.0. There was a lot of excitement and speculation over what new features would be available in 3.0. One of the most anticipated ones was Push Notifications. Everyone thought that Apple would release this since they had promised it in the past, and they were right. So how do you use it? That was the question I asked myself, and here is what I have been able to figure out so far from the documentation.

The first thing you need to do is have your application call the new registerForRemoteNotifications API. This is a new API on UIApplication. So when you create your app's delegate (i.e. the class that conforms to the UIApplicationDelegate protocol) you will probably want to invoke registerForRemoteNotifications in your applicationDidFinishLaunching: method (or in the application:didFinishLaunchingWithOptions: method, more on that later.) This will cause the UIApplication to send a request to the Apple Push Service (APS) and it will respond with a token.

When the UIApplication receives that token, it will invoke your app delegate's application:didRegisterForRemoteNotificationsWithDeviceToken:. As the name implies, you get a token as one of the parameters to this method. Your application needs to send this token to your servers. When your server then determines that a notification needs to be sent to a particular device, it will send a request to APS using this token. That is how APS knows where to send the notification to, and also what application the notification is being sent to. It looks like you are pretty limited in what you send to APS, i.e. you do not send (much) application data to APS. You just send it a message saying "hey tell this device that I've got something waiting for it." APS will send this to the user's device.

So the notification arrives, and now there are two possibilities. First, the user is still using your application. In that case, your app delegate is once again invoked; this time its application:didReceiveRemoteNotification: method is invoked. This lets your app know that you need to phone home to your server and go get the new yummy data.

If your application was no longer active, i.e. that pesky user quit it so they could do something else, then the user will receive a pop-up asking them if they want to launch your application because a notification has been sent to it. If they choose to do so, then when your application launches the app delegate's application:didFinishLaunchingWithOptions: method will be invoked.

In both cases, the delegate will receive an NSDictionary representing the notification as the last parameter passed in. Again, it appears that what can be in this dictionary is pretty limited as its max size is 256 bytes. You can put some custom stuff, like maybe some enum value to represent what kind of notification it is, or an ID to use as a request to get more information.

That is pretty much it! It is fairly straightforward, pretty close to how many people imagined this would work.

Sunday, March 15, 2009

Hiring Developers

I had a fun exchange on Twitter recently with @jamesiry and @codemonkeyism about how to hire developers. I think my thoughts on hiring are a lot different than most folks. So I thought it would be a good topic to expand upon.

First off, I must admit a lot of influence from Steve Yegge on this topic. For me, that was one of those times when you read something and you find yourself constantly thinking "yes exactly, I know what you mean." In other words, it was very consistent with my own experiences and opinions, but better articulated than I had ever been on the topic.

Otherwise my opinions on this matter are partially from my experiences plus a healthy dose of my flavor of logic. I am a big believer in being skeptical when people make statements based on their experiences. This is a common mistake, maybe the most common logical error that people make. Talking about and learning from your experiences is a good thing, but if you want to make a generalization, then you better have a lot of experiences to draw from. That is not even good enough, as your experiences need to be quite varied. Small sample sizes and skewed samples both lead to flawed conclusions.

Ok with all of that out of the way, I think the best approach to hiring developers is to take a very engineering-centric approach. There are things that a (software) engineer does that a scientist does not worry about. An engineer never assumes they will get the best solution on their first try. Instead they plan for mistakes. They plan for ways to measure these mistakes, so they can make adjustments and know if their adjustments are effective. Engineers also plan for failures.

So how does this translate to hiring? First off, you have to realize that a non-trivial fraction of the people you hire will be mistakes. You will get it wrong. In my experience, you get it wrong around 25-50% of the time. That's just my experience, so you should be skeptical about that number. However, it is not the exact number that matters. What matters is that if you think you will rarely get it wrong, that a bad hire is an exceptional event, then you will wind up with a lot of bad developers and it will cripple you.

This assumption can be hard to swallow. A lot of people will just say that this is wrong. However, once you get over this point, then everything else is downhill. If you know that you will make bad hires, then you will instrument your system (in this case your development organization) to check for these events and react to them. You quickly realize that it is key to make this assessment quickly. Do it in three months if you can, six months at worst. Assign a mentor to new hires, carefully pick their first project, whatever. Remember that a new hire is worthless for several months anyways, so the only important thing they can do during that time is prove they are a good hire, not a bad one. If you hire somebody because of an immediate need, you are running your organization poorly -- bring in a hired gun (contractor.) Anyways, it is also very important to make sure the new hire knows that this assessment is going on and that possible results include being laid off or a different position.

So all of this is the end goal, if you will. It is all after the hire. What interests most people is the before-the-hire process. Again, with all of the above in mind, hiring is straightforward.

1.) Attract talent. This is hard. You have to get people to want to work for you. Maybe you hire recruiters. Maybe you use word of mouth. Hopefully you will try to create a good environment and culture for developers. As much as he annoys me, I would still recommend reading Joel Spolsky's ideas on this topic. I think Joel is totally wrong when it comes to the hiring process (actually on most things), but that he gets a lot right when it comes to creating a good environment for developers.
2.) Filter. This is easy. Resumes are the standard result of attracting talent. You can do dumb filtering, like insist on some type of degree, some number of years of experience, some knowledge of a particular technology. Or you can do something more subjective, like look for people who went to certain schools, worked for certain companies, or in certain environments like startups, consulting, enterprise software, whatever. Filtering can get more elaborate. Maybe you have a phone interview and ask some easy questions that you are sure any decent developer will know. Or you send them a similarly easy test of some sort. I am not saying that these are all good ways of filtering, just that they are options.
2a.) But what about traditional interviews? I am not a believer in the traditional, technical interview. It is a huge waste of time. Most people like to give candidates some tough programming or logic problem. Or maybe they go for the open ended design question or puzzle. Here is where I most concur with Stevey Y. All these typical interview techniques do is identify people who think like you. If you are really smart, then maybe this is good, but there is a good chance that you are not really smart. We all think we are smart, it is impossible to be objective about this. Microsoft and Google are both famous for their tough interview questions, but it is ridiculous to think that they are unique in the type of questions they ask of candidates. It is just the opposite. Every software company asks the same kind of questions and they still make a lot of bad hires.
3.) Personality Litmus Test. This is also easy. The one thing you can usually accurately ascertain from an interview is "will I be able to get along with this person?" and "will my team get along with this person?" If the answer is no, then do not hire them. You might think "but it is ok to hire a really smart person, even if they are hard to get along with" but then you are forgetting that you are not very good at determining if a person is smart or not. So just take that out of the equation. Remember the bad apple studies. If you think there is even a shot that somebody is a bad apple, cut the interview short and send them home. Note: I am assuming here that you are not a bad apple, hence that you would not get along with one of these types.

And that's it! If the candidate passes your filters and you think you can work with him/her, then make an offer. Remember that there is a good chance you are getting this wrong, and plan accordingly.

Monday, March 09, 2009

Covariance, Contravariants, Invariants, Help!

The other day, I made a really dumb Tweet about Java collections. I tried to come up with an excuse for it, but there really was none. This was, um, inspired by a proposal to simplify bounded wildcard syntax in Java. Perhaps my mistake is proof that such a simplification is needed, but probably not. To make amends for my mistake, and in the hope that others will be wiser than I, here is another way of looking at this issue.

Collections in Java are invariant, by default. That means that if you define a method to take a List of X, then you have to provide a List of X. List of Y, where Y is a subclass of X, will not work. Here is an example:



public static void invariance(List<Number> nums){
    nums.add(1.1D);
    for (Number n : nums){
        System.out.println(n.doubleValue());
    }
}

Invariance implies problems with the following code:

List<Integer> ints = Arrays.asList(1,2,3);
invariance(ints); // won't compile
List<Object> objs = new ArrayList<Object>(ints);
invariance(objs); // won't compile

You have got to match things exactly if you want to call an invariant method. Sometimes this is too restrictive, but not in this case. In the example above, you both read from and write to the collection. If we could pass in the List of Integers, then adding a double to it would be invalid. If we could pass in a List of Objects, then calling the doubleValue method would be invalid. The only option here is invariance.
If you only need to read from the collection, then you can have covariance:

public static void covariance(List<? extends Number> nums){
    for (Number n : nums){
        System.out.println(n.doubleValue());
    }
}

Now you can pass in anything that adheres to the API of Number, i.e. anything that is a subclass of Number.

List<Integer> ints = Arrays.asList(1,2,3);
covariance(ints); // compiles!
List<Object> objs = new ArrayList<Object>(ints);
covariance(objs); // won't compile

Note that you still cannot pass in the List of Objects. They don't have a doubleValue method, so it would be trouble if the compiler let them in. Now in the covariant method, if you tried to add a double to the list, the compiler will not let you. You might protest and say "a Double is a subclass of Number, so I should be able to add it to a collection of objects that are a subclass of Number." This is shortsighted. You don't know exactly what kind of collection was passed in, just that everything in the collection should implement a particular interface. It could be a List of Integer or a List of Double or a List of Byte. So you cannot just add to the collection. For covariance, you want methods that do not mutate the state of the collection. Now technically, you could remove objects from the collection and the compiler would let you get away with it. Still, it might be helpful to think of covariance as read-only.

What if you want to add things to the collection? That is where contravariance comes in. Here is an example:

public static void contravariance(List<? super Number> nums){
    nums.add(1.1D);
}

Here I say that I want anything that is a superclass of Number. In other words, I am saying that I cannot be any more specific about the elements in the collection than that they are Numbers.

List<Integer> ints = Arrays.asList(1,2,3);
List<Object> objs = new ArrayList<Object>(ints);
contravariance(objs); // compiles!
contravariance(ints); // won't compile

Now I can send in a List of Objects, and since Object is a superclass of Number, the compiler is happy. If I send in a List of Integer, the compiler will stop me. I could try a List of Double too, and it will still stop me. My collection needs to be very permissive, so that my contravariant method can add to it.

Neal Gafter's proposal to simplify some of this looks like:

public static void covariance(List<out Number> nums){
    for (Number n : nums){
        System.out.println(n.doubleValue());
    }
}

public static void contravariance(List<in Number> nums){
    nums.add(1.1D);
}


I think this is intuitive. If I only want to take things out, I want covariance. If I want to put things in, I want contravariance. One of the things that made me re-think this subject (and maybe confused me) was how things work in Scala. There you use + and - for covariance and contravariance, respectively. By default, a List in Scala is covariant. However, Lists are immutable. Adding to a Scala List produces a new List; it does not mutate the original List. A mutable cousin of List, like ArrayBuffer, is invariant, just like a List in Java.

Monday, February 09, 2009

Give me the Cache

Web scale development is cache development. Oversimplification? Yes, but it is still true. Figuring out what to cache, how to cache it, and (most importantly) how to manage that cache is one of the more difficult things about web application development on a large scale. Most other things have been solved before, but chances are that your caching needs are quite unique to your application. You are probably not going to be able to use Google to answer this kind of design question. Unless of course your application is extremely similar to the one I am about to describe.

I had an application that performed a lot of reads, in the neighborhood of 1000 per second. The application's data needed to be modified very infrequently, less than 100 times per day. Ah, perfect caching! Indeed. In fact the cache could even be very simple: a list of value objects. Not only that, but it was a very small list. It got even better: the application had a pretty high tolerance for cache staleness, so it was ok if it took a few minutes for an update to show up. A few different caching strategies emerged out of this too-good-to-be-true scenario. So it was time to write some code and do some testing on these strategies. First, here is an interface for the cache.

public interface EntityCache {
    List<Entity> getData();
    void setData(Collection<? extends Entity> data);
}

Again, the Entity class is just a value object (Java bean) with a bunch of fields. So first, let's look at the most naive approach to this cache, a class I called BuggyCache.

public class BuggyCache implements EntityCache{
    private List<Entity> data;

    public List<Entity> getData() {
        return data;
    }

    public void setData(Collection<? extends Entity> data) {
        this.data = new ArrayList<Entity>(data);
    }
}
So why did I call this buggy? It's not thread safe. Just imagine what happens if Thread A calls getData(), and then is in the middle of iterating over the data, and Thread B calls setData. In this application, I could guarantee that the readers and writers would be from different threads, so it was just a matter of time before the above scenario would happen. Hello race condition. Ok, so here was a much nicer thread safe approach.

public class SafeCache implements EntityCache{
    private List<Entity> data = new ArrayList<Entity>();
    private ReadWriteLock lock = new ReentrantReadWriteLock();

    public List<Entity> getData() {
        lock.readLock().lock();
        List<Entity> copy = new ArrayList<Entity>(data);
        lock.readLock().unlock();
        return copy;
    }

    public void setData(Collection<? extends Entity> data) {
        lock.writeLock().lock();
        this.data = new ArrayList<Entity>(data);
        lock.writeLock().unlock();
    }
}

This makes use of Java 5's ReadWriteLock. In fact, it's a perfect use case for it. Multiple readers can get the read lock, with no contention. However, once a writer gets the write lock, everybody is blocked until the writer is done. Next I wrote a little class to compare the performance of the Safe and Buggy caches.

public class Racer {
    // volatile so the reader/writer threads see the main thread flip this to false
    static volatile boolean GO = true;
    static int NUM_THREADS = 1000;
    static int DATA_SET_SIZE = 20;
    static long THREE_MINUTES = 3*60*1000L;
    static long THIRTY_SECONDS = 30*1000L;
    static long TEN_SECONDS = 10*1000L;
    static long HALF_SECOND = 500L;

    public static void main(String[] args) throws InterruptedException {
        final AtomicInteger updateCount = new AtomicInteger(0);
        final AtomicInteger readCount = new AtomicInteger(0);
        final EntityCache cache = new SafeCache();
        long startTime = System.currentTimeMillis();
        long stopTime = startTime + THREE_MINUTES;

        // writer thread: replaces the cache contents every 30 seconds
        Thread updater = new Thread(new Runnable(){
            public void run() {
                while (GO){
                    int batchNum = updateCount.getAndIncrement();
                    List<Entity> data = new ArrayList<Entity>(DATA_SET_SIZE);
                    for (int i=0;i<DATA_SET_SIZE;i++){
                        Entity e = Entity.random();
                        e.setId(batchNum);
                        data.add(e);
                    }
                    cache.setData(data);
                    try {
                        Thread.sleep(THIRTY_SECONDS);
                    } catch (InterruptedException e) {
                        e.printStackTrace();
                    }
                }
            }
        });
        updater.start();
        Thread.sleep(TEN_SECONDS);

        // reader threads: read and print the cache, then pause for half a second
        List<Thread> readers = new ArrayList<Thread>(NUM_THREADS);
        for (int i=0;i< NUM_THREADS;i++){
            Thread reader = new Thread(new Runnable(){
                public void run() {
                    while (GO){
                        int readNum = readCount.getAndIncrement();
                        List<Entity> data = cache.getData();
                        assert(data.size() == DATA_SET_SIZE);
                        for (Entity e : data){
                            System.out.println(Thread.currentThread().getName() +
                                "Read #" + readNum + " data=" + e.toString());
                        }
                        try {
                            Thread.sleep(HALF_SECOND);
                        } catch (InterruptedException e) {
                            e.printStackTrace();
                        }
                    }
                }
            });
            readers.add(reader);
            reader.start();
        }

        while (System.currentTimeMillis() < stopTime){
            Thread.sleep(TEN_SECONDS);
        }
        GO = false;
        for (Thread t : readers){
            t.join();
        }
        long duration = System.currentTimeMillis() - startTime;
        updater.join();
        int updates = updateCount.get();
        int reads = readCount.get();
        System.out.println("Duration=" + duration);
        System.out.println("Number of updates=" + updates);
        System.out.println("Number of reads=" + reads);
        System.exit(0);
    }
}
This class creates a single writer thread that updates the cache every 30 seconds. It then creates 1,000 reader threads that each read the cache, iterate over and print its data, and pause for half a second. This goes on for a fixed period (three minutes above), and the number of reads and writes is recorded.

Testing the BuggyCache vs. the SafeCache, I saw a 3-7% drop in throughput from using the SafeCache. This was roughly proportional to the size of the data (the DATA_SET_SIZE variable): if you made it bigger, you saw a bigger hit, as the reading and writing took longer and there was more contention.

So this seemed pretty acceptable to me. In this situation, even a 7% performance hit for the sake of correctness was worth it. However, another approach to this problem came to mind. I like to call it a watermark pattern, but I called the cache LeakyCache. Take a look to see why.

import java.util.Collection;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.atomic.AtomicInteger;

public class LeakyCache implements EntityCache {
    private final List<Entity> data = new CopyOnWriteArrayList<Entity>();
    private final AtomicInteger index = new AtomicInteger(0);

    public List<Entity> getData() {
        // everything from the watermark to the end is the current data set
        List<Entity> current = data.subList(index.get(), data.size());
        return current;
    }

    public void setData(Collection<? extends Entity> newData) {
        int oldSize = this.data.size();
        this.data.addAll(newData);
        if (oldSize > 0){
            // the new batch starts where the old data ended
            index.set(oldSize);
        }
    }
}
The idea here is to keep one ever-growing list for the cache, plus an index (or watermark) that marks where the current data starts. Every time you "replace" the cache, you simply append to the end of the list and move the watermark forward. When you read from the cache, you simply take the sublist from the watermark on. I used an AtomicInteger for the index; a volatile int would probably have been good enough. I used a CopyOnWriteArrayList for the cache's list. You definitely need this. Without it you will wind up with ConcurrentModificationExceptions when one thread starts mutating the cache while another thread is iterating over it.

So you probably see why this is called LeakyCache. That internal list will grow forever, or at least until it eats all of your heap. So that's bad. It also seems a bit more complicated than the other caches. However, it is thread safe and its performance is fantastic. How good? Its throughput was consistently more than 3x that of the other caches, including the BuggyCache. That deserves some qualification: I didn't run any long tests on it, and it would eventually suffer from more frequent garbage collection as it leaks memory. However, if your updates are infrequent, the entities are relatively small, and you've got lots of memory, then maybe you don't care and can just recycle your app once a month or so?

Maybe you aren't satisfied with that last statement. You're going to force me to fix that memory leak, I knew it. Here is a modified version that does just that.

import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class LeakyCache implements EntityCache {
    private final List<Entity> data = new CopyOnWriteArrayList<Entity>();
    private final AtomicInteger index = new AtomicInteger(0);
    private final ReadWriteLock lock = new ReentrantReadWriteLock();

    public LeakyCache(){
        // background thread that periodically compacts the ever-growing list
        Thread cleaner = new Thread(new Runnable(){
            public void run() {
                while (true){
                    try {
                        Thread.sleep(60*60*1000L); // once an hour
                    } catch (InterruptedException e) {
                        e.printStackTrace();
                    }
                    lock.writeLock().lock();
                    if (data.size() > 500000){
                        List<Entity> current = new ArrayList<Entity>(getData());
                        index.set(0);
                        data.clear();
                        data.addAll(current);
                    }
                    lock.writeLock().unlock();
                }
            }
        });
        cleaner.start();
    }

    public List<Entity> getData() {
        lock.readLock().lock();
        List<Entity> current = data.subList(index.get(), data.size());
        lock.readLock().unlock();
        return current;
    }

    public void setData(Collection<? extends Entity> newData) {
        // readers and the writer share the read lock; only the cleaner takes the write lock
        lock.readLock().lock();
        int oldSize = this.data.size();
        this.data.addAll(newData);
        if (oldSize > 0){
            index.set(oldSize); // new batch starts where the old data ended
        }
        lock.readLock().unlock();
    }
}
So what does this do? In the constructor we spawn Yet Another Thread. This one periodically (once an hour in the example) checks to see how big the cache is. If it is over some limit, it copies the current data, resets the watermark to 0, clears the list, and adds the current data back. It also stops the world to do this, once again using a ReentrantReadWriteLock. Notice how I have abused the lock by using the read lock for both getting and setting the cache. Why use it for setting? Because the cleaner thread has exclusive use of the write lock, which it takes while it is cleaning up. By having the setData method use the read lock, it will block if the cleaner thread is in the middle of a cleanup.

Adding this extra bit of complexity fixes the memory leak while maintaining thread safety. What about performance? Well, the performance is highly configurable, depending on how often the cleaner thread runs (really, how long it sleeps) and how big you are willing to let the cache grow before compacting it. I put it on some very aggressive settings, and it caused about a 15% hit relative to the leaky version. The performance is still much better than any of the other versions of the cache.

Next up ... write the cache in Scala using Actors.
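In the meantime, here is a rough sketch of what that could look like with the scala.actors library — just my guess at the shape of it, not the eventual implementation. Entity stands in for the same value object used above, and ActorCache is a made-up name.

import scala.actors.Actor
import scala.actors.Actor._

// Messages understood by the cache actor
case class SetData(data: List[Entity])
case object GetData

// A single actor owns the list, so there is no shared mutable state to lock;
// reads and writes are just messages processed one at a time.
class ActorCache extends Actor {
  private var data: List[Entity] = Nil
  def act() {
    loop {
      react {
        case SetData(d) => data = d
        case GetData    => reply(data)
      }
    }
  }
}

// Usage:
//   val cache = new ActorCache
//   cache.start()
//   cache ! SetData(newBatch)                                    // async update
//   val current = (cache !? GetData).asInstanceOf[List[Entity]]  // sync read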

Thursday, January 15, 2009

Syntax Matters: JavaFX Script

Last night I finished writing an article on JavaFX. It was a lot of fun, and it has convinced me that JavaFX has a strong future. A lot of its promise comes from its syntax. It is easy to get carried away with Rich Media! and Animation! and FPS! but then you forget that JavaFX is a new programming language on the JVM. Here are the highlights of its syntax.

The declarative constructors: When I first started looking at JavaFX, it looked like MXML but in a JSON format. Looks are deceiving. Essentially JavaFX supports named parameters in its constructors. The naming convention is foo : bar, where foo is the name of the parameter and bar is the value. You can put multiple parameters on one line, separating them with commas as in most other languages, or you can put each on its own line and eschew the commas. This leads to the JSON-ish syntax. It is probably more accurate to describe it as JavaScript-ish. This becomes even more apparent when you start nesting constructed objects. It really looks a lot like object literals in JavaScript. What is great is that this is not just something you use for UI elements, as with MXML. It is part of the language syntax, so you can use it anywhere you want. Imagine being able to write an MXML expression directly inside a block of ActionScript code...
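To make the idea concrete without leaving the JVM, here is a rough Scala analogue of the same style — the Stage, Scene, and Text classes here are made up for illustration, not the real JavaFX API. Named arguments plus nesting give a similar declarative, object-literal feel.

// Hypothetical UI classes, just to illustrate the declarative style
case class Text(x: Int = 0, y: Int = 0, content: String = "")
case class Scene(content: List[Text] = Nil)
case class Stage(title: String = "", scene: Scene = Scene())

val ui = Stage(
  title = "Hello",
  scene = Scene(content = List(
    Text(x = 10, y = 30, content = "Hello, World")
  ))
)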

Speaking of JavaScript: A lot of the syntax of JavaFX looks like JavaScript's more sophisticated cousin, ActionScript 3.0. You use var to declare local variables. You put the type after the variable name, separated with a colon. You use function to define methods. The major difference is using def to denote a constant. In ActionScript, you use const, like you would in C. This is actually kind of bad in my opinion. The def keyword is used in many other languages, like Ruby and Scala, to denote a function. There is no obvious connection between def and a constant. I think they should have used const instead of def. It would have made more sense to do that and maybe use def instead of function.

Functional Programming: A lot of folks are disappointed that closures are not being added to Java, at least not in Java 7. JavaFX does have closures. You can define a var to have a function type and can specify the signature of that function. The syntax is pretty nice. For example, you could do var join:function(String[], String):String to define a variable called join that is a function taking an array of strings and a string as input parameters and returning a string. I would like to see this in ActionScript. JavaFX also has support for list comprehensions. You could do var squares = for (i in [1..100]) { i*i } to get a sequence of the perfect squares up to 10,000. However, JavaFX does not make as much use of closures as it could. You would think that its sequences would have common functional methods like filter, map, fold, etc. For filter and map, there are alternatives. For example, let's say you wanted the squares of all of the odd integers less than 100. In Scala you would do something like val os = (1 until 100).filter(_ % 2 == 1).map(x => x*x). In JavaFX it would be def os = for (x in [1..100][i | i % 2 == 1]) { x*x }. It's a case of syntax over API. The second pair of square brackets acts as a select (filter) clause. I want to like it because it uses mathematics-inspired notation.
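For what it's worth, here is the Scala side of that comparison spelled out both ways (my own illustration, in current Scala):

// Squares of the odd integers below 100. The guard in the for comprehension
// plays the same role as JavaFX's [sequence][x | predicate] filter.
val viaMethods = (1 until 100).filter(_ % 2 == 1).map(x => x * x)
val viaFor     = for (x <- 1 until 100 if x % 2 == 1) yield x * x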

Other: There are a few other JavaFX syntax bits worth mentioning. First are bind and bound. These are for data binding expressions, which can be very powerful. You can bind variables together, so that one changes when the other changes. Better still, you can bind to functions and list comprehensions. The other interesting syntax in JavaFX involves the keywords delete and insert. These give a LINQ-ish syntax for sequences. In fact, if you combine the mathematical select syntax with insert/delete and the declarative constructors, you get expressiveness that is pretty much on par with LINQ, in my opinion. When you see everything working together it kind of makes sense, but it does seem kind of random at first.

Tuesday, December 16, 2008

Is Scala too hard or are you just stupid?

In case you missed it, Python's BDFL a.k.a. Guido van Rossum filed some major complaints about Scala. After you read that, read a good rebuttal and especially observe the ensuing debate. I especially liked Tony Morris's closing salvo. He brings some validity to the ad hominem response: maybe you don't like Scala because you are too dumb to understand it. Please allow me to commit the most common fallacy of human thought and attempt to answer this question by examining only my own experience. In other words, if Scala seems too hard to me, then it is too hard. Period. If it seems simple enough to me, then anybody who finds it hard is just stupid.

Since I am going to make a statement of fact based entirely on my own experiences, maybe a little background would be relevant. I started programming 27 years ago when I was 7 years old. I programmed in BASIC, first on an Apple II at school and then on my very own Commodore 64. My first "real programming" was in Pascal when taking Advanced Placement Computer Science in high school. I probably still think in Pascal and translate everything into it first. I also learned FORTRAN and C while in high school.

In college, I learned a lot more C. For a while I double-majored in math and computer science. Then I took CS 10, the weed-out course for CS majors. For all of our programs in that class, we started with a predicate calculus definition of the problem, and then we had to logically derive the solution. We had a tool (written, I think, by the prof or somebody else in the department) that we used to transform the problem statement into a computer program. But not just any program: a Lisp program. I hated everything about this, including Lisp. That class successfully weeded me out, and I stuck to math.

In college I also learned Perl (for an economics class) and C++ (for a summer math research project). After college, I programmed mostly in Java, with some C++, Perl, Ruby, and C# sprinkled in along the way. So in summary, I've programmed in a lot of languages, but most of them are from the imperative-cum-OOP tree with C-based syntax. I had one major experience with a functional language, and hated it so much that I changed my major. So now on to my observations and emotions about Scala.

1.) Type inference sucks, but you (Java developers) gotta love it. I love having a statically typed language. I am definitely in that camp. However, I also love not having to declare types very often. How often do you get to have your cake and eat it too? But this is, in my experience, the major source of pain in Scala: you are at the mercy of the cleverness of the compiler. In Java you are often at the mercy of the cleverness of the JVM instead, and it takes some esoteric Java (the kind that you only see in job interview problems or in books by Josh Bloch and Neal Gafter) to produce code that looks like it should compile but will not. In Scala this is all too common. Here is an example.


// this compiles
object ItemMetaData extends Item with KeyedMetaMapper[Long, Item] {
  override def dbTableName = "items"
  // here is the problem line
  override def fieldOrder = List(name, description, reserve, expiration)
}

class Item extends KeyedMapper[Long, Item] {
  def getSingleton = ItemMetaData
  def primaryKeyField = id

  object id extends MappedLongIndex(this)
  object reserve extends MappedInt(this)
  object name extends MappedString(this, 100)
  object description extends MappedText(this)
  object expiration extends MappedDateTime(this)
}

// but this does not
object ItemMetaData extends Item with KeyedMetaMapper[Long, Item] {
  override def dbTableName = "items"
  // here is the problem line
  override def fieldOrder = name :: description :: reserve :: expiration :: Nil
}

class Item extends KeyedMapper[Long, Item] {
  def getSingleton = ItemMetaData
  def primaryKeyField = id

  object id extends MappedLongIndex(this)
  object reserve extends MappedInt(this)
  object name extends MappedString(this, 100)
  object description extends MappedText(this)
  object expiration extends MappedDateTime(this)
}

The two list expressions are equivalent, and yet one compiles and the other does not. Why? Is it a flaw in the compiler? Is it bad syntax? Is it a flaw in the language implementation (i.e., the List code and the :: code)?

2.) Having no operators means that operators are everywhere! You can make the case that Scala's syntax is simpler than Java's, C++'s, or Ruby's, because it has no operators. There is no operator overloading, because there are no operators. The flip side is that it is easy to simulate operators, and everybody can do it. This is great for the wanna-be language designers in all of us, but it can really suck. Why? It's like you are constantly encountering new operators. It can make Scala's syntax feel infinite in size, even though it is actually quite small and simple.
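A tiny illustration of the point — my own made-up example, not from any particular library. Any method with a symbolic name can be called in operator position, so in effect every library ships its own operators.

// A one-character method name behaves exactly like a built-in operator.
class Money(val cents: Int) {
  def +(other: Money): Money = new Money(cents + other.cents)
}

val total = new Money(150) + new Money(99)  // total.cents == 249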

3.) Golf is built in and is known as _. Scala uses the underscore all over the place as shorthand. Once you have become reasonably competent at Scala, you begin to like this. Until then, you hate it. In other words, it steepens the learning curve. It is a feature for power users, but it is prevalent, so you either become a power user or you quit (insert snide comment about Guido or Cedric here).
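For example (again, a made-up snippet), the same filter written with and without the shorthand, plus one of the underscore's other lives:

val nums = List(1, 2, 3, 4, 5)
val evens1 = nums.filter(n => n % 2 == 0)  // explicit parameter
val evens2 = nums.filter(_ % 2 == 0)       // _ stands in for the parameter
val add = (_: Int) + (_: Int)              // here each _ is a *different* parameter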

4.) Expressive is as expressive does. A lot of folks have thrown the term "readability" around. There are two aspects of readability. When you read code, do you know what it does? When you read code, do you know how it does it? The more expressive a language is, the easier it is for it to satisfy the first question. However, it may be harder for it to satisfy the second question. One could argue that if a language is better at both of these things than an alternative language, then you would probably rather program in that language. It is certainly easier to tell what idiomatic Scala will do than Java, for example. However, it is often harder to understand how it does it. You can substitute Python for Scala in the above two sentences and it is still true. But could you substitute Python for Java in those sentences? Probably not.
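One small, concrete example of that tension, in current Scala (my own illustration):

// What it does is obvious from reading it: group words by their length.
// How it does it (building a map, closing over the key function) is hidden.
val byLength = List("to", "be", "or", "not").groupBy(_.length)
// e.g. Map(2 -> List(to, be, or), 3 -> List(not))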

5.) Scala is not for creationists. OK, maybe that's a bad joke by an atheistic evolutionist. What I mean is that object creation can be confusing in Scala. Constructors are not obvious at first. Case classes add confusion, and then you throw in companion objects and you might start screaming "Make it stop!" Oh, do I complain too much? Well, in most of the other "mainstream" languages you need to learn exactly one thing to figure out object construction. Sure, there may be other patterns (factories, singletons, builders, etc.) for augmenting this, but there is only one "natural" way to instantiate a class. In Scala you see standard stuff like val foo = new Foo("2ez") but also val foo = Foo("wtf"). The latter only compiles if Foo is a case class or has a companion object with an apply method; for a case class, both forms work, while for a plain class only the new form does.
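A small sketch of what I mean (class names made up):

class Plain(val s: String)      // new Plain("x") is the only way in...
object Plain {                  // ...until a companion apply makes Plain("x") work too
  def apply(s: String) = new Plain(s)
}

case class Cased(s: String)     // Cased("x") and new Cased("x") both compile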

So what is the conclusion? Well, for all of my struggles, I have managed to find my way around Scala. So there, it must not be too hard. I am no smarter than any other programmer out there, so if I can learn it, they can too. Plus, my job did not even depend on it; usually when a programmer has to learn a language, their job depends on it. A motivated programmer can definitely learn Scala in no time!

Monday, September 08, 2008

Scala ArrayStack

I had not done any Project Euler problems for a while, so I decided to solve one yesterday. I was also planning on attending the next BASE meeting, so I wanted to brush up on my Scala. Thus it was time to solve Problem #47 in Scala.

The solution got me a little more familiar with some of the data structures available in scala.collection.mutable. In particular I needed a structure to hold a list of factors. I decided that ArrayStack was the best choice. Here is my solution:


package probs

import scala.collection.mutable.ArrayStack

object Euler47 {
  def main(args: Array[String]): Unit = {
    val start = System.nanoTime
    solve(4)
    val duration = System.nanoTime - start
    println("duration=" + duration / 1000000.0)
  }

  // find the first n consecutive integers that each have n distinct prime factors
  def solve(n: Int): Unit = {
    var i = 2
    while (i > 0) {
      var j = i
      while (j < i + n && numFactors(j) == n) {
        j += 1
      }
      if (j - i == n) {
        val msg = (i until j).foldLeft("") { (x, y) => x + y + " " }
        println(msg)
        return
      }
      i += 1
    }
  }

  // counts the distinct prime factors of n
  def numFactors(n: Int): Int = {
    var factors = new ArrayStack[Int]
    var i = 2
    var m = n
    while (i <= m / i) {
      while (m % i == 0) {
        if (factors.size == 0 || i != factors.peek) {
          factors += i
        }
        m /= i
      }
      i += 1
    }
    if (m != 1) {
      factors += m
    }
    factors.size
  }
}

I was very pleased with the performance: it solved the problem in about 0.4 seconds on my MacBook. I saw a similar, but not as good, Java solution on the message boards that ran in 1.5 seconds. That solution added each factor multiple times and then had to loop through the factors again to get rid of duplicates. I ran it on my MacBook and it took 1.1 seconds. Even when I "fixed" it, it still took about a second. I am sure I could have done a lot of work to it and gotten it as fast as the Scala version, but why bother.

Wednesday, July 30, 2008

Python Threads

I finished Core Python. My last post on it hits on a few random things towards the end of the book. The most interesting is threading in Python, which raises the question: is Python more or less multi-threaded than it claims to be?

Sunday, July 13, 2008

Google App Engine, Python, and The Gator

Today I finished part three of a series of articles on the Google App Engine. Of course, right now GAE only supports Python, so I've been polishing my Python at the same time by reading Core Python. The end result of all this is not only the three-part series, but a FriendFeed clone called the AggroGator. Oh, and I've formed some opinions as well!

First, GAE. From a developer standpoint, it is excellent. The SDK works well. It is easy to develop, test, and deploy. I used Eclipse and PyDev, and I was very impressed with it. PyDev does a good job of providing code assist and syntax checking, something greatly appreciated by a statically typed language guy like myself.

Now, once you get beyond the development phase, GAE has had some notable problems. I have been subscribed to the mailing list, and it is very active with people having problems. Personally, I had problems with urlfetch returning HTTP 503s when I could go to the same URL and get a nice little 200. This did give me the pleasure of using GAE's logging feature to identify these errors, but there was nothing to be done about them. The 503s have since gone away, but I was just glad that this wasn't a "real" app that my job depended on.

A big part of the fun of using GAE can be credited to Python. It is a nice language. It is definitely not as elegant as Ruby or Scala, but it reads nicely and has a lot of very practical sugar to it. I think it was a good choice by Google for GAE. I don't think they use it much for their high-profile applications (I think Google Code is written in Python, but I am not certain). Python is known to have better performance than Ruby, but I don't think it is nearly as popular as the other common P in LAMP (PHP).