Tuesday, September 15, 2009

Scala Word Clouds

Just before I went to bed last night, I got an interesting tweet from Dave Briccetti. You can see from the link in tweet what the problem was, and if you look at the bottom of the page, you can see the clever solution by Jorge Ortiz. Right after I read the tweet from Dave, I asked my wife how long until the end of the TV that she was watching. She said seven minutes. I decided that wasn't enough time, so I went to sleep.

I didn't look at the problem again until today during lunch. By then, I saw Jorge's solution and I saw no way to improve on it. However, this inspired a little experiment for me. Actually it mostly reminded me of a programming problem that I was given a few years back by a well known company that will remain anonymous as they are very touchy about people talking about their interviewing process. The problem was essentially the same as Dave's problem -- take a bunch of sentences (instead of tweets), figure out the most frequent words. There was a twist though -- I was not allowed to use any Java collection. Actually I could not use anything outside of the java.lang package, so I was pretty much stuck with arrays...

At the time, I figured that I would have to either implement my own hash table and a sort, but then I came up with a way to solve the problem with no hash table. So when I saw Dave's problem, I thought "wow my old solution will be so much more awesome in Scala":

def createCloud(strings:Seq[String])={
  val x:List[String] = Nil
  var s = ("",0)
  val cloud = new scala.collection.mutable.ArrayBuffer[Tuple2[String,Int]]
  strings.map(_.split("\\s").toList.sort(_ < _)).foldLeft(x){mergeSorted(_, _)}.foreach( (a) => {
    if (a == s._1) 
      cloud(cloud.length - 1) = (a, s._2 + 1)
    else
      cloud += (a,1)
    s = cloud(cloud.length - 1)
  })
  cloud.toList.sort((a,b) => a._2 > b._2)
}
Ok, so I cheated this time and used an ArrayBuffer instead of array. In my solution years ago, I made an array whose size was the total number of words (worst case scenario.) I didn't have tuples, so I had to use two arrays. Anyways, the idea here is to split each sentence into a list of words, and then sort that list of words. Now merge together each sorted list, and keep things sorted using a merge sort. Of course you could turn everything into a big list first using flatMap, similar to Jorge's solution, and then sort. Anyways, you then roll up the sorted list, taking advantage of the fact that it is sorted. You never have to look up a word to see if it has been encountered before, because of the sorting. You just keep track of the last word encountered, and just compare the next word to the last word to determine if the word has been previously encountered and thus needs to be incremented. Of course you must be wondering about the aforementioned merge sort. Here is a Scala implementation of it:
def mergeSorted[A](comparator: (A,A) => Boolean)(x:List[A], y:List[A]):List[A] = {
  if (x.isEmpty) y
  else if (y.isEmpty) x
  else if (comparator(x.head, y.head)) x.head :: mergeSorted(comparator)(x.tail, y)
  else y.head :: mergeSorted(comparator)(x, y.tail)
}
  
def mergeSorted[A <% Ordered[A]](x:List[A], y:List[A]):List[A] = mergeSorted((a:A,b:A) => a < b)(x,y)
While writing this, I decided that I should touch up my knowledge of currying and partially applied functions in Scala, so I read about this on Daniel Spiewak's blog. The first version of mergeSorted is very generic and takes a comparator. This is what lead me into the world of currying. Actually, my first version of the above looked like this:
class MergeSorter[A](comparator: (A,A) => Boolean){
  def apply(x:List[A], y:List[A]):List[A] = {
    if (x.isEmpty) y
    else if (y.isEmpty) x
    else if (comparator(x.head, y.head)) x.head :: apply(x.tail, y)
    else y.head :: apply(x, y.tail)    
  }
}
I created a class to encapsulate the sort. In some ways I like this syntax a little better, but it made the second version of the sort a little uglier. I wanted a "default" sort that could be used for anything implementing Scala's Ordered trait. It is this second version that I used in the createCloud function above, as strings can be implicitly converted to RichString which implements Ordered.

It is still easy to use mergeSorted with objects that do not implement Ordered, or to provide a different sort function for those that do:
// provide your own sort
val sList = 
  mergeSorted((s1:String, s2:String) => s1.toLowerCase < s2.toLowerCase)(List("aa","aB","Cb"), List("Ab", "ca"))
// create a reusable sort
val listSort = mergeSorted( (a:List[Any], b:List[Any]) => a.length < b.length) _
val z = listSort(List(List("a","b"), List(1,2,3)), List(List(true,true,true), List("x","y","z","w")))
In the first example, we create a new sort by providing different sorting function that is case insensitive. It is then applied to a list of strings. In the second example, we create the sorting function and assign to a val. Notice the mysterious underscore _ at the end of the statement. This creates a partially applied function. If you forget the underscore, the Scala compiler actually provides a pretty useful error message about this. With this, we can then apply the function however we want.

Sunday, September 13, 2009

Some Tasty Scala Sugar

I have finally been getting around to working some more on Scala+Android. A lot needs to be done, especially since I am talking about it next month at the Silicon Valley Code Camp. Scala advocates (and I guess that includes me) often say that is very easy for application developers to use Scala, i.e. that "most" developers are never exposed to the more complex aspects of the language. The folks who "suffer" are APIs developers. I've mostly fallen into the first camp (app developers), but now that I'm trying to create a Scala API on top of the Android SDK, I get exposed to more aspects of Scala. Here are a couple of things that I found interesting.

Tuesday, September 01, 2009

Metaprogramming in Groovy and Scala

This morning I read an excellent pair of articles by Scott Davis on metaprogramming in Groovy. It reminded me of some of the different ways to approach this kind of problem. The second article in particular detailed a new feature in Groovy 1.6, the delegate annotation. Here is an example of using this feature:

public class Monkey {
def eatBananas(){
println("gobble gulp")
}
}

public class RoboMonkey{
@Delegate final Monkey monkey
public RoboMonkey(Monkey m){
this.monkey = m
}
public RoboMonkey(){
this.monkey = new Monkey()
}
def crushCars(){
println("smash")
}
}

This allows for the following usage:

def monkey = new RoboMonkey()
monkey.eatBananas()
monkey.crushCars()

What is interesting here is that you cannot just create a Monkey and call crushCars on it. RoboMonkey is meant to be a way to add functionality to the Monkey class (in this case the crushCars method), while still being able to treat the RoboMonkey as if it was a Monkey. I point this out because of how this would be done in Scala:

class Monkey{
def eatBanaas = println("gobble gulp")
}

class RoboMonkey(val monkey:Monkey){
def this() = this(new Monkey)
def crushCars = println("smash")
}

object Monkey{
implicit def makeRobo(monkey:Monkey):RoboMonkey = new RoboMonkey(monkey)
implicit def makeNormal(robo:RoboMonkey):Monkey = robo.monkey
}

Now arguably this is more complicated, because you have to create the Monkey object in addition to the Monkey class. What Scala requires is that you get those implicit functions (makeRobo, makeNormal) in scope. This is just one way to do that. The added benefit is the following usage:

val monkey = new Monkey
monkey.eatBanaas
monkey.crushCars

I guess the Groovy way to do this is to use the meta class:

Monkey.metaClass.crushCars = {-> println("smash")}
def monkey = new Monkey()
monkey.eatBananas()
monkey.crushCars()

In both languages you can accomplish similar things, though with very different ways to implement it. Meta programming in Groovy is very powerful, and there are things you can do in Groovy that you cannot do in Scala -- see methodMissing and propertyMissing. Of course you lose some of the benefits of static typing, but software is all about tradeoffs.