Showing posts with label algorithms. Show all posts
Showing posts with label algorithms. Show all posts

Tuesday, September 15, 2009

Scala Word Clouds

Just before I went to bed last night, I got an interesting tweet from Dave Briccetti. You can see from the link in tweet what the problem was, and if you look at the bottom of the page, you can see the clever solution by Jorge Ortiz. Right after I read the tweet from Dave, I asked my wife how long until the end of the TV that she was watching. She said seven minutes. I decided that wasn't enough time, so I went to sleep.

I didn't look at the problem again until today during lunch. By then, I saw Jorge's solution and I saw no way to improve on it. However, this inspired a little experiment for me. Actually it mostly reminded me of a programming problem that I was given a few years back by a well known company that will remain anonymous as they are very touchy about people talking about their interviewing process. The problem was essentially the same as Dave's problem -- take a bunch of sentences (instead of tweets), figure out the most frequent words. There was a twist though -- I was not allowed to use any Java collection. Actually I could not use anything outside of the java.lang package, so I was pretty much stuck with arrays...

At the time, I figured that I would have to either implement my own hash table and a sort, but then I came up with a way to solve the problem with no hash table. So when I saw Dave's problem, I thought "wow my old solution will be so much more awesome in Scala":

def createCloud(strings:Seq[String])={
  val x:List[String] = Nil
  var s = ("",0)
  val cloud = new scala.collection.mutable.ArrayBuffer[Tuple2[String,Int]]
  strings.map(_.split("\\s").toList.sort(_ < _)).foldLeft(x){mergeSorted(_, _)}.foreach( (a) => {
    if (a == s._1) 
      cloud(cloud.length - 1) = (a, s._2 + 1)
    else
      cloud += (a,1)
    s = cloud(cloud.length - 1)
  })
  cloud.toList.sort((a,b) => a._2 > b._2)
}
Ok, so I cheated this time and used an ArrayBuffer instead of array. In my solution years ago, I made an array whose size was the total number of words (worst case scenario.) I didn't have tuples, so I had to use two arrays. Anyways, the idea here is to split each sentence into a list of words, and then sort that list of words. Now merge together each sorted list, and keep things sorted using a merge sort. Of course you could turn everything into a big list first using flatMap, similar to Jorge's solution, and then sort. Anyways, you then roll up the sorted list, taking advantage of the fact that it is sorted. You never have to look up a word to see if it has been encountered before, because of the sorting. You just keep track of the last word encountered, and just compare the next word to the last word to determine if the word has been previously encountered and thus needs to be incremented. Of course you must be wondering about the aforementioned merge sort. Here is a Scala implementation of it:
def mergeSorted[A](comparator: (A,A) => Boolean)(x:List[A], y:List[A]):List[A] = {
  if (x.isEmpty) y
  else if (y.isEmpty) x
  else if (comparator(x.head, y.head)) x.head :: mergeSorted(comparator)(x.tail, y)
  else y.head :: mergeSorted(comparator)(x, y.tail)
}
  
def mergeSorted[A <% Ordered[A]](x:List[A], y:List[A]):List[A] = mergeSorted((a:A,b:A) => a < b)(x,y)
While writing this, I decided that I should touch up my knowledge of currying and partially applied functions in Scala, so I read about this on Daniel Spiewak's blog. The first version of mergeSorted is very generic and takes a comparator. This is what lead me into the world of currying. Actually, my first version of the above looked like this:
class MergeSorter[A](comparator: (A,A) => Boolean){
  def apply(x:List[A], y:List[A]):List[A] = {
    if (x.isEmpty) y
    else if (y.isEmpty) x
    else if (comparator(x.head, y.head)) x.head :: apply(x.tail, y)
    else y.head :: apply(x, y.tail)    
  }
}
I created a class to encapsulate the sort. In some ways I like this syntax a little better, but it made the second version of the sort a little uglier. I wanted a "default" sort that could be used for anything implementing Scala's Ordered trait. It is this second version that I used in the createCloud function above, as strings can be implicitly converted to RichString which implements Ordered.

It is still easy to use mergeSorted with objects that do not implement Ordered, or to provide a different sort function for those that do:
// provide your own sort
val sList = 
  mergeSorted((s1:String, s2:String) => s1.toLowerCase < s2.toLowerCase)(List("aa","aB","Cb"), List("Ab", "ca"))
// create a reusable sort
val listSort = mergeSorted( (a:List[Any], b:List[Any]) => a.length < b.length) _
val z = listSort(List(List("a","b"), List(1,2,3)), List(List(true,true,true), List("x","y","z","w")))
In the first example, we create a new sort by providing different sorting function that is case insensitive. It is then applied to a list of strings. In the second example, we create the sorting function and assign to a val. Notice the mysterious underscore _ at the end of the statement. This creates a partially applied function. If you forget the underscore, the Scala compiler actually provides a pretty useful error message about this. With this, we can then apply the function however we want.

Wednesday, June 11, 2008

BASE Meeting

Last night I attended my first BASE (Bay Area Scala Enthusiasts) meeting. Organizer Dick Wall had a nice problem to solve for the attendees. The problem was to take a list of integers, and replace each element in the list with the product of all the other elements in the list. For example, if your input was (2,4,6,8) the output should be (4*6*8, 2*6*8, 2*4*8, 2*4*6) = (192, 96, 64, 48). If you want to solve the problem yourself, stop reading and come back when you are ready to read about the solutions.

There is an "obvious" brute force solution, which would be to implement the algorithm exactly as the problem is stated. This is an O(N^2) solution, not so good. You can do better by simply taking a product of all of the elements of the original list, and then replacing each member of the list with the product divided by the element. So for the example above, the product = 2*4*6*8 = 384, and thus the output should be (384/2, 384/4 , 384/6, 384/8). This is O(N) solution, and that is much better. This is the best you can do in terms of speed and space, as you only keep around one extra integer in memory (the product.) However, it doesn't always work.

The problem is when there is a zero in the list. Then the above algorithm will lead to division by zero errors. Dick pointed this out, and pointed out that there are really three casses. If there is exactly one zero, then every element in the return list will be zero, except for the one at the same index as the lone zero from the input list. If there are two or more zeroes, then every element in the return list will be zero. With that in mind, here was my solution:

def wallify(list:List[Int]):List[BigInt] ={
val one = BigInt.int2bigInt(1)
val noZeroes = list.filter(_ != 0)
if (noZeroes.length == list.length){
val prod = list.foldLeft(one){(p,e) => p*BigInt.int2bigInt(e)}
list.map(prod/_)
} else if (list.length - noZeroes.length == 1){
list.map((el) =>
if (el == 0)
noZeroes.foldLeft(one){(p,e) => p*BigInt.int2bigInt(e)}
else
0
)
} else {
list.map((el) => 0)
}
}

I had to leave early, and didn't finish my code until today. Here is the code that the group, led by Bill Venners:

def wallify(input: List[Int]): List[Long] = {
def boringCalcProduct(longInput: List[Long]): Long = {
var product = 1L
for (ele <- longInput) {
product *= ele
}
product
}

def calcProduct(input: List[Long]): Long =
// input.foldLeft(1L)(_ * _)
(1L /: input)(_ * _)

val zeroCount = input.filter(_ == 0).size
val longInput: List[Long] = input.map(ele => ele.toLong)
zeroCount match {
case 0 =>
val product = calcProduct(longInput)
longInput.map(product / _)
case 1 =>
val noZero = longInput.filter(_ != 0)
val product = calcProduct(noZero)
longInput.map(ele => if (ele == 0) product else 0)
case _ =>
input.map(ele => 0L)
}
}