Monday, June 16, 2008

Scala vs. Java Performance

Last week I wrote about the 'wallify' problem that was worked on at the BASE meeting. One thing that was mentioned there was that it seemed like a Scala implementation would not have performance that was as good as in Java. I decided to give this a test. I used the Scala code I wrote last week and compared it to Java code that used essentially the same algorithm.

To test, I created a driver. The driver would vary the numbers in the input list, basically 1..N where N was varied. It would then measure the time it took to wallify the list 100 times. The JVM was the 1.5 JVM for OSX with the -server flag set. I was also giving it 1GB of memory (more on that later.) Here is what it looked like in Java:

And in Scala:

Very similar. There was no noticeable performance loss in Scala. Of course there is an obvious problem here. This algorithm should have been linear with respect to the size of the list, N. This is clearly not linear! I took the logarithms of both the size and the time, and found a correlation of about 0.996. This algorithm is actually quadratic.

I thought that maybe my algorithm just sucked. I tried a couple of other Scala ones. One provided by Jorge Ortiz and one created by the group led by Bill Venners. The both had the exact same performance characteristics as my algorithm, i.e. they were quadratic as well.

The only thing I could think of was garbage collection. I put on gc:verbose and watched a lot of garbage being collected. I upped the heap to 1 GB, but it had no effect. So I am not sure if GC is really the culprit or not.

Update: James Iry had a great insight. Java's BigInteger computation is linear once things get big. To keep them small, you can use all 1's. Here is what the performance looked like then, just for my original Scala code.

Ok, that's not quadratic anymore. I wasn't sure if it was linear (I was paranoid about compiler tricks given the list of 1's that was being multiplied) either, so I actually extended the test with more data points. With a few more data points, I got a nice correlation of 0.993 for a linear fit. This confirms that the algorithm itself is linear for small numbers, but is quadratic for larger numbers where high performance integer math becomes significant.

10 comments:

  1. Anonymous7:12 AM

    Are you using BigInteger to deal with the fact that at N = 40 the value overflows a Java 64bit long with something like:
    2304077777655037952 * 40 =
    -70609262346240000

    and at N = 66 in Java you get something like:
    -9223372036854775808 * 66 = 0

    Also are your timings really in seconds? Because it seems like it ought to be milliseconds.

    ReplyDelete
  2. I updated the blog to include a link to my previous entry as well as Jorge's solution. You will see in my solution that I use Scala's BigInt. For the Java version, I used BigInteger of course. When I tested Jorge's and Bill's solutions, I changed their code to use BigInt instead of Longs.

    And yes, everything is in seconds! Pretty surprising I know. I used System.nanoTime() and divided the result by 10^9.

    ReplyDelete
  3. The core of your algorithm is O(n), but I think there's another O(n) hidden in the use of BigInts. Multiplying or dividing BigInts is O(n) in the number of bits needed to represent the numbers.

    As your lists get bigger, the size of the product tends to get bigger as well - hence the quadratic behavior you're seeing. To see if that's the cause, do your perf testing on a list consisting of nothing but 1s.

    ReplyDelete
  4. Anonymous5:56 AM

    James is right. If you want to isolate the performance of the algo you need to avoid simultaneously scaling the list size and workload of multiplying and dividing numbers with 16,000 digits (at N=5000)

    ReplyDelete
  5. Thanks for this write-up! Hopefully it will help dispel various myths about the performance of functional algorithms.

    ReplyDelete
  6. Anonymous4:34 PM

    The numbers are meaningless as you don't provide the java and scala source code used for the test.

    ReplyDelete
  7. Looks like Anon busted me as part of the conspiracy to mislead the public into thinking that Scala performs as well as Java. So here is the Java code that mimics the Scala algorithm:
    public static List<BigInteger> wallify(final List<Integer> list){
    List<BigInteger> result = new ArrayList<BigInteger>(list.size());
    List<Integer> noZeroes = new ArrayList<Integer>(list.size());
    for (Integer i : list){
    if (i != 0){
    noZeroes.add(i);
    }
    }
    if (noZeroes.size() == list.size()){
    BigInteger product = BigInteger.ONE;
    for (Integer i : list){
    product = product.multiply(BigInteger.valueOf(i.longValue()));
    }
    for (Integer i : list){
    result.add(product.divide(BigInteger.valueOf(i.longValue())));
    }
    } else if (list.size() - noZeroes.size() == 1){
    for (Integer i : list){
    BigInteger b = BigInteger.ZERO;
    if (i == 0){
    b = BigInteger.ONE;
    for (Integer j : noZeroes){
    b = b.multiply(BigInteger.valueOf(j.longValue()));
    }
    }
    result.add(b);
    }
    } else {
    for (Integer i : list){
    result.add(BigInteger.ZERO);
    }
    }
    return result;
    }

    Here is the test runner code in Java:

    public static void main(String[] args) {
    // create test data
    int MAX = 1500;
    List<Integer> testList = new ArrayList<Integer>(MAX);
    for (int i=0; i< MAX;i++){
    testList.add(i+1);
    }
    long startTime = System.nanoTime();
    for (int i=0;i<100;i++){
    wallify(testList);
    }
    long duration = System.nanoTime() - startTime;
    System.out.println("Duration= " + duration + " nanoseconds");
    }

    And here is the test runner code in Scala:

    val one = BigInt.int2bigInt(1)
    def main(args : Array[String]) : Unit = {
    val max = 30000
    val testList = List.make(max, 1)
    println(testList)
    val start = System.nanoTime
    for (i <- 0 until 100){
    wallify(testList)
    }
    val duration = System.nanoTime - start
    println("Duration = " + duration)
    }

    In both test runners, the max value is varied to produce the lovely graphs shown above.

    ReplyDelete
  8. I really liked this information, I think it is good to learn about these things, I would like some day to receive any update of this information so interesting

    ReplyDelete
  9. since I was a teenage, I became interested in reading, I think it is good practice, especially when it is a very interesting blog, thanks for sharing the information!

    ReplyDelete
  10. when I was in college I liked to be on the Internet reading information like this, I have always said that it is useful to learn new things that may serve in different circumstances

    ReplyDelete