Thursday, April 30, 2009
Mobile OS Wars: Upgrades
Thursday, April 23, 2009
Tail Recursion in Scala
def factorial(n: BigInt): BigInt = {
  if (n == 0) 1
  else n * factorial(n - 1)
}
Why can't this be optimized as a tail call? Instead of explaining it myself, I will introduce the hero of this story, Daniel Spiewak (check out his blog or follow him on Twitter). Bill emailed Daniel about his suspicion, and Daniel confirmed it:
You're quite correct: this version of factorial is not tail recursive.
I find it sometimes helps to split the last expression up into atomic
operations, much as the Scala parser would:
else {
  val t1 = n - 1
  val t2 = factorial(t1)
  n * t2
}
It is possible to devise a tail-recursive factorial. In fact, it
might be possible to convert any recursive function into a
tail-recursive equivalent, but I don't have a proof for that handy.
Back to factorial:
def factorial(n: Int) = {
  def loop(n: Int, acc: Int): Int = if (n <= 0) acc else loop(n - 1, acc * n)
  loop(n, 1)
}
I'm cheating a bit by taking advantage of the fact that multiplication
is commutative, but this is the general idea. Some functions are
naturally tail recursive, but for those which are not, this
"accumulator pattern" is a fairly easy way to achieve the equivalent
result. Technically, the accumulator must be combined with an
inversion of the computation direction (bottom-up rather than
top-down) to generate a truly-equivalent tail-recursive function, but
it is not necessary for examples like this one.
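Daniel's point about inverting the computation direction is easier to see with a function where order matters. Here is a small sketch of my own (the names and code are mine, not from his email; the @annotation.tailrec annotation, which asks the compiler to verify the optimization, arrived in Scala 2.8):

```scala
// Top-down and NOT tail recursive: the cons happens after the recursive call
def downFrom(n: Int): List[Int] =
  if (n <= 0) Nil else n :: downFrom(n - 1)   // List(n, n-1, ..., 1)

// Tail recursive: count bottom-up and cons onto the accumulator. The build
// direction is inverted, but the final result is identical.
def downFromTail(n: Int): List[Int] = {
  @annotation.tailrec
  def loop(i: Int, acc: List[Int]): List[Int] =
    if (i > n) acc else loop(i + 1, i :: acc)
  loop(1, Nil)
}
```

Because list construction is not commutative the way multiplication is, the accumulator alone is not enough here; the loop has to run in the opposite direction to produce the same list.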
This accumulator pattern is a great tip. To drive home the difference it produces in the behavior of the Scala compiler, it is useful to bring in our good ol' friend javap. For the original code, here is the output:
public scala.BigInt factorial(scala.BigInt);
Code:
0: aload_1
1: iconst_0
2: invokestatic #27; //Method scala/runtime/BoxesRunTime.boxToInteger:(I)Ljava/lang/Integer;
5: invokestatic #31; //Method scala/runtime/BoxesRunTime.equals:(Ljava/lang/Object;Ljava/lang/Object;)Z
8: ifeq 21
11: getstatic #36; //Field scala/BigInt$.MODULE$:Lscala/BigInt$;
14: iconst_1
15: invokevirtual #40; //Method scala/BigInt$.int2bigInt:(I)Lscala/BigInt;
18: goto 40
21: aload_1
22: aload_0
23: aload_1
24: getstatic #36; //Field scala/BigInt$.MODULE$:Lscala/BigInt$;
27: iconst_1
28: invokevirtual #40; //Method scala/BigInt$.int2bigInt:(I)Lscala/BigInt;
31: invokevirtual #45; //Method scala/BigInt.$minus:(Lscala/BigInt;)Lscala/BigInt;
34: invokevirtual #47; //Method factorial:(Lscala/BigInt;)Lscala/BigInt;
37: invokevirtual #50; //Method scala/BigInt.$times:(Lscala/BigInt;)Lscala/BigInt;
40: areturn
I know this looks scary if you have not used javap much, but if you have, then you will recognize that this is a straightforward compilation. You can see the de-sugaring the compiler performs (try scalac -print for a better way to view it), like the implicit conversions being made (offset 15 above) and the operators compiled as method invocations (offsets 31 and 37). Especially important is the recursive call to factorial at offset 34, followed by a call to BigInt.$times, i.e. multiplication for BigInts.
Now let's rewrite the method using Daniel's suggestions:
def factorial(n: BigInt): BigInt = {
  def loop(n: BigInt, acc: BigInt): BigInt = if (n <= 0) acc else loop(n - 1, acc * n)
  loop(n, 1)
}
And now when we run javap, things are a lot different (make sure you use the -private flag with javap to get all of the details):
private final scala.BigInt loop$1(scala.BigInt, scala.BigInt);
Code:
0: aload_1
1: getstatic #26; //Field scala/BigInt$.MODULE$:Lscala/BigInt$;
4: iconst_0
5: invokevirtual #30; //Method scala/BigInt$.int2bigInt:(I)Lscala/BigInt;
8: invokevirtual #36; //Method scala/BigInt.$less$eq:(Lscala/BigInt;)Z
11: ifeq 16
14: aload_2
15: areturn
16: aload_1
17: getstatic #26; //Field scala/BigInt$.MODULE$:Lscala/BigInt$;
20: iconst_1
21: invokevirtual #30; //Method scala/BigInt$.int2bigInt:(I)Lscala/BigInt;
24: invokevirtual #40; //Method scala/BigInt.$minus:(Lscala/BigInt;)Lscala/BigInt;
27: aload_2
28: aload_1
29: invokevirtual #43; //Method scala/BigInt.$times:(Lscala/BigInt;)Lscala/BigInt;
32: astore_2
33: astore_1
34: goto 0
public scala.BigInt factorial(scala.BigInt);
Code:
0: aload_0
1: aload_1
2: getstatic #26; //Field scala/BigInt$.MODULE$:Lscala/BigInt$;
5: iconst_1
6: invokevirtual #30; //Method scala/BigInt$.int2bigInt:(I)Lscala/BigInt;
9: invokespecial #51; //Method loop$1:(Lscala/BigInt;Lscala/BigInt;)Lscala/BigInt;
12: areturn
At the bottom is the decompilation of the factorial function. Notice that it calls something called loop$1. That is the compiled version of the inner function loop, shown at the top of the javap output. The most important thing to notice is offset 34 -- the goto statement. This is a goto loop. There is no recursive call to loop$1 inside the function, just as there is no call to factorial inside the factorial function. The compiler removed the tail recursion and replaced it with a goto loop, just as Daniel said it would.
For more proof, you can run the code. On my MacBook, the original loop will blow up in a stack overflow when trying to calculate 60,000! I'm sure it blows up well before then, but you get the point. The optimized version of the code has no problems with 60,000! though it does take a while to calculate.
The one last mystery in all of this for me goes back to my slides on Scala. In the slides, I used tail recursion as the reason why the factorial code ran faster in Scala than in Java. Clearly I was wrong. The code definitely ran faster in Scala, but I do not know why exactly. Perhaps javap can shine some light on this, too...
Wednesday, April 22, 2009
ActionScript Quickie: Check for Required Params
private function validArgs(...args):Boolean {
  return args.every(function(arg:Object, ...crap):Boolean {
    return arg ? true : false;
  });
}
Usage:
var firstName:String = ...
var lastName:String = ...
var iq:int = ...
var friends:Array = ...
validArgs(firstName, lastName, iq > 100, friends.length);
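For comparison, here is a rough Scala analog of the same idea (this sketch is mine, not from the post; its notion of "invalid" -- null, 0, false, and the empty string -- only approximates ActionScript truthiness):

```scala
// Hypothetical Scala analog: every argument must be "truthy"
def validArgs(args: Any*): Boolean = args.forall {
  case null | 0 | false | "" => false   // treat these as missing
  case _                     => true
}

validArgs("Michael", 142, true)   // true
validArgs("Michael", null)        // false
```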
Monday, April 20, 2009
Properties Survey
public class User {
    private String name;

    public String getName() {
        return this.name;
    }

    public void setName(String name) {
        this.name = name;
    }

    public static void main(String[] args) {
        User user = new User();
        user.setName("Michael");
        System.out.println(user.getName());
    }
}
I suppose ugly is a subjective term. Perhaps verbose is the better term. Many other languages offer syntax that is simplified, but without losing encapsulation. For example, here is similar code in Ruby:
class User
  attr_accessor :name
  def name=(name)
    puts "setter called"
    @name = name
  end
end
user = User.new
user.name = "Michael" #setter called
puts user.name #Michael
In the above code, the getter and setter are both generated by the attr_accessor method. The syntax makes it seem like the internal state is exposed directly, but there really are methods being generated and invoked. To demonstrate this, I chose to override the setter and added in some extra code to let you know it had been called. Ruby is a very powerful language, and its meta-programming facilities (i.e. code that writes more code for you, like the attr_accessor method above) are renowned. However, it is hardly the only language that does properties like this. Good ol' Python can do it too:
class User(object):
    def getname(self):
        return self.__name
    def setname(self, name):
        print "setter called"
        self.__name = name
    name = property(getname, setname)

user = User()
user.name = "Michael" #setter called
print user.name #Michael
Not quite as succinct or as elegant as Ruby, but it works. Slick properties were added to Objective-C not too long ago as well:
@interface User : NSObject {
    NSString* name;
}
@property (copy) NSString* name;
@end

@implementation User
@synthesize name;
// Override the setter using the ivar directly; writing self.name = ...
// in here would invoke this same setter again and recurse forever
- (void)setName:(NSString *)newName {
    if (name != newName) {
        [name release];
        name = [newName copy];
    }
}
@end
The header file makes this more verbose, as well as the memory management. Objective-C has a garbage collector, but only for its desktop profile. You have to manually manage the memory if you are writing Objective-C for an iPhone application.
ActionScript 3 also supports properties, via get and set functions:
class Person {
  private var _name:String;
  public function get name():String {
    return _name;
  }
  public function set name(name:String):void {
    trace("Setter called");
    _name = name;
  }
}
Note that you have to define both the get and set functions. This gives you the "dot syntax", i.e. you can do something like myPerson.name = "Michael". Scala is similar:
class Person {
  private var _name: String = null
  def name: String = _name
  def name_=(name: String) {
    println("setter called")
    _name = name
  }
}

object App {
  def main(args: Array[String]) {
    val person = new Person
    person.name = "Michael"
    println(person.name)
  }
}
Scala can generate the getter/setter for you, but then you cannot override them. So if all I wanted was the default behavior, nothing extra, I could just do:
class Person {
  var name: String = null
}

object App {
  def main(args: Array[String]) {
    val person = new Person
    person.name = "Michael"
    println(person.name)
  }
}
There is no way to override the methods that Scala generates for you. Even if you subclass Person, you cannot alter the generated getter/setter methods.
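One related trick worth noting (my addition, not from the original post): if you need Java-style accessors for interop, the @BeanProperty annotation asks the compiler to also emit getName/setName alongside Scala's name/name_= methods:

```scala
import scala.beans.BeanProperty

// @BeanProperty generates Java-style getName/setName for Java callers,
// in addition to the usual Scala-style accessors
class Person {
  @BeanProperty var name: String = null
}

val p = new Person
p.setName("Michael")
println(p.getName)   // Michael
```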
Thursday, April 09, 2009
The Java App Engine
Saturday, April 04, 2009
All Your Pages Are Belong To Us
Thursday, April 02, 2009
Statistically Useless
I was almost really impressed by Fitz & Brooks. It seemed like they were on the verge of saying "NBA statistics are meaningless" but they could not go that far. The NBA keeps a lot of statistics, but they really are mostly meaningless. Points, rebounds, assists, steals, blocks, etc. None of these are particularly good at determining if a player will make his team more or less likely to win a game. That is why you see guys like David Lee, Troy Murphy, Jose Calderon, and Mario Chalmers as statistical leaders. Look at the league leaders in blocks. Their teams are ranked 8th, 18th, 25th, 30th, and 20th in the league in points per game. In terms of team field goal% allowed, the ranks are 3rd, 4th, 25th, 23rd, and 19th.
The NBA is hardly the exception. Most sports have a lot of meaningless (from the standpoint of predicting team success) statistics. Baseball has terrible stats like batting average, runs batted in, stolen bases, and fielding percentage. Football is not quite as bad, mostly just because there are fewer stats. Still, things like rushing touchdowns and even passing yards are pretty worthless.
Of course baseball has seen a statistical renaissance, and it is known as sabermetrics. Michael Lewis's Moneyball is a popular work that discusses the value of sabermetrics. It uses the success of the Oakland A's and their GM Billy Beane as proof, but if it were written today, then the success of the Boston Red Sox would be (more?) compelling. Lewis expanded this "nerdy numbers leading to unexpected success" plot to basketball recently by writing an article about the Houston Rockets' Shane Battier. Basketball is a tougher sport to measure though. Maybe there will be sabermetrics in basketball eventually, who knows.
It was definitely easier in baseball, where some of the statistical ingredients for more meaningful sabermetrics were already measured (like number of walks, total bases, etc.). What would similar stats in basketball be? Certainly making shots is good, missing shots is bad. How do you measure a three-point play vs. a three-point field goal? If a player takes a shot and misses, but somebody else gets an offensive rebound and an easy field goal, how is this measured?
Wednesday, March 25, 2009
Safari Flash Fail

Take a look at the stack trace that gets sent to Apple:

So I think this is being caused because I set the storage allowed by the Flash player to 0 KB. The default is 100 KB. Why do I have it set at 0 KB? It's not because I am some privacy fanatic, just the opposite in fact. It was because I was debugging some PayPal code a few months back that required that Flash local storage be unavailable. Most sites then ask to set it back to the default 100 KB. However, if the SWF in question is too small to show the settings dialog, then it can't ask the user to set the amount of local storage for the site. You can right-click any SWF to bring this setting up, well any SWF that is big enough to display the settings manager. This is on a per-site basis, but you can set things globally as well. Dealing with the error that is thrown when setting data fails is what I helped PayPal with. Obviously there are a lot of sites that do not deal with this error, and that seems to bubble up in Safari and cause it to crash.
Friday, March 20, 2009
iPhone OS/SDK 3.0: Remote Notifications
Monday, March 16, 2009
Karma's Bitch: The Denver Broncos
Sunday, March 15, 2009
Hiring Developers
Monday, March 09, 2009
Covariance, Contravariants, Invariants, Help!
public static void invariance(List<Number> nums) {
    nums.add(1.1D);
    for (Number n : nums) {
        System.out.println(n.doubleValue());
    }
}
Invariance implies problems with the following code:
List<Integer> ints = Arrays.asList(1,2,3);
invariance(ints); // won't compile
List<Object> objs = new ArrayList<Object>(ints);
invariance(objs); // won't compile
You have got to match things exactly if you want to call an invariant method. Sometimes this is too restrictive, but not in this case. In the example above, you both read from and write to the collection. If we could pass in the List of Integers, then adding a double to it would be invalid. If we could pass in a List of Objects, then calling the doubleValue method would be invalid. The only option here is invariance.
If you only need to read from the collection, then you can have covariance:
public static void covariance(List<? extends Number> nums) {
    for (Number n : nums) {
        System.out.println(n.doubleValue());
    }
}
Now you can pass in anything that adheres to the API of Number, i.e. anything that is a subclass of Number.
List<Integer> ints = Arrays.asList(1,2,3);
covariance(ints); // compiles!
List<Object> objs = new ArrayList<Object>(ints);
covariance(objs); // won't compile
Note that you still cannot pass in the List of Objects. Objects don't have a doubleValue method, so it would be trouble if the compiler let them in. Also, in the covariant method, if you tried to add a double to the list, the compiler would not let you. You might protest and say "a Double is a subclass of Number, so I should be able to add it to a collection of objects that are a subclass of Number." This is shortsighted. You don't know exactly what kind of collection was passed in, just that everything in the collection implements a particular interface. It could be a List of Integer or a List of Double or a List of Byte. So you cannot just add to the collection. For covariance, you want methods that do not mutate the state of the collection. Now technically, you could remove objects from the collection and the compiler would let you get away with it. Still, it might be helpful to think of covariance as read-only.
What if you want to add things to the collection? That is where contravariance comes in. Here is an example:
public static void contravariance(List<? super Number> nums) {
    nums.add(1.1D);
}
Here I say that I want anything that is a superclass of Number. In other words, I am willing to say that I cannot be any more specific about the elements in the collection than that they are Numbers.
List<Integer> ints = Arrays.asList(1,2,3);
List<Object> objs = new ArrayList<Object>(ints);
contravariance(objs); // compiles!
contravariance(ints); // won't compile
Now I can send in a List of Objects, and since Object is a superclass of Number, the compiler is happy. If I send in a List of Integer, the compiler will stop me. I could try a List of Double too, and it will still stop me. My collection needs to be very permissive, so that my contravariant method can add to it.
Neal Gafter's proposal to simplify some of this looks like:
public static void covariance(List<out Number> nums) {
    for (Number n : nums) {
        System.out.println(n.doubleValue());
    }
}

public static void contravariance(List<in Number> nums) {
    nums.add(1.1D);
}
I think this is intuitive. If I only want to take things out, I want covariance. If I want to put things in, I want contravariance. One of the things that made me re-think (and maybe confused me about) this subject was how things work in Scala. There you use + and - for covariance and contravariance, respectively. By default, a List in Scala is covariant. However, Scala Lists are immutable: adding to a List produces a new List, it does not mutate the original. The mutable cousins of List, like ArrayBuffer, are invariant, just like a List in Java.
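To make the Scala side concrete, here is a small sketch of both directions (the class names are mine):

```scala
class Animal { def speak = "..." }
class Dog extends Animal { override def speak = "woof" }

// List is declared List[+A], i.e. covariant: a List[Dog] is a List[Animal]
val dogs: List[Dog] = List(new Dog)
val animals: List[Animal] = dogs

// "Adding" never mutates; it builds a new list, widening the element type
val more: List[Animal] = new Animal :: dogs

// Function1 is contravariant in its argument: an Animal => String can be
// used anywhere a Dog => String is expected
val describe: Animal => String = a => a.speak
val describeDog: Dog => String = describe
```

This is why covariance is safe for the immutable List: there is no add method that could sneak the wrong element type into an existing collection.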
Sunday, March 08, 2009
Day of Infamy
Last week I wrote about a day of infamy, but did not get around to talking about this. I was too happy about President Obama following through on his pledge to get our military forces out of Iraq. On that same day, the government did something that was not so good. The Fed essentially took control of Citibank.
But wait, we've heard Secretary Geithner and Chairman Bernanke both say that they oppose nationalization of our banks, and assure us that what we are doing at Citibank is not nationalization. This is a classic case of mincing words. The Fed has not only assumed a commanding position of ownership, but they have already begun dictating the company's policies. They have remade the board of directors and mandated policies on employee compensation and lending standards. This is de facto control. It's like in The Sopranos when Uncle Junior was the nominal leader of the Jersey mafia, but everybody knew that Tony was calling the shots. This is what has happened with the US and Citibank. The US is Tony and Citibank is the Jersey mafia. The actual executives might as well be old, senile, and in jail.
Now the government has taken control of banks before, most notably in the S&L runs back in the 80's. This was always done to liquidate the assets of the bank. Maybe that is what is going on with Citibank, and the government is misleading the public because ... well who knows.
I do not think so. The government wants banks to loan money. They want businesses to borrow and hire workers. They want consumers to borrow and spend. This is not happening, for many very good reasons. If you are a bank, you need to clean up your balance sheet by increasing your cash and reducing your risky investments. If you are a consumer, you need to increase your savings and reduce your debt. But if the government has control over large banks, they can make the bank forget about increasing cash and reducing risk. They can offer up loan to any and everybody. They can make it really hard on consumers to resist the temptation of free money. Imagine getting a letter in the mail everyday saying that you have been pre-approved for a new credit card with a 0% interest rate and a $100,000 limit! It's the modern day version of the chicken in every pot.
And so it is that I think February 27 will be a day of infamy in American history. It may be the beginning of an era where the government is an essential part of all economic activity in America. Want to buy a house? Ask the government. Want to buy a car? Ask the government. Want to send your kids to college? Oh wait, nevermind on that one. Want to plan for retirement? Umm ...
Friday, February 27, 2009
A Great Day ... and A Day of Infamy
Today is a great day for America, and a day of infamy. Do you want the good news or the bad news first?
The good news is that today is the beginning of the end of the Iraq War. This war has been going on for so long, and has mostly faded into the back of people's minds. This is especially true these days as people are more concerned with the economy than anything else. But let's not forget how this war started.
The Iraq War became a certainty on 9/11/2001. On that day, just hours after terrorists from Saudi Arabia killed 2,974 Americans, the US government began preparing to attack Iraq. There was only one problem: Iraq was not involved in 9/11. The US is an ally of Saudi Arabia, where the terrorists had come from, so nothing could be done there. All that could be done was to go after the leader of al-Qaeda, Osama bin Laden, and he was in Afghanistan. Luckily plans for attacking Afghanistan had been drawn up before 9/11. Those plans did not involve any US troops, just air support and special forces. It only took a few months to topple the Afghan government, and install drug traffickers to run Afghanistan. Given such a plan, it was not surprising that bin Laden was not found in Afghanistan.
With that little detail out of the way, it was time to move on to Iraq. But how to justify the war? We all got to learn a new term: Weapons of Mass Destruction or, in militaryspeak: WMDs. The US government produced all kinds of propaganda about Iraq and WMDs. Everything from yellowcake uranium to aluminum tubes was used to convince Americans that Iraq had The Bomb and was going to give it to terrorists who would use it on America.
And Americans believed it. Why? Are we all that stupid? Was the propaganda that good? Well maybe, but the real reason is that we wanted to believe it. We had malice in our hearts and wanted vengeance. The "victory" in Afghanistan had not satisfied this bloodlust. Maybe if we had caught bin Laden and let the NYPD beat him to death with nightsticks on primetime television (tape delayed for the west coast of course) then we would have been satiated and a little more likely to call BS when the "facts" about Iraq were presented. Who knows.
So the war began... It was definitely a more satisfying war for television viewers. We got Shock and Awe. We got to see huge numbers of troops marching through the desert. We got to see the statue of Saddam Hussein toppled and the American flag raised. We got to see our leader declare victory on an aircraft carrier. Good stuff. Great television.
Meanwhile there were massive casualties, but they were not the kind we cared about. They were not American casualties. They were not even Iraqi military casualties. No, the Iraqi military was virtually non-existent after years of economic sanctions against Iraq. The casualties were Iraqi civilians. Most agree that there were tens of thousands, with some estimating hundreds of thousands of Iraqi citizens killed as a result of the war. It does not matter what the exact number is. It could have been millions of Iraqis killed, and it would still not have been newsworthy in the US.
However, not too long after victory was declared, the US started suffering casualties. There was a civil war going on, created by the vacuum of power left behind when the US overthrew the Iraqi government. US troops were prime targets, as Iraqi militants knew that the best way to get the US to leave their country was to kill US troops. Soldiers being killed by roadside bombs is newsworthy in the US, as it turns out.
The situation only got worse in Iraq, until finally in February of 2007, the US increased troop levels to support policing of Iraqi streets. Many folks, including John McCain, had been saying this was needed and should have been done before victory had been declared. After more than six months of this, the violence had only increased. In August of 2007, religious bloodshed caused the leader of one of the chief combatants, Muqtada al-Sadr, to call for a cease fire. In September of 2007, the US government claimed that violence was down by 50% and took full credit for this.
In 2008, as part of his campaign for President, Barack Obama promised to withdraw from Iraq. This was viewed as a weakness in his campaign. During debates, analysts would claim that Obama was a little weak on foreign policy, but strong on the economy. The economy won out, and Obama is President. Today he announced a plan to withdraw the bulk of US forces by 2010. This is pragmatically about as fast as is possible. It's not easy to move 100,000+ troops and all of their supporting infrastructure and equipment.
This post ran long, so you'll have to wait for the bad news...
Thursday, February 26, 2009
Lift 1.0
The Lift web framework hit its 1.0 release today. Major congratulations goes to the Lift team. I have been lucky enough to get to know Lift founder David Pollak and Lift committer Jorge Ortiz. They are both very bright engineers and the high quality of Lift reflects this. I was very happy to give a quote about Lift for the 1.0 launch. In my quote I talked about how innovative Lift is. I would like to expand on this.
First, let me say that I realize that Lift has borrowed some ideas from other frameworks. Heck, David gives credit where it's due, and on the front page of Lift lists Seaside, Ruby on Rails, Django, and Wicket as sources of inspiration. Given that, here are the things that I find innovative about Lift.
View-first design. Everybody loves MVC, but it is no panacea. I have seen a lot of apps that used Struts or Spring MVC that wound up putting a lot of code in the actions or controllers. If it wasn't there, then it often leaked out into the view code. Even if the view code did not contain enormous scriptlets, there would be tons of control flow tags (if, switch, for-loops, etc.). The view-first approach is to go to the view first, and pull in logic bits (snippets) as needed. If a snippet needs some data, it gets it. If it creates a form, then it handles the submission... sometimes with Ajax. Snippets use Scala's XML support for creating view code (XHTML). Some people think this mixes view code into "back-end" code, but that is a bit of a robotic response. What is really great is that it is all statically typed. You can't reference a variable that doesn't exist or a non-existent method/property (like you can in JSP, etc.); you would get a compile error. Plus your XHTML has to be well-formed, or it won't compile. As tooling improves around Scala, you will be able to do static analysis and refactoring on snippet code.
Record/Mapper (ORM). Lift's ORM may not be quite as sophisticated as JPA, but its syntax is beautiful. Now some of this is thanks to Scala, but Lift really makes good use of Scala's type system. By using parameterized traits, you get classes that look very declarative, i.e. like they are just defining metadata about the mapping between class and database table, but there is actually a lot of concrete code being reused. It is as elegant as the metaprogramming that you see in Rails or Grails (and thus more elegant than JPA), but its more concrete nature not only makes it less mysterious, it also makes it easier to debug. I would guess that it also helps with performance, though it would be hard to measure this, since Scala outperforms Ruby and Groovy anyway.
Comet. One of the things that Rails has had going for it is that it seemed designed for Ajax from the get-go. This is definitely true of Lift, too. I would say that Lift is even more designed for Ajax, but that is a little subjective. What is more objective is that Lift is designed for Comet. Once again it leverages Scala brilliantly, by using Scala's Actors.
That's just a few things, but hopefully you see the trend here. Not only does Lift improve on existing ideas, but in some cases it really breaks new ground.
Wednesday, February 25, 2009
Spring is Here
Well not quite, but spring training is here. My local teams, the Giants and A's, both play their first spring training game today. Tim Lincecum is even pitching today. Very exciting!
Of course it's been an ugly time lately for Major League Baseball. Everyone is upset about Alex Rodriguez and his use of steroids in the past. I have mixed feelings on this. First, I don't care that A-Rod used steroids. I don't care who used them. I don't find any difference between a guy using steroids or using "legal" supplements or sophisticated weight training or getting a cadaver's ligament sewn to their knee when they tear their ACL. These are all examples of technology allowing athletes to be better athletes than has ever been possible before. They are bigger, stronger, faster, and they are able to play at a high level for a longer amount of time. Does this mean that as technology progresses, the sports may change so much that it turns off fans? Yeah, maybe. Perhaps it has already started.
All of that being said, I hate dishonesty even when it is required. I don't blame A-Rod for lying about using steroids until evidence surfaced that contradicted him. I just can't stand to hear the PR crap that he recites to Peter Gammons or at press conferences. How bad for A-Rod would it be if instead of saying "I was young and stupid" he said "I wanted to hit as many home runs as possible because the more I hit, the more money I made, so I took steroids"? I know he's got to say that he's not using now and that he has to minimize the amount of time that he did use them, so I can live with these lies. But wouldn't we all be better off if people just said "I took it so that I could make more money, be more famous, see myself on SportsCenter more often"?
Thursday, February 19, 2009
What’s the Scala way for this type of read loop?

import java.io._
import scala.io._
import scala.collection.mutable._
object PropsParser {
  val state = HashMap.empty[String, String]

  // Run both ways, should print the same thing
  def main(args: Array[String]) {
    load
    println(state)
    state.clear
    load2
    println(state)
  }

  // A more Scala-ish way? Could be a memory hog
  def load2 {
    val src = Source.fromFile(getFile)
    src.getLines.map(_.split("=")).foreach((kv) => state += kv(0) -> kv(1).trim)
  }

  // Dave's original way
  def load {
    val in = new BufferedReader(new FileReader(getFile))
    var line = in.readLine
    while (line != null) {
      val kv = line.split("=")
      state += kv(0) -> kv(1)
      line = in.readLine
    }
    in.close
  }

  def getFile = new File("Some/path/to/your/props/file")
}
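One more take, offered as my own sketch rather than anything from the original question: separate the parsing from the IO and build an immutable Map in a single expression. Using split("=", 2) also keeps values that themselves contain an equals sign intact.

```scala
// Parse "key=value" lines into an immutable Map; lines without '=' are skipped
def parseProps(lines: Iterator[String]): Map[String, String] =
  lines.map(_.split("=", 2))
       .collect { case Array(key, value) => key.trim -> value.trim }
       .toMap

// Usage (reusing getFile from above):
// val state = parseProps(Source.fromFile(getFile).getLines())
```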
Tuesday, February 17, 2009
IntelliJ Fail

Imagine if Microsoft or Adobe had software like this! Imagine the amount of ridicule. When you pay for software, you expect something a little more ... polished? Professional? Debugged? I don't know, but this is not it. Anyways, I've heard good things about 8.1 and the updated Scala plugin for it, so I hope this is just an aberration.
Tuesday, February 10, 2009
Named Parameters and Builders in Scala
So I did some playing around with simulating named parameters in Scala. Let's say we have a class like this
class Beast(val x: Double, val y: Double, val z: Double) {
  // other stuff in here
}
Now suppose that x and y are required, but z can have a default value of 0. My attempt at simulating named parameters involved creating some classes corresponding to the variables.
class X(val x: Double)
class Y(val y: Double)
class Z(val z: Double)

object X {
  def ->(d: Double) = new X(d)
}
object Y {
  def ->(d: Double) = new Y(d)
}
object Z {
  def ->(d: Double) = new Z(d)
}
Do you see where this is going? Next we need a companion object for Beast:
object Beast {
  def apply(xyz: Tuple3[X, Y, Z]) = new Beast(xyz._1.x, xyz._2.y, xyz._3.z)
}
Now we can do something like this:
val c = Beast(X->3, Y->4, Z->5)
So X->3 calls the -> method on the X object. This returns a new instance of the X class with value 3. The same thing happens for Y->4 and Z->5. Putting all three inside the parentheses gives us a Tuple3. This is passed in to the apply method on the Beast object, which in turn creates a new instance of Beast with the given values. So far so good?
Now we just need a way to make z optional and give it a default value if it is not supplied. To do this, we need some Evil.
object Evil {
  implicit def missingZ(xy: Tuple2[X, Y]): Tuple3[X, Y, Z] = (xy._1, xy._2, new Z(0))
}
Now it is possible to get the optional-value behavior. The implicit def missingZ is used to "invisibly" convert a Tuple2[X,Y] into a Tuple3[X,Y,Z]:
object BeastMaster {
  import Evil._
  def main(args: Array[String]) {
    val b = Beast(X->1, Y->2)
    println(b)
    val c = Beast(X->3, Y->4, Z->5)
    println(c)
  }
}
Unfortunately this is where the coolness ends. You can't switch around the order of the variables, i.e. Beast(Y->2, X->1) or even Beast(Z->5, X->3, Y->4). You can't just add more implicit defs either. For example, if you try:
object Evil{
implicit def missingZ(xy:Tuple2[X,Y]):Tuple3[X,Y,Z]=(xy._1,xy._2, new Z(0))
implicit def missingZ2(yx:Tuple2[Y,X]):Tuple3[X,Y,Z] = (yx._2, yx._1, new Z(0))
}
This will cause the Beast construction to fail to compile. You will get the following error:
error: wrong number of arguments for method apply: ((builder.X, builder.Y, builder.Z))builder.Beast in object Beast
val b = Beast(X->3, Y->5)
This is not the most obvious error. The problem (I think) is that the compiler can't determine which implicit def to use. The culprit is type erasure. There is no way to tell the difference between a Tuple2[X,Y] and a Tuple2[Y,X] at runtime. At compile time there is, so you would think it would be possible to figure out which implicit to use... Or perhaps it is possible to merge the two implicits together by using an implicit manifest?
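Erasure is easy to observe on the JVM directly. Here is a minimal Java sketch of the same limitation (the class name is made up): after compilation, both lists share a single runtime class, so the element types cannot be recovered at runtime.

```java
import java.util.ArrayList;
import java.util.List;

// Demonstrates JVM type erasure: the generic type parameters are gone at
// runtime, so a List<String> and a List<Integer> have the same runtime class.
public class ErasureDemo {
    public static void main(String[] args) {
        List<String> strings = new ArrayList<String>();
        List<Integer> ints = new ArrayList<Integer>();
        System.out.println(strings.getClass() == ints.getClass()); // prints "true"
    }
}
```

The same thing happens to Tuple2[X,Y] and Tuple2[Y,X] on the JVM: both erase to plain Tuple2.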
Monday, February 09, 2009
Give me the Cache
I had an application that performed a lot of reads, in the neighborhood of 1000 per second. The application's data needed to be modified very infrequently, less than 100 times per day. Ah, perfect for caching! Indeed. In fact the cache could even be very simple: a list of value objects. Not only that, but it's a very small list. It gets even better: the application has a pretty high tolerance for cache staleness, so it's ok if it takes a few minutes for an update to show up. A few different caching strategies emerged out of this too-good-to-be-true scenario, so it was time to write some code and do some testing on them. First, here is an interface for the cache.
public interface EntityCache {
List<Entity> getData();
void setData(Collection<? extends Entity> data);
}
Again, the Entity class is just a value object (Java bean) with a bunch of fields. So first, let's look at the most naive approach to this cache, a class I called BuggyCache.
public class BuggyCache implements EntityCache{
private List<Entity> data;
public List<Entity> getData() {
return data;
}
public void setData(Collection<? extends Entity> data) {
this.data = new ArrayList<Entity>(data);
}
}
So why did I call this buggy? It's not thread safe. Just imagine what happens if Thread A calls getData() and is in the middle of iterating over the data when Thread B calls setData(). In this application, I could guarantee that the readers and writers would be on different threads, so it was just a matter of time before that scenario would happen. Hello, race condition. Ok, so here is a much nicer, thread safe approach.
public class SafeCache implements EntityCache{
private List<Entity> data = new ArrayList<Entity>();
private ReadWriteLock lock = new ReentrantReadWriteLock();
public List<Entity> getData() {
lock.readLock().lock();
List<Entity> copy = new ArrayList<Entity>(data);
lock.readLock().unlock();
return copy;
}
public void setData(Collection<? extends Entity> data) {
lock.writeLock().lock();
this.data = new ArrayList<Entity>(data);
lock.writeLock().unlock();
}
}
This makes use of Java 5's ReadWriteLock. In fact, it's a perfect use case for it. Multiple readers can get the read lock, with no contention. However, once a writer gets the write lock, everybody is blocked until the writer is done. Next I wrote a little class to compare the performance of the Safe and Buggy caches.
This class creates a writer thread that updates the cache every 30 seconds. It then creates 1000 reader threads. These read the cache, iterate over and print its data, and pause for half a second. This goes on for some period of time (three minutes in the code below), and the number of reads and writes is recorded.
public class Racer {
static volatile boolean GO = true; // volatile so reader/writer threads see the shutdown flag
static int NUM_THREADS = 1000;
static int DATA_SET_SIZE = 20;
static long THREE_MINUTES = 3*60*1000L;
static long THIRTY_SECONDS = 30*1000L;
static long TEN_SECONDS = 10*1000L;
static long HALF_SECOND = 500L;
public static void main(String[] args) throws InterruptedException {
final AtomicInteger updateCount = new AtomicInteger(0);
final AtomicInteger readCount = new AtomicInteger(0);
final EntityCache cache = new SafeCache();
long startTime = System.currentTimeMillis();
long stopTime = startTime + THREE_MINUTES;
Thread updater = new Thread(new Runnable(){
public void run() {
while (GO){
int batchNum = updateCount.getAndIncrement();
List<Entity> data = new ArrayList<Entity>(DATA_SET_SIZE);
for (int i=0;i<DATA_SET_SIZE;i++){
Entity e = Entity.random();
e.setId(batchNum);
data.add(e);
}
cache.setData(data);
try {
Thread.sleep(THIRTY_SECONDS);
} catch (InterruptedException e) {
e.printStackTrace();
}
}
}
});
updater.start();
Thread.sleep(TEN_SECONDS);
List<Thread> readers = new ArrayList<Thread>(NUM_THREADS);
for (int i=0;i< NUM_THREADS;i++){
Thread reader = new Thread(new Runnable(){
public void run() {
while (GO){
int readNum = readCount.getAndIncrement();
List<Entity> data = cache.getData();
assert(data.size() == DATA_SET_SIZE);
for (Entity e : data){
System.out.println(Thread.currentThread().getName() +
"Read #" + readNum + " data=" + e.toString());
}
try {
Thread.sleep(HALF_SECOND);
} catch (InterruptedException e) {
e.printStackTrace();
}
}
}
});
readers.add(reader);
reader.start();
}
while (System.currentTimeMillis() < stopTime){
Thread.sleep(TEN_SECONDS);
}
GO = false;
for (Thread t : readers){
t.join();
}
long duration = System.currentTimeMillis() - startTime;
updater.join();
int updates = updateCount.get();
int reads = readCount.get();
System.out.println("Duration=" + duration);
System.out.println("Number of updates=" + updates);
System.out.println("Number of reads=" + reads);
System.exit(0);
}
}
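As an aside, the lock semantics that SafeCache leans on can be checked in isolation. This is just a sketch (the class name is hypothetical): a read lock can be shared by multiple readers, but while another thread holds the write lock, a reader's tryLock fails.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Illustrates ReadWriteLock semantics: shared read lock, exclusive write lock.
public class LockDemo {
    public static void main(String[] args) throws InterruptedException {
        final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

        // The read lock is shared: acquiring it again succeeds immediately.
        lock.readLock().lock();
        System.out.println(lock.readLock().tryLock()); // prints "true"
        lock.readLock().unlock();
        lock.readLock().unlock();

        // Once a writer holds the write lock, readers on other threads are shut out.
        lock.writeLock().lock();
        Thread reader = new Thread(new Runnable() {
            public void run() {
                System.out.println(lock.readLock().tryLock()); // prints "false"
            }
        });
        reader.start();
        reader.join();
        lock.writeLock().unlock();
    }
}
```

Note that the failing tryLock has to happen on a separate thread: the thread holding the write lock is itself allowed to take the read lock (lock downgrading).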
Testing the BuggyCache vs. the SafeCache, I saw a 3-7% drop in throughput from using the SafeCache. Actually, this was somewhat proportional to the size of the data (the DATA_SET_SIZE variable): if you made it bigger, you saw a bigger hit, as the reading/writing took longer and there was more contention.
So this seemed pretty acceptable to me. In this situation, even a 7% performance hit for the sake of correctness was worth it. However, another approach to this problem came to mind. I like to call it a watermark pattern, but I called the cache LeakyCache. Take a look to see why.
The idea here is to keep one ever-growing list for the cache and an index (or watermark) that marks where the current cache data starts. Every time you "replace" the cache, you simply add to the end of the list and adjust the watermark. When you read from the cache, you simply read from the watermark on. I used an AtomicInteger for the index. I probably did not need to, and a primitive int would have been good enough. I used a CopyOnWriteArrayList for the cache's list. You definitely need this. Without it you will wind up with ConcurrentModificationExceptions when one thread starts mutating the cache while another thread is iterating over it.
public class LeakyCache implements EntityCache{
private final List<Entity> data = new CopyOnWriteArrayList<Entity>();
private final AtomicInteger index = new AtomicInteger(0);
public List<Entity> getData() {
List<Entity> current = data.subList(index.get(), data.size());
return current;
}
public void setData(Collection<? extends Entity> newData) {
int oldSize = this.data.size();
this.data.addAll(newData);
if (oldSize > 0){
index.set(oldSize); // the newest batch starts where the old list ended
}
}
}
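To sanity-check the watermark arithmetic, here is a String-based sketch of the same idea (Entity swapped for String so it runs standalone). Note that the watermark has to land on oldSize, the offset where the newest batch begins:

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.atomic.AtomicInteger;

// Watermark pattern: the list only grows; the index marks where the
// current (newest) batch of data starts.
public class WatermarkSketch {
    private final List<String> data = new CopyOnWriteArrayList<String>();
    private final AtomicInteger index = new AtomicInteger(0);

    public List<String> get() {
        return data.subList(index.get(), data.size());
    }

    public void set(Collection<String> newData) {
        int oldSize = data.size();
        data.addAll(newData);
        if (oldSize > 0) {
            index.set(oldSize); // new batch begins at the old end of the list
        }
    }

    public static void main(String[] args) {
        WatermarkSketch cache = new WatermarkSketch();
        cache.set(Arrays.asList("a1", "a2"));
        cache.set(Arrays.asList("b1", "b2"));
        // The internal list now holds 4 entries, but readers see only the latest batch.
        System.out.println(cache.get()); // prints "[b1, b2]"
    }
}
```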
So you probably see why this is called LeakyCache. That internal list will grow forever, well, at least until it eats all of your heap. So that's bad. It also seems a bit more complicated than the other caches. However, it is thread safe and its performance is fantastic. How good? Even better than the BuggyCache, actually 3x as good as the BuggyCache. That deserves some qualification. Its throughput was consistently more than 3x the throughput of the other caches, but I didn't run any long tests on it. It would eventually suffer from more frequent garbage collection as it leaks memory. However, if your updating is not too frequent, the entities are relatively small, and you've got lots of memory, then maybe you don't care and can just recycle your app once a month or so?
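How fast does it actually leak? Here is a back-of-the-envelope sketch using the numbers from this post (100 updates per day, 20 entities per update; the 100 bytes per entity figure is purely a guess):

```java
// Rough estimate of LeakyCache growth; bytesPerEntity is a made-up figure.
public class LeakEstimate {
    public static void main(String[] args) {
        int updatesPerDay = 100;       // from the post: fewer than 100 writes per day
        int entitiesPerUpdate = 20;    // matches DATA_SET_SIZE in Racer
        long bytesPerEntity = 100L;    // hypothetical; depends on Entity's fields
        long entriesPerMonth = (long) updatesPerDay * entitiesPerUpdate * 30;
        System.out.println(entriesPerMonth);                  // entries added per month
        System.out.println(entriesPerMonth * bytesPerEntity); // rough bytes per month
    }
}
```

Under those assumptions the internal list grows by only about 60,000 entries (roughly 6 MB) per month, which is why an occasional restart can be a tolerable workaround.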
Maybe you aren't satisfied with that last statement. You're going to force me to fix that memory leak, I knew it. Here is a modified version that does just that.
So what does this do? In the constructor we spawn Yet Another Thread. This one periodically (once an hour in the example) checks to see how big the cache is. If it is over some limit, it gets the current data, clears the cache, adds the current data back, and resets the watermark to 0. It also Stops The World to do this, once again by using a ReentrantReadWriteLock. Notice how I have abused the lock by using the read lock for both getting and setting the cache. Why use it for setting? The cleaner thread gets exclusive access via the write lock, which it takes when it is cleaning up. By having the setData method use the read lock, setData will be blocked if the cleaner thread is in the middle of a cleanup.
public class LeakyCache implements EntityCache{
private final List<Entity> data = new CopyOnWriteArrayList<Entity>();
private final AtomicInteger index = new AtomicInteger(0);
private final ReadWriteLock lock = new ReentrantReadWriteLock();
public LeakyCache(){
Thread cleaner = new Thread(new Runnable(){
public void run() {
while(true){
try {
Thread.sleep(60*60*1000L);
} catch (InterruptedException e) {
e.printStackTrace();
}
lock.writeLock().lock();
if (data.size() > 500000){
List<Entity> current = new ArrayList<Entity>(getData());
index.set(0);
data.clear();
data.addAll(current);
}
lock.writeLock().unlock();
}
}
});
cleaner.start();
}
public List<Entity> getData() {
lock.readLock().lock();
List<Entity> current = data.subList(index.get(), data.size());
lock.readLock().unlock();
return current;
}
public void setData(Collection<? extends Entity> newData) {
lock.readLock().lock();
int oldSize = this.data.size();
this.data.addAll(newData);
if (oldSize > 0){
index.set(oldSize); // the newest batch starts where the old list ended
}
lock.readLock().unlock();
}
}
Adding this extra bit of complexity fixes the memory leak while maintaining thread safety. What about performance? Well, the performance is highly configurable, depending on how often the cleaner thread runs (really, how long it sleeps) and how big you are willing to let the cache grow before cleaning it up. I put it on some very aggressive settings, and this caused about a 15% hit relative to the leaky version. The performance is still much better than any of the other versions of the cache.
Next up ... write the cache in Scala using Actors.