Tuesday, September 30, 2008

WSDL in ActionScript

One of the advertised features of Adobe's Flex Builder is that it works with web services. Indeed, in any project you can import a WSDL and Flex Builder will generate lots of code for you. The resulting generated code states that the code generator is based on Apache Axis2, and it looks like it. This is mostly a good thing.

This is ok for a single developer or even a small team. Once you get to larger scale development, you usually want to keep generated code separate from the source artifacts that produced it. Often you never want to check in generated code at all. Why? Because then you have two sources of truth: the artifact (the WSDL in this case) and the generated code. You don't want to have to keep these in sync manually; you want your build process to do it. So you don't check in the generated code, and your build system generates it instead.

So ideally the code generation in Flex Builder could be invoked outside of Flex Builder. This may be possible, but so far I have had no luck with it. It is certainly not a documented part of the SDK.

I looked for an alternative and found wsdl2as. This looked promising, but did not work out. First, it expects you to send in XML literals when sending SOAP messages. Sure, it generates the boilerplate around the core message, but if I wanted to work directly in XML, I would not have bothered with a code generator. It has an option that seems designed to deal with this, but it did not work. Even worse, it does not handle anything except the simplest WSDLs. The first WSDL I tried with it defined complex types for the input parameter and return type of the web service. This caused wsdl2as to choke, as it expected any type information to be inlined. Sigh.

Monday, September 29, 2008

At Season's End

The regular season of Major League Baseball is at an end. That is always a bummer to me. One of the reasons that I like baseball so much is that it is played every day. Every day something interesting happens. Of course the playoffs are here, but there is not much joy in them for me this year. No Braves. No A's. No Giants. At least there are no Yankees or Mets, though...

It is always fun to look back at the season, and of course, to speculate on the future. Who should win the awards? And, who should win in the postseason? Being a numbers man, the awards are the most fun to examine.

AL MVP
This is a close race because there are no outstanding candidates. In fact, the top AL hitters were significantly weaker than NL hitters this year. If Lance Berkman or Chipper Jones were in the AL, you could make a very strong case for either as MVP... Let's look at a couple of relevant stats. First, runs created:

1.) Grady Sizemore, 128
2.) Josh Hamilton, 122.8
3.) Dustin Pedroia, 120.2
4.) Nick Markakis, 118.4
5.) Aubrey Huff, 116.5

That is a nice advantage for Grady Sizemore. One reason for the advantage over the other players is that he played a lot and led off, leading to a lot of plate appearances. Still, he had a very good season. Who would guess that a lead-off hitter would have 33 home runs and 98 walks? Perhaps he should not be hitting lead-off... A more weighted number is runs created per 27 outs. Here is that top five.

1.) Milton Bradley, 8.97
2.) Alex Rodriguez, 7.89
3.) Kevin Youkilis, 7.8
4.) Carlos Quentin, 7.67
5.) Nick Markakis, 7.42

Only one hold-over from the previous top five, and that is the very underrated Markakis. Perhaps he is the MVP? Perhaps. The other leaders in total runs created are all in the top eleven in runs created per 27 outs. For a final measure, let's look at the top 5 in VORP.

1.) Alex Rodriguez, 65.6
2.) Grady Sizemore, 62.7
3.) Dustin Pedroia, 62.3
4.) Aubrey Huff, 58.4
5.) Josh Hamilton, 57.1

Another very different top five! Even missing some games, A-Rod provided the most "value" for his team. Don't tell Yankee fans this, as I am sure they are working on a way to blame their postseason absence on A-Rod. I can just imagine "Ah, Moose got us 20 wins, if only A-Rod could have hit some!"

From a pure statistical consideration, Milton Bradley was the most "potent" hitter, but only played 126 games. Throw him out, and it sure looks like you would have to go with A-Rod as MVP, once again. If I had a vote, that is who I would go with.

That is not going to happen, and everybody knows it. People like to vote for players who are on "winners." You have to be clearly the best (and often even that is not good enough) to take home an MVP trophy while playing for a team that is not playing in October. So the people they list are folks like Boston's Pedroia and Youkilis, as well as Justin Morneau and Joe Mauer from the Twins. If Carlos Quentin had not broken his hand during a temper tantrum, he would surely be a front runner. The other name I've heard is Francisco Rodriguez, from the Angels.

Given that, it would seem that Pedroia has the advantage over the other "candidates."

NL MVP
This one is a little easier. Albert Pujols led the league in all of the stats mentioned previously. He was clearly the best hitter in the league, and nobody is really arguing this one. Ryan Howard's .251 average pretty much guarantees that he is not in the mix. He is the only guy whose "traditional" stats (HRs/RBIs) beat Pujols's, and he plays for a division winner. He also finished very strong, just as his team did, coming from behind to pass the Mets in the last month. But there's no chance of this argument working! Let us hope not, at least...

AL Cy Young
This is viewed as a two-horse race between Cliff Lee and Roy Halladay. That is good, because that is how it should be. They were far and away the two best pitchers in the AL. Nobody was even remotely close. Most people think that Lee will win because, well, because he is a winner. His 22 wins jump out. He also led the league in ERA. It is rare for a pitcher to lead in both of those stats and not win the Cy Young. For what it's worth, he led the league in VORP as well, edging out Halladay. You can make nice arguments about how Lee pitched against weaker competition, but it's hard to imagine too many people buying that. Cliff Lee should win and will win.

NL Cy Young
Now this is more interesting. Once again a lot of people think it should be a two-horse race. Once again they are right, but they've got the wrong horses. Most people think it is between Brandon Webb and Tim Lincecum. These may indeed be the two "finalists" for the award, but it should not be that way. Webb was nowhere near as good as Lincecum. He just has a lot more wins, and people get carried away over wins. So Lincecum should be the Cy Young, right?
I won't argue against it, especially since I root for the Giants against most teams. However, there is a guy who has been just as good, and maybe even a little better, than Lincecum: Johan Santana. He edged Lincecum in ERA and in VORP (73.4 to 72.5). Statistically, over the course of the season, he was worth about one extra run (total) more than Lincecum. By comparison, Cliff Lee edged Halladay by about 3.5 runs in VORP.
If you start making the "they played for a winner" argument, then clearly Santana has the edge over Lincecum. You can take that one step further. The Mets were battling the Phillies for the NL East crown this weekend. On Saturday they sent Santana out on short rest and he delivered better than you could hope for, throwing a complete game shutout while striking out nine. I think "clutch" is an illusion, but most people believe in it, and I am sure they would say that Santana was as clutch as it comes. He definitely did everything he could to get his team into the playoffs.
So if people were talking about Lincecum vs. Santana, I would guess they would pick Santana. But they are not. They are only mentioning Lincecum vs. Webb. Lincecum is the clear choice there. Personally if I had a vote ... I would vote for Santana. He has been a little better. The NL East is much better (in terms of hitters) than the NL West.

Thursday, September 25, 2008

The Great Bailout

"OMG! The _____ is in trouble! What are we going to do!!!?!"

When government people say things like this, it is always a precursor to the government proposing itself as the solution to the problem. The problem is so dire, that only the government can solve it. Of course they will need more money and more power to solve the problem. Oh, and if you don't think this is all true, then you are too dumb to understand the problem or you are just un-American because you don't care about all of the Americans who could be hurt by this grave danger.

Mr. Dave Winer makes the point that the current administration has used this argument before. Only then it was Colin Powell making the case for war in Iraq. Now it is Henry Paulson doing the same thing, but with regards to the banking meltdown. Dave is right on all of this. He then goes out of his mind by suggesting that Bush/Cheney should resign, that Nancy Pelosi be made President, and that Paulson's plan move right ahead. The problem is not just Bush/Cheney, and Pelosi is definitely not the solution. The problem is Paulson's request for power and money. It's like saying it would have been ok to listen to Colin Powell and attack Iraq, but only if Al Gore had been president. It didn't matter who was President; attacking Iraq was wrong in every possible way.

Of course Ron Paul has some interesting things to say about the bailout. His opinions are largely grounded in the Austrian economic theory that the government makes business cycles more extreme (bigger booms and bigger busts) by causing malinvestments, like buying subprime mortgages for example. Like all things in Austrian economics, it is a matter of "belief," as these are statements that are purposely impossible to verify scientifically. However, it is hard to dispute that the U.S. government has encouraged high risk loans for the purpose of buying real estate, and that the very financial institutions who did this most are now the ones going bankrupt.

The point is that our government does not have a good track record here. Maybe it has been the main source of the problem, as Paul suggests, or maybe not, but it certainly has been part of the problem. Now it wants unprecedented (in this country at least) power and money to solve the problem that it has been at least complicit in. Given that, how can we support this idea?

Oh, but what is the alternative? I don't know, and I don't think the government knows either. Yes, there will be banks that go under. Does that mean that we'll all be out of money? No, of course not. Everyone's savings are already guaranteed by the FDIC. Not to mention that even in the case of bankruptcy, creditors (that would be the people the bank borrowed money from, i.e. depositors) have first priority. Nobody is going to lose their savings.

But surely there will be other disasters, right? If so many banks go out of business, how will we get loans for houses, cars, or new businesses? Well, perhaps not all of the banks will go out of business. Certainly there are those that have been buying up these insolvent banks. Or maybe other companies will take the opportunity to expand into the banking vacuum created by the insolvent banks. I'm not sure, but I'm not willing to let FUD from the government convince me to give the government the kind of virtually unlimited power that it is asking for.

Tuesday, September 23, 2008

No SharedObjects Allowed

Client side storage by the Flash player (SharedObjects) has several advantages over traditional client side storage, a.k.a. HTTP cookies. From a security standpoint, it is better because the data is never sent over the wire. However the main advantage to most people is that it is bigger, and when it comes to managing data on the client, size definitely matters.

By default you get 100 KB instead of the 4 KB you get with cookies. If your application tries to store 101 KB, it won't fail. Instead the user will be prompted to increase the allocated space by a factor of 10, i.e. from 100 KB to 1 MB. Of course you probably don't want the user to ever see this screen. One of the other advantages of SharedObjects is that people don't delete them. People blow away their cookies all too often, but most people would have no idea how to do the same with SharedObjects. The only way you would find out would be if you saw the Flash player settings screen, i.e. the interface that appears when a Flash application tries to go over the 100 KB default limit.

So stick to under 100 KB and all is good, right? Not so fast. The settings interface requires that your Flash app is at least 136x213. If it is smaller than that, then what happens? First let's explain what happens when it is big enough to show the settings interface. When you flush data to local storage, a string is returned with a status. Here is typical code for this.


var testSo:SharedObject = SharedObject.getLocal("test", "/", false);
testSo.data.testValue = "test";
var soStatus:String = testSo.flush();
if (soStatus != null){
    switch (soStatus){
        case SharedObjectFlushStatus.PENDING:
            testSo.addEventListener(NetStatusEvent.NET_STATUS, someHandler);
            break;
        case SharedObjectFlushStatus.FLUSHED:
            break;
    }
}

There are two possible return values, either "pending" or "flushed." There is no fail. So if you were flushing 101 KB, then you would get a pending return value. Now all you can do is wait for an event, or more precisely a NetStatusEvent. This will tell you if the user allowed you to increase the size or not. If not, then the NetStatusEvent will come back with a failure code.

If there is not enough space to display the settings interface, then you would think that you would just get an automatic failure, but you don't. Instead you get a "pending" from the return of flush. It's not really pending, since the user can't actually choose to allow it to succeed. It can only fail. But the player pretends this is not the case and that the user denied your request. So you still need to listen for the NetStatusEvent. If you don't catch that event, then the Flash player will throw an error to the user, and of course you do not want that. Here is a picture of that.
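Here is a sketch of what that listener might look like. The handler name matches the someHandler used in the snippet above; how you react to each code is up to you, so the bodies here are just illustrative comments.

```actionscript
function someHandler(event:NetStatusEvent):void {
    // The player reports the outcome of the pending flush via info.code
    if (event.info.code == "SharedObject.Flush.Success") {
        // the user granted the extra space; the data was written
    } else if (event.info.code == "SharedObject.Flush.Failed") {
        // the user denied the request -- or the app was too small to
        // ever show the settings dialog, so it could only ever fail
    }
    // stop listening so stale events are not handled on the next flush
    event.target.removeEventListener(NetStatusEvent.NET_STATUS, someHandler);
}
```

Simply having this listener registered is what keeps the player from surfacing the error to the user.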


Monday, September 15, 2008

Death Magnet



Last week, Metallica released Death Magnetic. Your opinion of it seems to have been determined approximately 17 years ago. That is when Metallica released their self-titled, so-called "Black Album." For some people, this was Metallica's sell-out album. They went from being a cult favorite to being mainstream. Never mind that they already had multiple gold and platinum records prior to the Black Album; no one can argue with the success of the Black Album. It has always been hip to criticize that album and everything after it, and to praise everything before it. If you are hip like that, then obviously you won't like Death Magnetic. On the other hand, if you thought the Black Album was a big improvement for Metallica, then you will love Death Magnetic.
Personally, I like the Black Album and I like Death Magnetic. It is definitely in the vein of other recently successful rockers of the 80s/90s, like U2, R.E.M., and the Red Hot Chili Peppers, in that it "channels" a lot of their classic material while still sounding modern. The guitar playing is impressive, and in many ways the whole thing felt like it had been inspired by the Guitar Hero video game (which I also love to play.) In fact Death Magnetic can be downloaded and played on the XBox 360 and Playstation 3, but unfortunately for me, not the Wii...

Monday, September 08, 2008

Scala ArrayStack

I had not done any Project Euler problems for awhile, so I decided to solve one yesterday. I was also planning on attending the next BASE meeting, so I wanted to brush up my Scala. Thus it was time to solve Problem #47 in Scala.

The solution got me a little more familiar with some of the data structures available in scala.collection.mutable. In particular I needed a structure to hold a list of factors. I decided that ArrayStack was the best choice. Here is my solution:


package probs
import scala.collection.mutable.ArrayStack

object Euler47 {
  def main(args : Array[String]) : Unit = {
    val start = System.nanoTime
    solve(4)
    val duration = System.nanoTime - start
    println("duration=" + duration/1000000.0)
  }

  def solve(n:Int):Unit = {
    var i = 2
    while (i > 0){
      var j = i
      while (j < i+n && numFactors(j) == n){
        j += 1
      }
      if (j-i == n){
        val msg = (i until j).foldLeft(""){(x,y) => x + y + " "}
        println(msg)
        return
      }
      i += 1
    }
  }

  def numFactors(n:Int):Int = {
    var factors = new ArrayStack[Int]
    var i = 2
    var m = n
    while (i <= m/i){
      while (m % i == 0){
        if (factors.size == 0 || i != factors.peek){
          factors += i
        }
        m /= i
      }
      i += 1
    }
    if (m != 1){
      factors += m
    }
    factors.size
  }
}

I was very pleased with the performance, solving the problem in about 0.4 seconds on my MacBook. I saw a similar, but not as good, Java solution on the message boards that ran in 1.5 seconds. That solution added all of the factors repeated times and then had to loop through them again to get rid of duplicates. I ran it on my MacBook and it ran in 1.1 seconds. Even when I "fixed" it, it still took about one second. I am sure I could have done a lot of work on it and gotten it as fast as the Scala, but why bother.

Thursday, September 04, 2008

JavaScript Faster than Flash

This is the last benchmark for awhile. Well, at least for today. I converted the JS benchmarks to ActionScript and tested them. The results were surprising, as JavaScript in Safari 4 and Firefox 3.x edged out Flash:


A few notes. I could not convert all of the tests, as two of them (the DOM and Ajax tests) were predicated on browser specific code. I could have done "equivalent" functionality in ActionScript, but it did not seem appropriate for comparison. Otherwise the code was translated as is ... for the most part. I did add static type information where possible. There were also a few APIs (on Date and Array) that had to be tweaked slightly. I tested similar changes to the JavaScript. The only test where there was any effect was the Date test. The JavaScript used Date.parse, which does not exist in ActionScript. The Date constructor does the same thing. If I switched to using the Date constructor in JavaScript, it was just slightly slower.

It certainly seems that much of the performance advantage enjoyed by Flash upon the arrival of Flash Player 9 has been erased. Flash still had a strong advantage in more mathematical calculations (dates, integer and floating point arithmetic) as well as string manipulation. It did very poorly with arrays and regular expressions. I would guess that as the JITs for JavaScript get better, the string advantages will disappear. Flash will probably maintain an advantage in more mathematical computations, especially given its vector graphics features. Hopefully advances in JavaScript will spur Flash VM progress.

Notes
1.) Tested on both Flash 9 and 10 RC2 on both OSX and Windows. Negligible performance differences in any of the permutations.
2.) Also tested with Silverlight, but only on Windows. It was slower than everything except IE7. However, that was because it was terribly slow at regular expressions and error handling. It clearly had the best JIT as it was able to reduce some of the tests to 0 after a couple of executions.

Distractions

Distractions are everywhere. Some people say that Ron Paul is a distraction. Is Sarah Palin a distraction? Or maybe it was Hurricane Gustav. I say that the economy is a distraction.

The focus of the election has become the economy. The economy is important, right? For two years in college, I actually double-majored in economics. If I hadn't been so lazy during my senior year, I would have a degree in it. However, it is not the most important issue in this election year, at least not to me. That distinction still belongs to the war.

Sometimes other libertarian leaning people question me for voting for Democrats. I always say that I would rather have my economic freedoms violated than my personal freedoms. In one case I am broke, in the other I am in jail. I don't want to be broke, but I really don't want to go to jail. There are worse things than jail, namely death. U.S. foreign policy has been dealing out death in a big way over the last eight years. War is worse than any economic or personal freedom violations. Of course war actually causes these violations as well.

Look at the Patriot Act. Clearly a war-time measure that is one of the most egregious violations of personal freedom in the checkered history of the United States. Look at our budget deficit and how much money we are spending on wars. Go beyond that and look at the weakness of the dollar and the problems that is causing.

If you keep looking, you'll soon notice the price you pay for gasoline. How much did gasoline cost before we started waging war in Iraq? I know better than most that correlation does not imply causality, but what do you think the price of gasoline would be today if the United States never invaded Iraq?

If gasoline were in the $2/gallon range, the deficit a fraction of what it is currently, and the dollar stronger, do you think the economy would be much of an issue at all?

There is a price to pay for war. We have tried to push all of that cost to our children in the form of budget deficits, but it has not worked. We are paying it at the pump. We are paying it at the grocery store. We are paying it when we buy "cheap" goods at Wal-Mart.

War is the most important issue. The only hope for less war is to vote for Obama. I wish Obama would pull all of our troops out of Iraq and not even leave behind any bases. I am frightened that he will expand military activities in Afghanistan and maybe Pakistan. He is not a perfect choice, by far. But in the interest of Country First, he is the only responsible choice that I can make.

JavaScript Benchmarks, now with Chrome

As promised yesterday, I did the JS benchmarks again on a Windows machine so I could include Google Chrome. I tried to be pretty inclusive, adding in IE7, IE8 beta 2, Firefox 3.0.1 (current release), Firefox 3.1 with and without JIT, Safari 3.1 (current release), Safari 4 beta, Opera 9.5, and Chrome. This was all run on my workstation, a 4-core, 3.2 GHz box with 8 GB of RAM. Any add-ons or extensions were disabled. Here is the pretty picture.


Once again Safari is the king. Safari 3.1 beats everything except Safari 4 beta, which crushes even Safari 3.1. Opera was a little slower than Safari. Chrome was generally comparable to the various Firefox browsers, but overall slightly slower. Like Firefox 3.1+JIT, it was very fast on error handling! Of course IE was the slowest by far, but at least IE8 is faster than IE7. Maybe IE8 is shipping with debug symbols included (as Microsoft has often done in the past) and the release candidates will be much faster than the betas. Or not.

Anyways, Chrome, and its V8 engine, does well, but does not seem to be ahead of Firefox and is certainly behind Safari and Opera. Maybe they can do better on the Mac!

Wednesday, September 03, 2008

More JavaScript Benchmarking

My old boss sent me this link about Google Chrome performance. It's a good read, and it includes a link to an interesting JavaScript micro-benchmark with some findings on Chrome vs. Firefox 3, Safari 3.1, and the new IE 8 beta 2. I was curious about some other browsers, namely Firefox 3.1 beta with and without JIT, Safari 4 beta, and Opera 9.5. Of course I made a nice picture of my results.


Interesting results. First off, FF 3.1 with JIT did not crash. It crashed so many times on me yesterday that I was sure it would crash on this. Even though it did not crash, it was barely faster than FF 3.1 without JIT or FF 3.0.1. In fact, it was really only faster at error handling and the same on everything else. Apparently errors are easy to JIT for TraceMonkey!

Next, Safari 4 beta is fast. If you look at the link above, Safari 3.1 was already the fastest thing out there, so I guess this should not be a surprise. It crushed everything and it did it on the kind of tasks that real developers do a lot: array and string manipulation, regular expressions, and DOM manipulation (technically not part of your JS engine, but practically the most important test.) I am not used to seeing Opera lose when it comes to any kind of benchmark. If you throw out the array manipulation, it and Safari are pretty close.

I will have to boot up Parallels and try out Chrome vs. Safari 4 beta vs. FF 3.1 beta on Windows.

Tuesday, September 02, 2008

Firefox 3.1: Bring on the JIT

Web developers everywhere are excited about Firefox 3.1. Part of that is because of CSS improvements, but the big reason is TraceMonkey. This is a JavaScript engine with a JIT that uses trace trees, a pretty clever technique for turning interpreted JavaScript (read: slow) into compiled native code (read: fast). The JIT is a big part of why VMs like the Java VM and the CLR are very fast, in general much faster than VMs that do not JIT, like those in Python, Ruby, or (until now) JavaScript. It is why JRuby is faster than Ruby. Thus the prospect of making JavaScript much faster is very exciting.

Recently I had done some micro-benchmarking of JavaScript performance vs. ActionScript/Flash performance. This concentrated on XML parsing only. Now the ActionScript VM is a JIT VM. In fact, Adobe donated it to Mozilla, where it is known as Tamarin. It has been Mozilla's intention to use this for JavaScript in Firefox for a while, as JavaScript is essentially a subset of ActionScript. TraceMonkey is based on Tamarin, but it adds the trace tree algorithm for picking what to JIT. The trace tree approach allows smaller chunks of code to be JIT'd. For example, if you had a large function, say a single script that runs when the page loads, then with a traditional JIT you either JIT the whole function or not at all. Now what if that function has a loop that runs dozens of times, maybe populating a data table for example? With a trace JIT you can compile just that one critical loop, but not the whole giant function. So it should be an improvement over Tamarin and thus ActionScript. Of course there is only one way to tell...
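To make the trace tree idea concrete, here is the shape of code it handles well. This is a contrived sketch; the function, rows, table, and addRow are all made up for illustration.

```actionscript
// A traditional method JIT must compile all of onPageLoad() or none of it.
// A trace JIT can record and compile just the hot loop, leaving the
// one-time setup and teardown code interpreted.
function onPageLoad(rows:Array, table:Object):void {
    // ...lots of one-time setup code, not worth compiling...
    for (var i:int = 0; i < rows.length; i++) {
        // this loop runs dozens of times -- the tracer records this
        // path and compiles only it to native code
        table.addRow(rows[i]);
    }
    // ...more code that runs once, also not worth compiling...
}
```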

So I repeated the same XML parsing tests that I did for Firefox 3.0 and Safari 4 (beta). First, I had to enable the JIT in Firefox. One of the links above describes how to do this (open about:config in FF 3.1, look for the jit.content option, and set it to true). I restarted FF 3.1 just to make sure this took effect. I then ran the tests. The results? Not much difference between FF 3.0 and 3.1b+JIT. FF 3.1b+JIT was about 4% faster, which is probably statistically negligible. It was still 6x slower than ActionScript and almost 3x slower than Safari 4.

So what went wrong? Not sure. Here is the code that gets executed in my test:

function load(){
    var parser = new DOMParser();
    var xml = {};
    var start = 0;
    var end = 0;
    var msg = "";
    var results = document.getElementById("result");
    var li = document.createElement("li");
    initReq();
    req.open("GET", "LargeDataSet?size=50", true);
    req.setRequestHeader("Connection", "close");
    // use a closure for the response handler
    req.onreadystatechange = function(){
        if (req.readyState == 4 && req.status == 200){
            msg = "XML Size=" + req.responseText.length;
            start = (new Date()).getTime();
            xml = parser.parseFromString(req.responseText, "text/xml");
            end = (new Date()).getTime();
            msg += " Parsing took: " + (end-start) + " ms";
            li.appendChild(document.createTextNode(msg));
            results.appendChild(li);
        }
    };
    req.send(null);
}

Pretty simple code. I manually execute it 20 times. It would sure seem like it could be JIT'd. What gets timed is just the parser.parseFromString(...) call, where parser is a DOMParser. Maybe that object cannot be JIT'd? Maybe there is a bug with the JIT that will be resolved in the future? It does seem to suggest that TraceMonkey may not always be the slam dunk everyone expects.

I was surprised by the results. I thought that FF3.1 would be faster than FF3. I didn't think it would be faster than ActionScript in this case, but I thought that it might be close. In many other cases, I expect ActionScript to still be much faster than TraceMonkey. Why? Well, there is one other ingredient in VMs like the JVM and CLR that makes them fast: static typing. This allows the VM to make a lot of other optimizations that work in combination with JIT'ing. For example, knowing that a particular variable is a number or a string allows the VM to inline references to that variable. This can eliminate branches in logic (if-else statements where maybe the else is not possible). The JIT can then take place on the simplified, inlined code, and be about as fast as possible.

If you read about some of the techniques used in TraceMonkey, it tries to do a lot of the above via type inference. So in some cases TraceMonkey and the AVM2 (ActionScript VM) may be able to do the same level of optimizations. In fact, given its tracing approach, TraceMonkey may be able to do better. But I am guessing that there will be a lot of situations where AVM2 will be able to do more optimizations just because of the extra information it has at its disposal in the form of static typing.
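As a contrived illustration of the extra information static typing provides (this snippet is not from any real benchmark, just an example of the kind of code where declared types help):

```actionscript
// With declared types, AVM2 knows i and total are ints before the
// code ever runs, so it can emit direct integer arithmetic with no
// runtime type checks or boxing.
function sumSquares(n:int):int {
    var total:int = 0;
    for (var i:int = 0; i < n; i++) {
        total += i * i;
    }
    return total;
}
// The untyped JavaScript equivalent leaves the VM to discover (by
// inference, or by tracing actual runs) that these variables only
// ever hold numbers.
```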

Sunday, August 31, 2008

Recent Other Writings

Last week, IBM published an article I wrote on using JRuby on Rails with Apache Derby. It concentrates on rapid prototyping/development. I didn't get too heavily into the IDE side of things, but when you add RadRails into the equation it really is nirvana-ish development. Very fun.

I've also been writing a lot on InformIT about Java Concurrency in Practice. I did some fun stuff over there too, like trying to turn some Project Euler code into parallel code. I guess technically that succeeded just fine, but it is a good example of when parallel code is not any faster. In this case, the algorithm was CPU bound anyways. Even having two cores didn't really help much. Oh well. I treated it like a strength exercise back when I took piano lessons.

Thursday, August 28, 2008

Search Twitter from Flash

I have updated the Twitter ActionScript API. I added support for search. You are probably aware that search is provided by Summize, which was acquired by Twitter. It is pretty obvious that the APIs have not yet been merged!

Twitter's API is all based on Rails ActiveResource ... which is awesome. It turns any resource (often a database table) into a RESTful service. REST is often associated with XML, but Rails (and thus Twitter) supports JSON too (Twitter supports ATOM and RSS as well). For ActionScript, XML is great. Or I should say POX is great, and that is what Rails serves up.

The Twitter Search API is different. It supports two formats: ATOM and JSON. No POX. I went with the ATOM format. For JSON, I would have used Adobe's corelib. It works well, but I didn't want to add the weight. Plus, JSON tends to parse much slower in AS3 than XML. That is because AS3 implements E4X. To get E4X to work with ATOM, you have to be mindful of namespaces. For example, here is the code I used to iterate over the entries: for each (entryXml in xml.atom::entry). Here the xml variable is the parsed result from search.twitter.com and atom is a Namespace object. Not as pretty as just xml.entry, but oh well.
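Filling in the context around that one-liner, here is roughly how the namespace plumbing looks. The parseEntries function and the title/published fields are illustrative; the namespace URI is the standard ATOM one.

```actionscript
// atom is a Namespace object, as noted above; E4X needs it to
// qualify any element that lives in the ATOM namespace
var atom:Namespace = new Namespace("http://www.w3.org/2005/Atom");

function parseEntries(xml:XML):void {
    for each (var entryXml:XML in xml.atom::entry) {
        // title and published are standard ATOM entry elements,
        // and they need the same namespace qualifier
        trace(entryXml.atom::title + " @ " + entryXml.atom::published);
    }
}
```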

Sunday, August 24, 2008

Parsing XML on the Client: JavaScript vs. ActionScript

Developer: Wow the JavaScript interpreter on Firefox 3 is awesome, but the one on the new Safari is even better. It is a great time to be a JavaScript developer!

Me: It is still a lot slower than ActionScript.

Developer: Oh don't quote me old numbers, the new browsers are so much faster.

Me: Still slower than ActionScript.

Developer: Maybe slower at doing useless things like calculating prime numbers. Who would do that in a browser anyways? The new browsers are fast at doing realistic things.

Me: But still slower than ActionScript.

Developer: Show me some proof on something realistic, like parsing XML coming back from an Ajax call.

Me: [Whips together a servlet for producing huge chunks of XML and some JS and a SWF for calling it and doing a DOM parse.] Alright let's see the results of these tests...

Everything is O(N), as you would expect, and you can verify this by doing a linear regression of XML document size vs. parse time. Safari 4 is much faster than Firefox 3: the ratio of their slopes (FF3/S4) = 2.95. But they both lose badly to ActionScript 3 (running Flash Player 10, RC2): (FF3/AS3) = 6.36 and (S4/AS3) = 2.16. Maybe IE can do better, should we give it a try?

Developer: Now you are being a jerk.
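The methodology above is easy to reproduce. Here is a rough sketch (Python as a stand-in; the actual tests ran JavaScript and ActionScript against a servlet, and the document sizes here are placeholders):

```python
# Sketch of the benchmark methodology: time DOM parses of XML documents
# of increasing size, then fit a least-squares line to size vs. parse
# time. O(N) parsing shows up as a positive, stable slope.
import time
import xml.dom.minidom as minidom

def make_doc(n):
    # n repeated elements, so document size grows linearly with n
    return "<root>" + "<item>x</item>" * n + "</root>"

minidom.parseString(make_doc(10))  # warm-up parse to avoid startup noise

sizes, times = [], []
for n in (1000, 2000, 4000, 8000):
    doc = make_doc(n)
    start = time.perf_counter()
    minidom.parseString(doc)
    times.append(time.perf_counter() - start)
    sizes.append(len(doc))

# Least-squares slope: seconds of parse time per byte of XML.
mean_x = sum(sizes) / len(sizes)
mean_y = sum(times) / len(times)
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(sizes, times))
         / sum((x - mean_x) ** 2 for x in sizes))
```

Comparing two environments is then just the ratio of their fitted slopes, which is where numbers like (FF3/S4) = 2.95 come from.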

Thursday, August 21, 2008

Automounting a Drive in OSX

One of my colleagues had an interesting question for me. We needed to auto-mount a Windows drive from a Mac. The Mac was being used to automatically create screenshots of web pages on various Mac browsers. It needed to then upload the screenshots to a shared drive. Thus mounting the drive and just doing a copy seemed like the easiest way to go.

Mounting a Windows drive is very easy with a Mac. Just go to Finder -> Go -> Connect to Server and then enter smb://some-windows-machine. Automator seemed like the way to go here. I had actually never used it, but it proved quite easy. Here is what it looked like for me:
As you can see from the screenshot, I used Actions -> Files & Folders. I first selected the Get Specified Servers action and added the same URL that I normally used to manually mount the drive. I then added a Connect to Servers action. You will want to test it once so that you can submit your credentials, making sure to add them to your Keychain. Next, do Save As and change the format to Application. To get the application to execute automatically at startup, go to System Preferences -> Accounts -> Login Items and browse to wherever you saved it. Reboot, and that's it!

Tuesday, August 19, 2008

Cache Discussions

My post about using MySQL for caching got picked up by reddit and viewed a few thousand times. It sparked some discussion, but unfortunately it has been spread out on a few different sites. So I decided to aggregate them here.

"Here's a big reason to use MemCached: expiry!
Let's say you only want to do a complicated query once every fifteen minutes. Do it once, put it in a cache by key with an expiry of 15 minutes. Let memcached worry about when to take it out for you."

Yes, this is a good reason to use memcached. I used this pattern for the aggroGator, with Google's version of memcached. Which reminds me that Part 3 of the series I wrote on GAE is out... Anyways, in that app, RSS feed results are cached in memcached with a five-minute poll (initiated client-side, so it only polls for logged-in users).
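The pattern from the comment is easy to sketch. Here is a minimal in-process version (a plain dict standing in for memcached; the key name and the "complicated query" are made up):

```python
# Minimal sketch of the compute-once-and-expire pattern: do the
# expensive work, cache the result with a TTL, and let the cache
# decide when to throw it out.
import time

class TTLCache:
    def __init__(self):
        self._store = {}  # key -> (value, expires_at)

    def set(self, key, value, ttl_seconds):
        self._store[key] = (value, time.time() + ttl_seconds)

    def get(self, key):
        item = self._store.get(key)
        if item is None:
            return None
        value, expires_at = item
        if time.time() >= expires_at:  # expired: evict and report a miss
            del self._store[key]
            return None
        return value

def expensive_query():
    return "complicated result"

def get_report(cache, ttl=15 * 60):
    result = cache.get("report")
    if result is None:                    # miss or expired
        result = expensive_query()        # do the work once...
        cache.set("report", result, ttl)  # ...then cache for 15 minutes
    return result
```

With real memcached the TTL is handed to the server on set, so the client never has to track staleness itself.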

Expiry is the cache eviction policy for memcached, whereas any database cache is going to be more of an LRU policy. There will be cases where expiry is more useful, but I would actually guess that LRU is appropriate for the majority of use cases...

"MySQL memory table is not as fast as memcached. Depending on your data, memcached is 3X times or more as fast for get/set (select/insert)."

Really? I would love to see some objective results for this. Of course it would have to be an apples-to-apples comparison. The data would need to be retrieved from a cache node on separate physical machine and for the MySQL cache, it would need to be a select by primary key. Now I wouldn't be terribly surprised if memcached was slightly faster, but 3X? I would be even more surprised if a put/insert was faster at all.

"This has been said many times. MySQL and memcached serve different purposes. Memcached is used to store processed data, while MySQL generally contains raw, normailized data, which needs lots of complex queries and other processing."

I actually mentioned this at the end of my post... So obviously I agree. But I have a feeling that people use memcached to cache a lot of data that is not very processed at all. Also, the last line is very misleading. You do not need to do much normalizing of your data. I can tell you that anybody doing federated database systems has to do a lot of de-normalizing of their data. And complex queries and other processing? That is just silly.

"Facebook needs memcache for the obvious reason that it's pages are highly complex and include many pictures."

Eh? Don't see how pictures would matter... But if Facebook is using memcache for HTML fragments, then I would agree that this is the right kind of cache. I don't know if this is the case or not. Other things like my list of friends or my contact info would be a poor choice for memcached. Something like the Facebook feed... That is a lot tougher. There are limits to what you can cache, since the feed changes a lot and you might have a low tolerance for stale data. You might be able to create HTML fragments for the stories and cache those?

"Also, fewer of Facebooks pages are time-critical when compared to eBay. On eBay you basically can't cache a page rendering (memcache) if it has less than a minute of auction time left"

Item listings are certainly time-critical, i.e. you expect the price to be accurate when you are looking at a listing and considering bidding on it. This is true regardless of the time remaining; being less than a minute doesn't matter too much. However, that is just one page. Many other pages are not so sensitive, but they are very dynamic.

When it comes to picking between MySQL and memcached, I would first ask: are you using an ORM but need caching? If the data is being accessed through an ORM, then your cache layer should be a database, not memcached. Again, the only exception I could see to this would be a graph, i.e. data that is hard to describe relationally (requires self-referential foreign keys, etc.)

Thursday, August 14, 2008

Cache Money

Scalability is a hard question, but a lot of people think that scalability is all about caching. In particular, memcached is the answer for caching. I think we can blame Facebook for this. Everybody knows that Facebook makes heavy use of memcached. Terry says that social graphs are a scalability problem for databases that is solved by memcached, so he is clearly drinking the Kool-Aid. The benefits of caching are obvious, but is memcached really the best/only way?

Earlier this year, eBay won an award from MySQL. This was for an application we built that we originally called Gem Cache. It is a caching tier built on top of MySQL. When the caching tier was designed, memcached was given a lot of consideration, but there were some very good advantages we got out of using MySQL instead.

First off, can MySQL be as fast as memcached? Absolutely. MySQL is aggressive about keeping things in memory, and if everything is in memory, it will be as fast as memcached. You can use MySQL's MEMORY engine to accomplish this, or you can stick with MyISAM and let MySQL's caching put things in memory for you. Obviously you need to split your database, but we already knew how to do that efficiently. With that in mind, here are the advantages that MySQL offers.

1.) SQL Semantics. You are not limited to just simple "put" and "get." You can do selects and joins, aggregates, etc.
2.) Uniform Data Access. Do you use some kind of ORM? You can leverage this with a MySQL based cache.
3.) Write-through Caching. In a typical memcached setup, updates are still done to the database and this invalidates one or more objects in memcached. With a MySQL based cache, a row in the cache corresponds to a row in the "real" database. So we can write the cache and then asynchronously update the system of record.
4.) Read-through Cache. Similarly, you can always attempt to read from the cache database. If there is a cache miss, you can invisibly read from the real database and add to the cache at the same time.
5.) Replication. MySQL allows for replication of data, so it is easy to add redundancy and fail-over to your cache. Replication can also be useful when you have multiple data centers.
6.) Management. There are lots of great management tools for DBAs, operations folks, etc. to use with MySQL.
7.) Cold starts. When your cache is a copy of database rows, it is easy to bootstrap it from your source, since the source and the cache are so similar.
8.) Eviction. Memcached gives you basic expiration, but otherwise you are handling eviction yourself. MySQL's caching follows an LRU policy, which is more useful.
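Points 3 and 4 are the interesting ones, so here is a minimal sketch of write-through and read-through behavior, with plain dicts standing in for the MySQL cache tier and the system of record (class and method names are made up):

```python
# Sketch of write-through (point 3) and read-through (point 4) caching.
# A row in the cache corresponds to a row in the system of record, keyed
# by primary key.

class CachedTable:
    def __init__(self, source_db):
        self.cache = {}           # cache tier: primary key -> row
        self.source = source_db   # the "real" database
        self.pending = []         # writes queued for asynchronous flush

    def read(self, pk):
        row = self.cache.get(pk)
        if row is None:                 # cache miss:
            row = self.source.get(pk)   # read through to the source...
            if row is not None:
                self.cache[pk] = row    # ...and populate the cache
        return row

    def write(self, pk, row):
        self.cache[pk] = row            # write the cache first...
        self.pending.append((pk, row))  # ...queue the source update

    def flush(self):
        # stand-in for the asynchronous update of the system of record
        for pk, row in self.pending:
            self.source[pk] = row
        self.pending.clear()
```

Because cache rows and source rows have the same shape, both directions are just primary-key copies, which is what makes cold starts (point 7) cheap too.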

So there, just a few obvious advantages to using MySQL as a cache instead of memcached. Now I know that a lot of folks use MySQL as their "real" database, so it may seem weird to use it as a cache as well. But they are probably (hopefully) using InnoDB for the "real" DB, and that really is a different beast than MySQL with MyISAM or MEMORY tables. And it's not like you have to pay for extra licenses or anything... What are the advantages of memcached over MySQL? The only obvious one to me is if you want to cache things that don't fit in the database, like deep object graphs or HTML fragments, etc.

Tuesday, August 12, 2008

Netgear WNR2000

My trusty old D-Link DI624 started having problems recently. Actually it only started having problems immediately after a Comcast technician switched out our cable modem. Coincidence? Probably, but whatever...

I knew my Macbook supported "Draft 2" of the 802.11n standard, so when I saw a reasonably priced 802.11n router, I went for it. "It" was a Netgear WNR2000. It installed very easily. I was able to re-use my old SSID and security settings, so I did not have to change any of my devices (two laptops and a Wii) or the devices of friends and family who had previously used my network. Very nice. All of my devices accessed the new network with no problem. Happy happy joy joy! Not so fast.

Everything worked great except for my Macbook. It had no problems with the network, but its Internet connection was horrible. It was like being on dial-up, and it was only this way for my Macbook. My wife's laptop was blazing along, as was our desktop computer (with a wired connection to the WNR2000). It was only my Macbook that was bandwidth impaired.

I started tweaking the WNR2000's settings, well actually just one (wireless) setting: maximum network speed. This was set to 300 Mbps, half the theoretical maximum for 802.11n and nearly six times as fast as my old DI-624's 802.11g network. I started tweaking it down, but to no effect. Until I set it to 54 Mbps, i.e. the same speed as you get with 802.11g. Then the Internet connection on my Macbook was as fast as it was for every other device on the network. Order had been restored, but it is not a satisfying solution.

My only guess is that I fell victim to some kind of "mixed" network issue, but that is mostly a guess. I thought 802.11n would come in handy when copying files between my desktop computer and my Macbook. I do this a lot for music, photos, and videos. Now I basically have the same wireless network speed as before, but I could have gotten that with a cheaper router.

Friday, August 08, 2008

Blogging Tools for Programmers

What kind of tools do you use for blogging? I am writing this, and most of my posts, in Blogger's web interface. I have tried a few other tools, but none of them are very good. There are some basic things that I want out of a blogging tool:

0.) Obviously it has to work with Blogger.
1.) Rich formatting. If nothing else I need to be able to easily create links. Full WYSIWYG editing would be great though. Don't make me do manual HTML formatting, but please don't prevent me either.
2.) Image hosting integration. I would like to be able to include images in a post either from my local computer or from the web.
3.) Blogger tags/labels. I have a small blog, with about 100-150 unique viewers daily (as measured by MyBlogLog, which seems reliable.) There may be more folks who use blog reading tools, too, who knows. A decent number of the views come from people doing Google searches for "blah". These searches often lead to blog posts that I have tagged as "blah." No tagging means less visibility, so forget that.
4.) Offline mode would be nice. Really the first three things are handled pretty well by Blogger's native web interface. But it would be nice to be able to compose a post while offline.
5.) OSX. I blog on my MacBook almost exclusively and I do not want to boot Parallels just to blog. Integration with OSX spellchecker is pretty much implied too.

Does that seem like so much? I don't think so, but yet I have not found an acceptable solution for this. Seems like most desktop apps are designed for WordPress, TypePad, and Movable Type. I have tried things that are supposed to be great, like Mars Edit, and was underwhelmed. So even though nothing quite satisfies the above simple requirements, what I would really like is something that also supported:

6.) Code. I like to include code in my posts. It would be great to have something that made it easy for me to copy-n-paste code. It should escape special characters for me (like greater than and less than signs), provide code highlighting based on the language, and provide scrolling. Right now I use a PHP highlighter from Gilly. I had to hack in the CSS for this in one of my sidebar widgets. It works ok, but is a bit manual and the code often overflows.

Is there a tool like this out there that I just don't know about? Is there a tool out there that does all of this but only with one of the other blogging platforms? If that were true, I would have to figure out how much pain would be involved with migrating... Maybe I should just build this using AIR?

The Wrong Color of Green

So apparently the folks in Green Bay didn't listen to me about the best way to resolve the Brett Favre situation. It's not like they are going to get a first or second round pick now, for the simple reason that the Jets stink. Whatever. I'm glad we've still got Lilly. Anyways...

I am not happy about Favre playing for the Jets. Of course, I think I know Green Bay's strategy here. They know that Favre will have to play against New England twice a year for as long as he stays unretired. They know that Belichick is an evil genius who likes nothing better than to cause his opponent's mind to implode.

Last year we learned about another little fetish of Belichick's: football video. Think of all of the video of Favre that the Packers have accumulated. Think of all of the other goodies that might be lying around, like Rorschach test results, etc. Now imagine all of that in the hands of Belichick... Ted Thompson may just get the last laugh.

Update: Looks like trickle down economics works after all as Miami has signed Jets castoff Chad Pennington.