Showing posts with label stax.

Saturday, October 20, 2007

Attributes vs. Elements

I was reading an article on developerWorks that I didn't write... It was about XML and Java. Sounds like a tired, old subject? This one had an interesting take: how the choices you make when writing XML influence your Java application code. One of the central themes was attributes vs. elements, and one of the basic points was that using attributes leads to faster code. I decided to test this theory.

I took a pretty simple XML document, one that I had actually written recently for a real problem, and created two versions of it. One used elements exclusively; the other used attributes wherever possible. I then tested how fast it was to access two pieces of data. One was "shallow", i.e. near the root of the document. The other was heavily nested. In both cases, the data was an attribute in the attribute-favored document. I repeated the test over 10,000 iterations and ran it against three XML parsing technologies: the standard DOM implementation included with Java 6, XPath with dom4j, and the StAX implementation included with Java 6. For the DOM and dom4j techniques, I also measured the parsing time.

The results were a little surprising. I found no difference between attributes and elements for DOM, either when traversing the tree or when parsing. I don't mean a negligible difference; I mean no difference at all. It was so surprising that I had to double-check my code a few times. The big difference for DOM was that the code for the attribute-favored approach was definitely simpler, which was one of the points in the developerWorks article.
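To make the "simpler code" point concrete, here is a minimal sketch (not the benchmark code from this post) that reads the same value through the JDK's DOM API from both document styles. The two sample documents and the names config, server, host, and port are hypothetical stand-ins for the real document.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class DomAttributesVsElements {
    // Hypothetical sample documents holding the same data in two styles.
    static final String ELEMENTS_DOC =
        "<config><server><host>example.org</host><port>8080</port></server></config>";
    static final String ATTRIBUTES_DOC =
        "<config><server host=\"example.org\" port=\"8080\"/></config>";

    // Parses an XML string into a DOM tree, wrapping checked exceptions.
    static Document parse(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    // Elements style: find <server>, then its <port> child, then that child's text.
    static String portFromElements(Document doc) {
        Element server = (Element) doc.getDocumentElement()
                .getElementsByTagName("server").item(0);
        Element port = (Element) server.getElementsByTagName("port").item(0);
        return port.getTextContent();
    }

    // Attributes style: find <server> and read the attribute directly.
    static String portFromAttributes(Document doc) {
        Element server = (Element) doc.getDocumentElement()
                .getElementsByTagName("server").item(0);
        return server.getAttribute("port");
    }

    public static void main(String[] args) {
        System.out.println(portFromElements(parse(ELEMENTS_DOC)));
        System.out.println(portFromAttributes(parse(ATTRIBUTES_DOC)));
    }
}
```

The attribute version drops the extra child-element lookup; with a real, deeper document that saving repeats at every level where data was pushed down into child elements.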

The dom4j story was different. It was slightly faster to parse the attributes document, but a bit faster to retrieve values from the elements document. I was surprised by this, but the differences were very small and probably not statistically significant (though I didn't test this). The code was virtually identical, of course, since we were using XPath for the traversal. The dom4j approach was much slower than the DOM approach, which again is not too surprising.
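Why is the XPath code virtually identical? dom4j is a third-party library, so the sketch below swaps in the JDK's own javax.xml.xpath API instead, purely to show how little the query changes between the two styles. It reuses the same hypothetical config/server/port documents and is not the dom4j code that was benchmarked.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

public class XPathAttributesVsElements {
    // Hypothetical sample documents holding the same data in two styles.
    static final String ELEMENTS_DOC =
        "<config><server><host>example.org</host><port>8080</port></server></config>";
    static final String ATTRIBUTES_DOC =
        "<config><server host=\"example.org\" port=\"8080\"/></config>";

    // Evaluates an XPath expression against an XML string and returns its string value.
    static String eval(String expression, String xml) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            return xpath.evaluate(expression, new InputSource(new StringReader(xml)));
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Bad XPath or XML", e);
        }
    }

    public static void main(String[] args) {
        // The two queries differ only by the "@" that selects an attribute.
        System.out.println(eval("/config/server/port", ELEMENTS_DOC));
        System.out.println(eval("/config/server/@port", ATTRIBUTES_DOC));
    }
}
```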

Finally, the StAX tests showed faster results for the attributes document, with a larger difference than in any of the other tests. This makes sense: you don't have to go as far into the attributes document (a start element event carries the element's attributes, but not the text of its child nodes), and an attributes document fires fewer events than an elements document. For example, an element with text content such as <foo>bar</foo> is three events (start element, characters, end element), but an attribute such as <foo value="bar"/> is two events (start element, end element). Also, StAX was faster than either DOM or dom4j, as you would expect. The StAX code for the attributes document was also slightly simpler than it was for the elements document.
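A minimal StAX sketch, assuming the JDK's built-in javax.xml.stream implementation (not Sun's standalone parser or Woodstox), shows both effects: the event counts for the illustrative <foo> snippets, and how reading a value differs between the same hypothetical config/server/port documents. Coalescing is switched on so that a text node is always reported as a single CHARACTERS event.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class StaxAttributesVsElements {
    // Hypothetical sample documents holding the same data in two styles.
    static final String ELEMENTS_DOC =
        "<config><server><host>example.org</host><port>8080</port></server></config>";
    static final String ATTRIBUTES_DOC =
        "<config><server host=\"example.org\" port=\"8080\"/></config>";

    // Opens a stream reader; coalescing merges adjacent text into one CHARACTERS event.
    static XMLStreamReader open(String xml) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        factory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.TRUE);
        return factory.createXMLStreamReader(new StringReader(xml));
    }

    // Counts element and text events, ignoring START_DOCUMENT and END_DOCUMENT.
    static int countEvents(String xml) {
        try {
            XMLStreamReader reader = open(xml);
            int count = 0;
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT
                        || event == XMLStreamConstants.CHARACTERS
                        || event == XMLStreamConstants.END_ELEMENT) {
                    count++;
                }
            }
            reader.close();
            return count;
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Elements style: advance to <port>, then consume its text content.
    static String portFromElements(String xml) {
        try {
            XMLStreamReader reader = open(xml);
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && "port".equals(reader.getLocalName())) {
                    return reader.getElementText();
                }
            }
            return null;
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Attributes style: the value is already on the <server> start element event.
    static String portFromAttributes(String xml) {
        try {
            XMLStreamReader reader = open(xml);
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && "server".equals(reader.getLocalName())) {
                    return reader.getAttributeValue(null, "port");
                }
            }
            return null;
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(countEvents("<foo>bar</foo>"));       // 3
        System.out.println(countEvents("<foo value=\"bar\"/>")); // 2
        System.out.println(portFromElements(ELEMENTS_DOC));
        System.out.println(portFromAttributes(ATTRIBUTES_DOC));
    }
}
```

The attributes version can stop at the <server> start element, while the elements version has to keep pulling events until it reaches <port> and then read its text.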

So if you're using DOM or StAX, you should definitely favor attributes over elements. It will mean less code and, in the StAX case, faster code. If you're using dom4j with XPath-based (or maybe XQuery-based) navigation, then it doesn't matter as much, and an elements-based format seems OK. This really is important, as a lot of these "modern" RESTful web services favor the elements format over the attributes format. That is doubly bad for web services, since elements-style documents obviously carry a much larger byte cost.
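The byte cost comes largely from closing tags: in the elements style, every value pays for its element name twice. A quick check on the same hypothetical sample documents (illustrative only, not a measurement of any real web service) compares their UTF-8 sizes.

```java
import java.nio.charset.StandardCharsets;

public class DocumentSize {
    // Hypothetical sample documents holding the same data in two styles.
    static final String ELEMENTS_DOC =
        "<config><server><host>example.org</host><port>8080</port></server></config>";
    static final String ATTRIBUTES_DOC =
        "<config><server host=\"example.org\" port=\"8080\"/></config>";

    // Size of the serialized document in UTF-8 bytes.
    static int utf8Bytes(String xml) {
        return xml.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        System.out.println(utf8Bytes(ELEMENTS_DOC));   // 75
        System.out.println(utf8Bytes(ATTRIBUTES_DOC)); // 57
    }
}
```

For these two tiny samples the elements version is about 30% larger; the actual gap for a real document depends on its tag names and how many values it holds.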

Update: As requested, I am attaching the source code I wrote for this little micro-benchmark. I tweaked it a little after I realized there was an inefficiency in one of my dom4j methods. This tweak made dom4j faster on the attributes document, which is more consistent with the rest of the results. To run the code, you need dom4j and either Java 6 or Java 5 plus a StAX implementation. I ran it on my MacBook under Java 5 using Sun's StAX parser.

Tuesday, August 07, 2007

New Articles on IBM

There are a couple of new articles I wrote that are now available on IBM developerWorks:

The Geronimo renegade: Using integrated packages: Codehaus' Woodstox: This is an article all about Geronimo's StAX implementation, Woodstox. If you're an old veteran of XML parsing like I am, you have to love StAX. Woodstox is not only an excellent StAX implementation, it is a killer piece of software. We use it in a lot of places at eBay because it is so fast.

Use JavaScript to make your XForms more robust: This is a very cool article about mixing JavaScript and XForms together. XForms has become kind of a forgotten technology, which is a real shame. It has so much to offer. I think part of the reason it is forgotten is that people don't realize it is standardized and integrated with other technologies. It plays nice. It's not some one-off technology that lets you do some cute things in a sandbox. This article shows how everything in the XForms world is accessible through JavaScript. It's not read-only access, either: you can modify models, forms, and so on. I'm planning on writing some more on XForms and how it integrates with a certain well-known AJAX framework that is usually referred to by a three-letter acronym...