Tuesday, November 03, 2009

Persistent Data Structures

A few weeks ago I blogged about concurrency patterns and their relative tradeoffs. There were some really poor comments from Clojure fans -- the kind of abusive language and trolling that reminded me of Slashdot in its heyday. In fact, for the first time in a very long time, I had to actually delete comments... A lot of people get confused by the fact that I used Clojure as a representative for software transactional memory, and thought that I was talking about concurrency in Clojure. Anyways, I was flattered that Stuart Halloway commented on my blog. He said I needed to watch Rich Hickey's talk on persistent data structures. So I did. I wanted to embed it on my blog, but it didn't look like InfoQ supports that. So I stole his slides instead:
A little after all of this, I learned that persistent data structures were coming to Scala. So a lot of folks seem to think that these are pretty important. So why is that exactly? It was time to do some homework...

You can read some basic info on Wikipedia. You can go to the original source, Phil Bagwell's paper. The latter is very important. What's amazing is that these are relatively new ideas, i.e. the science behind this is less than ten years old. Anyways, I would say that the important thing about persistent data structures is that they are data structures designed for versioning. You never delete or for that matter change anything in place, you simply create new versions. Thus it is always possible to "rollback" to a previous version. Now you can see how this is a key ingredient for software transactional memory.

Going back to the original ideas... Bagwell's VLists are the foundation for persistent arrays and thus persistent hashtables in Clojure. These form the foundation of Clojure's Multi-Version Concurrency Control, which in turn is the basis of its STM.

These are not just persistent data structures, they are a very efficient implementation of the persistent data structure concept. Each incremental version of a list shares common data with the previous version. Of course you still pay a price for getting the built-in versioning, but this is minimized to some degree.

Scala fans should also be interested in persistent data structures. They are coming to Scala, which is supposed to be available in beta by the end of this month. The aforementioned Bagwell did his work at EPFL -- the same EPFL that is the epicenter of Scala. Indeed, Bagwell himself has contributed to improving the implementations of VLists in Scala. With VLists in tow, Scala STM is just around the corner.

2 comments:

Rich Hickey said...

Hi Michael,

Thanks for covering Clojure. I'm sorry you had some inappropriate comments first time around, that's not indicative of the Clojure community in general.

A clarification - Clojure hash maps are not based upon Bagwell's VLists, but on his Array Mapped Hash Tries:

http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.21.6279

Also the link to the presentation video should be:

http://www.infoq.com/presentations/Value-Identity-State-Rich-Hickey

Regards,

Rich

Simon Harris said...

FWIW, I've created a number of persistent data structures for Ruby: http://github.com/harukizaemon/hamster