Wednesday, May 04, 2005

XML Databases
I came across an interesting article on Slashdot about the future of databases. There was a lot in this article, but one thing that caught my attention was the mention of XML and databases. This is of particular interest to me in my work. There are a lot of "XML database" products out there. Many smart people consider these to be trendy junk-ware, just trying to capitalize on the popularity of XML. Many people consider XML itself to be a passing trend. That's a big story in itself, but it seems to me that XML has steadily gained popularity and use for about seven years now. That would be unusual behavior for a passing trend.

Anyways, here's what I see when it comes to XML these days. XML has taken over as the de facto standard for exchanging information. EDI is not gone, but it is definitely fading away. XML is really good for exchanging information. The mistake a lot of people make is thinking that XML is also good for storing information. That is not really true. It is only as good as any other text file format (maybe not as good as some, but that's not relevant for this discourse). If all you need to do is read some information in, text files (including XML) are good for this. If you only need to change the data infrequently, text files are OK. If the amount of data you need to store is small, again text files are fine. It's when lots of people need to start reading and querying the data in complex ways, as well as writing to it, that text files run out of usefulness. That's why we have pseudo-RDBMS systems, a.k.a. databases (they're not true RDBMS, but that's yet another story). Relational models and mathematics allow for wonderfully complex manipulation of data.

So XML databases are a sham, right? Well maybe, but maybe not. If I am a business, there's no way that I should store my data as XML. It's just not an efficient way to gain access to my data. An XML database does not change the equation here. If I need to exchange data with partners, then I will probably use XML for that. I'd prefer not to take my data and stuff it into XML, but that's probably the easiest way for my partners to accept data from me. Similarly, I can't expect my partners to send me data that is "formatted" for my database, so I will probably prefer XML from them. This still does not change my equation. I will still have to parse what they send me and store it in a relational DB to gain maximum use of the data. So there is no need for an XML database.
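To make that parse-and-store step concrete, here is a minimal sketch in Python. Everything specific in it is an assumption for illustration: the partner's document layout (`orders`, `order`, `customer`, `total`), the table name, and the use of an in-memory SQLite database as a stand-in for whatever relational DB a business really runs.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical partner payload; the element and attribute names are invented.
PARTNER_XML = """
<orders>
  <order id="1001"><customer>Acme</customer><total>250.00</total></order>
  <order id="1002"><customer>Globex</customer><total>75.50</total></order>
  <order id="1003"><customer>Acme</customer><total>120.25</total></order>
</orders>
"""

def load_orders(xml_text, conn):
    """Parse the partner XML and insert one row per <order> element."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(id INTEGER PRIMARY KEY, customer TEXT, total REAL)"
    )
    root = ET.fromstring(xml_text)
    rows = [
        (int(o.get("id")), o.findtext("customer"), float(o.findtext("total")))
        for o in root.findall("order")
    ]
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()
    return len(rows)

def totals_by_customer(conn):
    """Once the data is relational, aggregation is a single query."""
    return conn.execute(
        "SELECT customer, SUM(total) FROM orders "
        "GROUP BY customer ORDER BY customer"
    ).fetchall()

# SQLite in memory stands in for the real relational database.
conn = sqlite3.connect(":memory:")
load_orders(PARTNER_XML, conn)
print(totals_by_customer(conn))
```

The XML is only the envelope here. Once the rows are loaded, the grouping and summing happen in SQL, which is the "maximum use of the data" I am talking about.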

That does not mean there is no need for XML database technology. Again, XML is great for data exchange, but the people at either end of the exchange are still going to want to store their data in a relational DB to get the most out of it. So they don't need an XML database. However, there are often "things" in between the data exchanging parties. These "things" are processes. If "something" must be done to the inbound data before it is ready to become part of my database, then that something will probably prefer to deal with the XML directly. After all, it is just an intermediary. It is not going to store the data long term, so it does not necessarily gain any value from putting the data into a relational database. It's also a fact that these processes often involve manual operations and human interaction. So the XML may need to stick around for a while (persisted) and may need to be viewed by people (queried). Now that sounds like a time when I would need an XML database.
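As a rough illustration of such an in-between process, the sketch below keeps an inbound batch as XML and queries it directly while it waits for a human decision. It is only a sketch under stated assumptions: the limited XPath subset in Python's standard `xml.etree.ElementTree` stands in for a real XML database and XQuery, and the document layout (`batch`, `invoice`, the `status` attribute) is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical inbound batch awaiting manual review; all names are invented.
INBOUND_XML = """
<batch>
  <invoice id="A1" status="pending"><partner>Acme</partner><amount>900</amount></invoice>
  <invoice id="A2" status="approved"><partner>Globex</partner><amount>150</amount></invoice>
  <invoice id="A3" status="pending"><partner>Initech</partner><amount>40</amount></invoice>
</batch>
"""

def pending_invoices(root):
    """Return (id, partner) for every invoice a person still has to review."""
    return [
        (inv.get("id"), inv.findtext("partner"))
        for inv in root.findall(".//invoice[@status='pending']")
    ]

def approve(root, invoice_id):
    """Record a reviewer's decision in place, without leaving the XML."""
    inv = root.find(f".//invoice[@id='{invoice_id}']")
    if inv is None:
        raise KeyError(invoice_id)
    inv.set("status", "approved")

root = ET.fromstring(INBOUND_XML)
print(pending_invoices(root))   # invoices still waiting on a person
approve(root, "A1")
print(pending_invoices(root))   # A1 no longer appears
```

Nothing here was mapped to tables. The process only needed to hold the documents for a while and answer simple questions about them, which is the niche where an XML store makes sense.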

So if I were an engineer at Oracle, I wouldn't feel a pressing need to add robust XQuery capabilities to Oracle 11 (or whatever comes next for them). Of course, they actually have added some of these abilities, but I think that's just because of the XML "buzz." Oracle is certainly not open source, so I can't check, but I would still guess that they are mapping things to their relational model and leveraging their existing technology. Anyways, the point is that XML database technology is never likely to be something needed by most companies, even if most companies use XML for data exchange. It does have its place, and that's around business processes. That could be a big market, but it will be tiny compared to the overall database market.
