Friday, September 01, 2006

XSD

My last post reminded me of a comment a former colleague of mine once made. He accused me of being "anti-XSD." I found this an unusual statement, since I have written dozens of XML schemas, maybe more than a hundred. I have made good use of JAXB to generate parsing/binding code from an XSD, from the days when JAXB was still a proposal up to the present annotation-based versions. So I don't think it's that I'm anti-XSD. It's that I'm anti-XSD proponents, or at least some of them.

You see, when I meet somebody who is a huge XSD fan and wants to use it to describe the world, I think to myself "I know who you are." These are the same people who used to want to describe the world using relational database schemas. They used to say "build all the data validation into the database, it is faster and better and it has to be right anyway." When they realized that SQL schema definitions were unable to properly describe all real-world data, they turned to stored procedures.

I see the same thing happening with XSD. It was obvious that DTDs were insufficient to describe everything, so XSD was born. It is several orders of magnitude more complex than DTDs, but it still cannot describe everything. It does not allow for dependencies between data. Thus the datatype of one element or attribute cannot depend on the value of another element or attribute (or on the values of four elements and two attributes, etc.).
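To make that kind of dependency concrete, here is a minimal Python sketch of a check where the expected type of one element depends on the value of another. The document shape (a root with `kind` and `value` children) and the function name `validate_measurement` are made up for illustration; they do not come from any real schema.

```python
import datetime
import xml.etree.ElementTree as ET


def validate_measurement(xml_text):
    """Return a list of error messages; an empty list means the document passed.

    Hypothetical rule: <value> must be an integer when <kind> is "count",
    and an ISO date (YYYY-MM-DD) when <kind> is "date".
    """
    root = ET.fromstring(xml_text)
    kind = root.findtext("kind")
    value = root.findtext("value", default="")
    errors = []

    if kind == "count":
        try:
            int(value)
        except ValueError:
            errors.append("value must be an integer when kind is 'count'")
    elif kind == "date":
        try:
            datetime.date.fromisoformat(value)
        except ValueError:
            errors.append("value must be an ISO date when kind is 'date'")
    else:
        errors.append("unknown kind: %r" % kind)

    return errors
```

A few lines of ordinary code express "the type of `value` depends on `kind`", which is exactly the sort of rule described above.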

I've long mused that the existence of the "xs:any" wildcard was partially because of this shortcoming. I've just picked one of XSD's shortcomings, but that's enough to see that at the end of the day neither XSD nor SQL can actually describe the whole world.

So how oh how can we write software for real-world problems when these data modeling languages cannot describe all real-world data? Ahh, that's why we have programming languages. That's why the database guys started pushing stored procedures with their own languages like PL/SQL. Heck, Oracle eventually even put a JVM inside their database so you could write Java for your stored procedures. I wouldn't be shocked to see some kind of language added to XSD.

The truth is you don't need it. There are plenty of programming languages already out there. They can even be used to allow for a declarative approach to data definition. Maybe one of these declarative approaches could become "standardized" or somehow baked into XSD. Maybe one day web browsers (and databases, etc.) will become smart enough to understand "super-XSD", and "middleware", along with the programming languages used to build it, will shrink away. I have my doubts though. Those database guys never really pulled it off, and they've been trying since the '80s.
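As a rough illustration of a declarative approach written in a general-purpose language, the rules can live in a plain data table while a tiny generic function applies them. Everything here is an assumption made for the example: the `address` document shape, the element names, the two rules, and the `violations` function are all hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# The declarative part: each rule is (message, predicate).
# A predicate receives the parsed element and returns True when the rule holds.
RULES = [
    (
        "US addresses need a 5-digit postal code",
        lambda a: a.findtext("country") != "US"
        or re.fullmatch(r"\d{5}", a.findtext("postalCode", "")) is not None,
    ),
    (
        "a state is required when country is US",
        lambda a: a.findtext("country") != "US" or bool(a.findtext("state")),
    ),
]


def violations(xml_text, rules=RULES):
    """Return the messages of every rule the document breaks."""
    element = ET.fromstring(xml_text)
    return [message for message, holds in rules if not holds(element)]
```

Adding a new cross-field constraint means adding one entry to `RULES`; the code that walks the table does not change.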
