XML isn't a panacea, especially if the semantic integrity of data hasn't been maintained properly.

Over the past few weeks, InfoWorld has been engaged in an epic IT battle against the forces of business evil: a mountain of data combined with mutant business processes that were the result of staff molding their work habits to inflexible systems that boxed them in. In our case, it was the implementation of a content management system, but it could just as easily have been an electronic trading system at a financial services company or a new fulfillment system for an online retailer.

To a large degree, IT is about a few very simple things: moving data around and giving people systems that help them act on data, make sense of data, and ultimately add meaning to data before passing it down the chain. So the ability to move data around quickly is key. Like many of you, we depend on “legacy” systems to do it, and that means any new system must interface with the old ones in the proper ways to be effective.

It has been five years since the XML 1.0 spec was released in February 1998, so anyone who has been “doing XML” during that time is in the pleasing position of having beautifully clean XML to migrate from their legacy systems into their new ones. At InfoWorld, we had thousands of XML documents from the past three years that made it a snap to migrate content into our new system.

If only that were true.

If you look at an XML FAQ (https://www.ucc.ie/xml/faq.xml), one question is, “Why is XML such an important development?” Part of the answer is that it removes constraints that Web developers previously dealt with, one of which was the “dependence on a single, inflexible document type (HTML) which was being much abused for tasks it was never designed for.” This is unquestionably true, but I’ve observed an interesting phenomenon as we approach XML’s five-year anniversary. As XML has infiltrated the enterprise, it too has been abused, neglected, and misunderstood.
At InfoWorld, we started our data migration project with high hopes, approaching our mother lode of XML data with the tools that any self-respecting 21st-century developer would use: Java and XSL. It was all “in XML” — how could we lose? In the end, we shuffled away from the XML scrap heap with heavy hearts and a mountain of one-off Perl scripts that got the data migration job done. We prevailed, but ultimately it was what you hear some football coaches call “winning ugly.”

If XML holds such promise, how could something like this happen at a place such as InfoWorld, where we’ve had a front-row seat for the emergence of XML-based standards? No one intended for our XML data to grow unwieldy over the past few years, but it did. It takes a lot of hard work and attention to maintain the semantic integrity of the data represented in your XML as your business morphs and new people come along to touch and manipulate the data in different ways. It’s particularly difficult when you’re converting data created by people, ensconced in the daily ebb and flow of messy human life, into a machine-readable format intended for the ages.

Data validation is important and should be encouraged and practiced, but, like security, only insofar as it allows people reasonable freedom to do their jobs. The problem goes back to the simple adage: garbage in, garbage out. XML is only meaningful if you insist on it from the beginning and throughout the life of your data. If you allow the fact that your data is “in XML” to lull you to sleep, be prepared for a rude awakening (and a lot of Perl hacking) later.
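To make the "insist on it throughout the life of your data" point concrete, here is a minimal sketch of the kind of check that could run every time a document enters a repository, rather than years later at migration time. It uses the standard JAXP parser that ships with Java (the same toolchain mentioned above) to validate a document against its DTD and collect every complaint instead of silently accepting well-formed-but-drifted markup. The class name and the sample `article` vocabulary are my own illustration, not anything from the article.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

public class ValidateExample {

    // Parses the XML string with DTD validation turned on and returns
    // every warning, validity error, and fatal error the parser reports.
    static List<String> validate(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(true); // validate against the DOCTYPE, if present
        DocumentBuilder builder = factory.newDocumentBuilder();

        final List<String> problems = new ArrayList<>();
        builder.setErrorHandler(new ErrorHandler() {
            public void warning(SAXParseException e)    { problems.add("warning: " + e.getMessage()); }
            public void error(SAXParseException e)      { problems.add("error: " + e.getMessage()); }
            public void fatalError(SAXParseException e) throws SAXParseException {
                problems.add("fatal: " + e.getMessage());
                throw e; // not even well-formed; no point continuing
            }
        });

        try {
            builder.parse(new InputSource(new StringReader(xml)));
        } catch (SAXParseException alreadyRecorded) {
            // fatal error was captured by the handler above
        }
        return problems;
    }

    public static void main(String[] args) throws Exception {
        // A document whose content drifted from its DTD: <byline> was never declared,
        // so the file is well-formed "XML" but no longer matches its own contract.
        String drifted =
            "<!DOCTYPE article [\n" +
            "  <!ELEMENT article (headline, body)>\n" +
            "  <!ELEMENT headline (#PCDATA)>\n" +
            "  <!ELEMENT body (#PCDATA)>\n" +
            "]>\n" +
            "<article><headline>Hi</headline><byline>X</byline><body>Text</body></article>";

        List<String> problems = validate(drifted);
        System.out.println(problems.isEmpty() ? "valid" : "invalid: " + problems.size() + " problem(s)");
    }
}
```

A gate like this will not stop every kind of semantic rot, but rejecting (or at least flagging) documents at write time is far cheaper than discovering the drift with one-off Perl scripts during a migration.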