Microsoft Office 11's XML capabilities contain the seeds of a revolution in enterprise content management

JEAN PAOLI, XML architect at Microsoft, is a man on a mission. A former developer of SGML tools, he joined Microsoft in 1996 and co-edited the first XML specification in 1998. All along, he has dreamed of building software that would make it easy for ordinary folks to create, edit, and analyze structured and semistructured data. Now, finally, his vision is coming into focus. The first public beta of Microsoft Office 11 demonstrates, as promised, that XML has become a native Office file format. What’s more, Word 11 and Excel 11 can associate documents with data definitions written in XML Schema, and they can interactively validate documents against schemas. These are transforming achievements. Previous Office upgrades have been yawners, but version 11 should rivet the attention of IT planners. We’ve known for many years that most of our vital information lives in documents, not databases. XML was supposed to help us capture the implicit structure of ordinary business documents (memos, expense reports) and make it explicit. Sets of such documents would then form a kind of virtual database. The cost to search, correlate, and recombine the XML-ized data would fall dramatically, and its value would soar. It was a great idea, but until the tools used to create memos and expense reports became deeply XML-aware, it was stillborn. XML did, of course, thrive in another and equally important way. It became the exchange format of enterprise databases and the lingua franca of Web services. Now Office 11 wants to erase the differences between XML documents written and read by people using desktop applications, and XML documents produced and consumed by databases and Web services. This is a really big deal.
The first beta of Office 11 doesn’t include any demonstrations of the new XML features, but the Office team put together some examples for us, and Jean Paoli talked us through them. We started with a résumé template written in Word 11. Today we use such templates mainly to control the appearance of documents. If we also want to control their content, we can ask developers to write macros that enforce business rules. In principle, a company could publish a résumé template that would, for example, require job seekers to describe past experience in terms of a controlled vocabulary. In practice, that rarely happens. Procedural code to enforce such constraints is hard to write and even harder to reuse. With Word 11, you can attack this problem by defining a schema and mapping its elements to a résumé template. In the résumé example, we associated a schema with a sample résumé, using the Templates and Add-ins dialog. A new task pane called XML Structure then appeared, displaying a single root element named Résumé. We selected it, and chose the option Apply to Whole Document. Now subelements named Objective, Experience, and Education appeared in the task pane. Mapping these to regions of the sample résumé revealed deeper structure until the entire schema was finally mapped. Another example illustrated the same scenario for Excel. Here, the fields defining an expense report were captured in a schema, then mapped to an expense report. Once we saw how it worked, we were able to apply the same concept to our existing InfoWorld spreadsheet. After writing a simple schema, we dragged elements from the XML Structure pane onto the spreadsheet to bind named schema elements to numbered cells. Office 11 doesn’t help you write your schemas. That is both a science and an art, and something that few outside the XML development community have attempted.
But once you have a schema, no programming skill is needed to bind it to a document or to enforce the constraints expressed by the schema. In the résumé example, those constraints were trivial: A user of the document who typed nondigits into the YearFrom or YearTo elements would be alerted and could not save the document until these elements were written as the integers required by the schema. But this humble example has profound implications. Consider the InfoWorld story shown in the screen shot. It’s written in Word but backed by a schema that enumerates the set of allowable author names, limits the length of headlines and of the main story, and disallows Greek symbols. The story as shown violates two of those constraints: It includes a Greek letter and the author’s name, misspelled, fails to match the enumerated set of allowed names. Word 11 reports the infractions as they occur and stops complaining as soon as they are corrected. Once valid, the document can be saved as XML in two ways. The default is to create WordML, which preserves Word’s styles and formatting in an XML namespace that’s separate from the one bound to the schema-controlled data. You can optionally save through an XSLT transformation which, in a publish-to-the-Web scenario, could translate WordML formatting into HTML/CSS formatting. Alternatively, if you tick the Save as Data option, you can instead save just the raw XML data. In that case, you can bind one or more XSLT stylesheets to the document, each of which can generate WordML styles and formatting. The XML expertise needed to create schemas and XSLT transformations is scarce today. Once Office 11 hits the streets, its mainstream applications could arguably commoditize those XML skills more quickly and broadly than have Web services technologies. What’s more, Office is positioned as a bridge between the worlds of desktop applications and Web services.
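To make the kinds of constraints described above concrete, here is a minimal Python sketch. It does not use XML Schema or anything from Office 11. It simply hand-codes checks similar to those the InfoWorld story schema is said to express (an enumerated author list, a headline length limit, and a ban on Greek characters). The element names, the author names, and the 60-character limit are all assumptions invented for illustration.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical constraints standing in for what a real schema would declare.
ALLOWED_AUTHORS = {"A. Writer", "B. Editor"}   # illustrative enumeration
MAX_HEADLINE_CHARS = 60                        # illustrative length limit
GREEK = re.compile(r"[\u0370-\u03FF]")         # Greek and Coptic Unicode block


def validate_story(xml_text):
    """Return a list of constraint violations (empty if the story passes)."""
    root = ET.fromstring(xml_text)
    problems = []

    author = (root.findtext("Author") or "").strip()
    if author not in ALLOWED_AUTHORS:
        problems.append(f"Author {author!r} is not in the allowed list")

    headline = (root.findtext("Headline") or "").strip()
    if len(headline) > MAX_HEADLINE_CHARS:
        problems.append(f"Headline exceeds {MAX_HEADLINE_CHARS} characters")

    # Check the text content of every element for Greek characters.
    for elem in root.iter():
        if elem.text and GREEK.search(elem.text):
            problems.append(f"Greek character found in <{elem.tag}>")

    return problems


# A draft with a misspelled author name and a Greek letter in the body.
draft = """<Story>
  <Author>A. Writr</Author>
  <Headline>Office 11 and XML</Headline>
  <Body>The \u03b1 release is out.</Body>
</Story>"""

corrected = draft.replace("A. Writr", "A. Writer").replace("\u03b1", "alpha")

for problem in validate_story(draft):
    print(problem)
```

The draft trips two checks, mirroring the two infractions in the story described above, and the corrected version trips none. The point of a declarative schema is that rules like these live in a reusable document rather than in procedural code of this kind.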
In the emerging architecture of the business Web, XML-wrapped remote procedure calls are giving way to XML documents. SOAP, we’ll soon see, isn’t just a way for services to talk to one another. A purchase order acquired from a Web service by means of a SOAP call will sometimes need to be modified by a person. The application used to edit that purchase order will have to be a familiar tool. It will also have to guarantee that the document it passes along contains well-structured, valid, and thus enterprise-ready data. Office 11 appears to meet both of these requirements. And it does so in ways that respect the inherent strengths of the applications. Displayed in Word, an electronic purchase order can reflect its paper-based legacy by exploiting Word’s formatting power. Instances of that same document, brought into Excel, can feed the analytical functions that are Excel’s specialty. When XML data has a regular structure that maps naturally to a grid, Excel 11 can make that data immediately available for columnwise sorting, charts, and pivot tables. Here, in fact, is a case where Microsoft has put XSLT’s basic XML-shredding capability into the hands of a nonprogrammer. Absent a schema, Excel 11 can still infer structure from raw XML data. When we pointed it at an XML data dump taken from a back-office system, it automatically proposed a structure. We were then able to populate a spreadsheet template with selected elements, reorder them at will, and define a mapped region into which a subset of our data could be imported. We previously had to write XPath expressions to target elements and XSLT code to rearrange them. Excel 11 makes that an interactive task that any user can perform. Jean Paoli is wildly enthusiastic about what all this will mean. We share his excitement. Empowering ordinary users to create and interact with XML data is a huge step forward. It’s too bad that Outlook hasn’t been given the same treatment as Word and Excel. 
Most of us do a lot more communicating than document processing or number crunching. We’d like to see e-mail become a natively structured and manageable data type, too. Meanwhile, we’ll have our hands full just exploring the new vistas opened up by the XML features of the new versions of Word and Excel.
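For readers curious about the "XML shredding" that Excel 11 is said to make interactive, the following Python sketch shows roughly the kind of code one might otherwise write by hand: a path expression targets the repeating elements, a chosen subset of child elements is reordered into columns, and the resulting grid is sorted. The sample data and element names are hypothetical, and the standard library's ElementTree module supports only a limited subset of XPath.

```python
import xml.etree.ElementTree as ET

# Hypothetical back-office export; element names and values are invented.
DUMP = """<Expenses>
  <Item><Date>2002-11-04</Date><Category>Travel</Category><Amount>412.50</Amount></Item>
  <Item><Date>2002-11-02</Date><Category>Meals</Category><Amount>38.20</Amount></Item>
  <Item><Date>2002-11-03</Date><Category>Travel</Category><Amount>95.00</Amount></Item>
</Expenses>"""


def shred(xml_text, row_path, columns):
    """Flatten repeating elements into rows, one list per match of row_path."""
    root = ET.fromstring(xml_text)
    return [
        [row.findtext(col, default="") for col in columns]
        for row in root.findall(row_path)
    ]


# Pick the elements to keep and reorder them, as one might by dragging in a UI.
rows = shred(DUMP, "./Item", ["Category", "Amount", "Date"])

# Column-wise sort on the numeric Amount column.
rows.sort(key=lambda r: float(r[1]))

for r in rows:
    print(r)
```

Once the data is in a plain grid like this, sorting, charting, and pivoting become ordinary spreadsheet operations.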