Enabling users to use XML properly

Next week’s issue of InfoWorld includes an article on the new XML capabilities of Office 11. While researching the story, I interviewed the architect of XML in Office 11, Microsoft’s Jean Paoli, one of the primary co-creators of XML. Here are some of his remarks.

On XML developers versus XML users:

Paoli: “Our goal is to enable end-users to use XML properly. The result is not for developers. We are for the masses. That’s why we had to invent this mode where the XML guy or Web developer would be able to introduce a schema into something an end-user can understand, a template, and then the end-user is going to use that template. I know that in general the XML developers do not think about this kind of thing. But we are not here to enable XML developers to be happy creating XML. We are putting XML generation into the hands of people who do not understand XML at all.”

It’s about time! I’ve attended too many XML-oriented shows and conferences where the standards debated and the products demonstrated have no connection at all to the end-user. This becomes especially ironic when Tim Berners-Lee’s vision of the Semantic Web is invoked. Where is all that rich metadata supposed to come from? People will ultimately have to create the Semantic Web, and to do that, they’ll need the common everyday writing tools that can capture XML structure in powerful yet unobtrusive ways.

On mining back-end data:

Paoli: “When we started XML, I wanted to focus on the server side, because we needed infrastructure. We needed data. Now, it’s achieved. All the back ends support XML: ERP systems, SAP, Siebel can output XML. And now I can pop that data into Excel, directly. I’m the most thrilled by the overall Office vision, which is to enable end-users to create XML. But we don’t even have to wait for that to use all the analysis features of Excel on existing XML data. It’s a big gift to the XML community.”

Agreed.
Every database can now deliver query results as XML. Having done my fair share of analysis of such data in Excel, I can attest to the breakthrough that Excel 11 represents. Sure, shredding the result of a SQL Server “FOR XML” query is just a simple matter of XSLT programming. But that isn’t very productive. When you suck raw XML data into Excel 11, the XPath expressions that target elements are written for you, and subsetting and rearranging the data reduces to dragging XML-mapped elements onto the spreadsheet. It’s a huge win.

On the right tool for the job:

Paoli: “All our tools are XML editors now: Word, Excel, XDocs. But we shouldn’t think about XML editors, we should think about the task at hand. If I want to create documents with a lot of text, that’s Word. With XDocs, the task is to gather information in structured form. And with Excel, it’s to analyze information. We have this great toolbox which enables you to analyze data. We can do pie charts, pivot tables, I don’t know how many years of development of functionality for analyzing data. So we said, now we are going to feed Excel all the XML files that you can find in nature.”

Leveraging the strengths of the tools was clearly the right way to add XML support to Office. In Version 11, Word and Excel can still do everything they used to be able to do, only now they can do those things with XML data. It’s a huge advance. However, I’m still hungry for XML authoring support in the tools in which I spend most of my day, and you probably do too: the browser and the e-mail client. If the goal is to enrich as much user data as possible, the browser’s TEXTAREA widget and the e-mail client’s message composer are arguably the most strategic targets for XML authoring support.

On naming versus addressing of data:

Paoli: “To create the schema for your spreadsheet, first look at the information which is captured in that spreadsheet. Give names to the data.
The data is about the user’s name and e-mail address, for example. I don’t want to call it cell 1, cell 2, or F1 or F11. The whole thing about XML is to give names to things which are in general not named.”

The enterprise data dictionary always looked better in theory than in practice. That’s partly because there was no practical way to push the centrally managed data dictionary out to the edges of the enterprise. With Office 11, that’s now conceivable.

On customer-defined schema:

Paoli: “The goal is to unleash the Excel functionality on generic schema, on customer-defined schema. Who knows how to create a data model better than the financial or health care company who uses the data? Until now, it was very difficult to find a tool which lets you pour the data belonging to any arbitrary schema, and then, for example, chart that data.”

Modeling XML data using DTD (Document Type Definition) or, more recently, XML Schema, has been a fairly arcane discipline. Practitioners have included publishers seeking to repurpose content and Web services developers writing WSDL files for which XML Schema serves as the type definition language. But enterprise data managers have not, in general, seen much reason to model lots of data using XML Schema. With Office 11, Microsoft aims to rewrite the rules in a dramatic way. If every enterprise desktop can consume, process, and emit schema-valid XML data, the modeling of that data becomes a huge strategic opportunity. And the people who can do that modeling effectively become very valuable.
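
To make the naming-versus-addressing point concrete, here is a minimal Python sketch of what pulling named data out of an XML document looks like when you do it by hand. It is not how Office 11 does it; the sample document, the element names (`contacts`, `contact`, `name`, `email`), and the helper `rows_by_name` are all hypothetical, invented for illustration. It uses only the standard library’s `xml.etree.ElementTree`, which supports a limited subset of XPath.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data: these element names are illustrative only,
# not taken from any Office 11 schema.
SAMPLE = """
<contacts>
  <contact><name>Ada</name><email>ada@example.com</email></contact>
  <contact><name>Grace</name><email>grace@example.com</email></contact>
</contacts>
"""


def rows_by_name(xml_text, record_path, fields):
    """Return one dict per record, keyed by element name.

    record_path is a path in ElementTree's limited XPath subset;
    fields are the child element names to extract from each record.
    """
    root = ET.fromstring(xml_text)
    return [
        {field: record.findtext(field) for field in fields}
        for record in root.findall(record_path)
    ]


# Each value is reached by its name ("email"), not by an address like F11.
rows = rows_by_name(SAMPLE, "./contact", ["name", "email"])
for row in rows:
    print(row["name"], row["email"])
```

The path expressions here are written by hand, which is exactly the chore that, per the discussion above, Excel 11 takes over when you drag XML-mapped elements onto a spreadsheet.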