Distinguished Engineer Mark Lucovsky talks about the power of schematized data

Distinguished Engineer Mark Lucovsky was chief architect of Microsoft’s HailStorm project and has now turned his attention to the server group. But his interests remain wide-ranging, as evidenced by his recent conversation with InfoWorld Test Center Director Steve Gillmor. Lucovsky initiated a discussion around the ideas about XDocs expressed in a recent Ahead of the Curve column, and went on to discuss his enthusiasm for the power of schematized data to guarantee high-fidelity XML.

InfoWorld: Do you agree with that?

Lucovsky: I agree with parts of it. I think that we’re going to mislabel XDocs as “Oh, it’s a thing for forms,” while more importantly it’s a thing for generating correct XML content. Word is really good at building free-form text, right? So you write reports and things that don’t have a lot of externally visible structure. If you associate an XML schema with them, they can start having some externally visible structure. Excel is really good at creating tabular content that’s primarily numbers and analysis of the tabular content. So it’s really good at building portions of budgets and rolling out information and stuff.

What’s the tool that people are going to use to create semi-structured content, like you would find in XML trees? You could look at XDocs and say “forms are one way of capturing those semi-structured trees,” but with XDocs you can essentially insert a node in the tree and then populate it. You can bind the node of the tree to an external data source like a Web service. I’m not sure all that will be realized in the first go-around, but as a tool for creating this XML content and manipulating it, I think it fits in great.

InfoWorld: It sounds like some of the ideas around the HailStorm project.
Lucovsky: The HailStorm project at one level was about “let’s create some of these schemas and let’s expose content that conforms to the schema through a Web service interface.” And then [it was about] “Oh, let’s associate these blocks of related schematized things with identities and let’s bake a security model around that.” The only part that I think is relevant in XDocs is what if those sources of XML information did exist? How would we manipulate them, how would we generate that correct content, and how would we mash it around? I think XDocs is an interesting tool for doing that sort of thing. … It’s mainly being used today as an editor of content, but boy, you could really generate content and take some of that content from a Web service and stick it in a document and persist it and work with it offline, and then refresh from the Web service or leave it alone. So I think it’s the first step toward real dynamic documents described in XML, and you can have multiple views over it because you can look inside the data.

InfoWorld: What are you doing on the XDocs project?

Lucovsky: I’m just looking around at a bunch of different things right now and trying to make sure from a server and Windows perspective that we’re not too disconnected in these pieces of technology.

InfoWorld: You’re basically the watchdog on the server group now?

Lucovsky: Yeah, potentially. That’s my current role. I haven’t decided if that’s the best thing for me in coordinating some of these things that I see coming that are interesting. You [should] have a look at Yukon.

InfoWorld: Are you working on that, too?

Lucovsky: No, I’m just poking around with that, [but] that’s another piece of technology that is super complementary to some of this stuff.

InfoWorld: In what way?

Lucovsky: Well, the XML types in it. Now you have a way to store in the database these XML objects essentially. And now I can pull them out and feed them into a document or shoot them over a Web service.
It’s very interesting what they’ve done there, and you can get at basically tuples [records] of XML through SQL without any loss of fidelity. The hard thing in these systems is always this: if you assume that there was an XML schema and that you have a document that conforms to that schema, how do you toss it around through these layers of software without losing fidelity? Every time you take that and convert it into an object and back, you have the potential for a programmer to take a shortcut and screw something up. And there are constructs in an XML schema that don’t map particularly well to C# or CLR style objects. And so sometimes it’s a manual process going back and forth, and between dropping namespaces and screwing up collections, mistakes can [be made]. Splatting out the XML into a relational database and then stitching it back together again is something that’s relatively straightforward when the schema is stable. But the typical XML schemas have extensibility baked in, and dealing with that, and preserving namespace prefixes, is all hard stuff.

InfoWorld: It’s beginning to look like Office is finally arriving as a real platform as opposed to an ad hoc, band-aided environment.

Lucovsky: I don’t think it was ever a band-aided environment. I think what you’re really trying to get at is that when you can start putting externally visible structure on the free-form content, then you’ve allowed a new set of things to happen. Things that were possible before but required custom connectors. Let’s say that you’re a manager and it’s performance review time. And all of your employees are writing their goals for the next quarter or the next period. Your goals, as the manager, are really the sum of all of their goals.
Wouldn’t it be nice to be able to go to their performance reviews, extract the section with next quarter’s performance goals, paste those into your review, and then summarize and say “These are my goals”? Well, to do that today with Office, you have to open the document and use the mouse and cut and paste it in. With an XML ability to navigate the documents by their external structure (you know it’s a perf review, you know perf reviews have the goal section), you can just XPath that out of the document and paste it right in, data bind that section of the document directly into your document. And [what] if you [could] extend it further and say “What if performance reviews were stored on a server that exposed an XML Web services way of accessing those documents, throwing those XPath expressions and the select single nodes, so you could build that tree as a tree of DOMs,” then you could do that against the server or from your local hard drive or from a third-party Web service, whatever. I’m excited about what can happen as we start moving down this path.

InfoWorld: You were excited about HailStorm, too.

Lucovsky: I was. I still think it’s the right thing to do someday. It’ll happen slowly over time. I don’t know if everything that we were doing will happen, but certainly you’re seeing the schematization, right? … I think that that will happen because of the tools and some of the momentum in the industry. People want to look inside and repurpose the data, and it’s critical for mobile devices and for using this stuff in lots of different ways.

InfoWorld: It’s an interesting approach. You’re saying that [XDocs] represents a significant driver for getting content moving around the network?

Lucovsky: Um hmm. But you know, Word does the same thing. It just does it in a slightly different way for a slightly different kind of audience and kind of content.
You can basically take the exact same schema and in XDocs you might use the schema primarily to fill out a form, whereas in Word you might use the schema primarily to generate more of a report-style document, right? … It’s the ability to repurpose that data and keep the fidelity and the schema intact the whole way that’s key.

InfoWorld: So XDocs becomes a control panel for routing that information? Because doing that in Word has always seemed to fail, and the same with Outlook, because of its feature glut.

Lucovsky: It could be. But I think that the key there is that you didn’t have the ability to associate a schema with the documents that you’re creating with those tools. As soon as you can do that, it doesn’t matter if the design palette is an Outlook pane or if it’s Word. As long as you can say “this content that I’m creating conforms to the schema, and I can verify at design time that the data is conformant,” it really doesn’t matter whether I’m typing in Word and producing compliant content or I’m typing in an XDocs form that might offer different capabilities; the end result is still correct content. We do this with Web forms today. It’s really the ability to schematize chunks of the design surface that you’re working with that’s the key thing.

InfoWorld: I think there’s going to be a tipping point that’s going to come quite rapidly here, as this stuff goes out on a network.

Lucovsky: Um hmm. The key is being able to generate the high-fidelity content that flows around, and I don’t think it’s all there yet.

InfoWorld: What are the stumbling blocks?

Lucovsky: I think generating it in your source code is the hardest thing to do right now. You see a lot of different techniques. People [are] string concatenating and building their content with static templates, so it’s conformant but it’s conformant because somebody typed in the right source code. But is it really strongly typed conformant?
Are people using validating inserts to create their documents or are the coding tools that they’re using really geared up to generating proper, 100 percent correct XML? I don’t think that’s totally nailed. You [can’t] say “Hey, I’m just going to string concatenate some stuff together and shoot this out as a Web service. And hey, it looks like a SOAP call, it smells like a SOAP call, and it should be right.”

InfoWorld: Isn’t that what XDocs is going to do?

Lucovsky: XDocs will do that in one mode, but XDocs is doing that in the mode of “I presented a form to a human to fill out.” But let’s say that I’m the author of Weather.com and I’m publishing a Web service that returns a 10-day weather forecast and I said that the weather forecast conforms to the schema. What coding techniques are they using to make sure that the data that they put out conforms to the schema?

InfoWorld: You’re talking about validation of well-formed XML?

Lucovsky: It’s more than just validation. Sometimes validation is happening at the fingertips of the guy that’s saying “I’m pulling this string from this database and then I’m going to string concatenate these angle brackets in this certain way, and I’m going to say yep, that conforms to the schema because I wrote the code and I wrote the schema and I just know that it does.” He would catch his errors right away if he was type checking that his code was correct. There’s a whole spectrum of these things.

InfoWorld: So what’s the solution?

Lucovsky: I think there are a lot of possible solutions there. I think that our first generation DOMs are OK for that, but you still see in code today a lot of different ways of doing these same basic operations. And there’s a lot of room for error in the code.
This isn’t bad. It’s actually one of the things that makes XML on the Web very popular: we’re not going to constrain you to “This is the only way of doing it.” [But] there should be some best practices that make it easier for you to do the right thing. There’s nothing wrong with string concatenation, I’m just pointing out that it is brittle and people do make mistakes with it. I think that [with] the next generation of tools and techniques, we just have to keep getting better and better. Our validating XML support in .NET is incredibly good. It has validating readers so you can throw a parser, associate a schema with a document, and say “Validate it. Let me know if it’s good or bad,” and it’ll raise an exception if it’s bad XML in the DOM. Three years ago, you didn’t have that capability. So we’ve come a long way. We didn’t even have the schemas until a few years ago, and yet XML was flowing all over the place. So it’s an interesting time.
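
Lucovsky’s point that string concatenation is brittle can be illustrated with a small sketch. It is not code from the interview or from any Microsoft product: the `forecast`, `city`, and `summary` element names and the three helper functions are hypothetical, and Python’s standard library is used only because it is compact. The sketch checks well-formedness only; the standard library does not validate against an XML schema, so the schema-conformance half of his argument is outside its scope.

```python
import xml.etree.ElementTree as ET


def forecast_by_concatenation(city, summary):
    # Brittle: nothing escapes markup characters that appear in the data.
    return ("<forecast><city>" + city + "</city>"
            "<summary>" + summary + "</summary></forecast>")


def forecast_by_tree(city, summary):
    # Build the document as a tree and let the serializer do the escaping.
    root = ET.Element("forecast")
    ET.SubElement(root, "city").text = city
    ET.SubElement(root, "summary").text = summary
    return ET.tostring(root, encoding="unicode")


def is_well_formed(xml_text):
    # True if the text parses as XML; this is not schema validation.
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


if __name__ == "__main__":
    summary = "Rain & wind, highs < 50"  # hypothetical data with markup characters
    print(is_well_formed(forecast_by_concatenation("Seattle", summary)))
    print(is_well_formed(forecast_by_tree("Seattle", summary)))
```

With data containing `&` or `<`, the concatenated string fails to parse, while the tree-built version serializes those characters as entities and round-trips intact. A validating reader of the kind Lucovsky describes would go a step further and check the result against the schema as well.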