Virtual Token Descriptor-XML can accelerate parsing for applications based on Web Services Security

Web Services Security (WSS) refers to a set of XML message-level standards designed to ensure the security of various aspects of SOA (service-oriented architecture). Yet, due largely to the inherent issues of DOM (Document Object Model) and SAX (Simple API for XML), real-world implementations of WSS generally have poor performance characteristics that often fail to meet the requirements of production SOA deployments. With the advent of VTD-XML (Virtual Token Descriptor-XML), this is about to change fundamentally. Still, many problems with WSS run deeper than they appear, and overcoming them will inevitably require changes to the specifications themselves.

The objectives of this article are:

- To analyze the performance issues of DOM for WSS applications and look at how VTD-XML solves those issues.
- To introduce XMLModifier, a new feature introduced in version 1.8 of VTD-XML, and show some of the latest benchmark numbers of VTD-XML most relevant to WSS.
- To identify some of the technical issues in WS signing and encryption and propose possible fixes.

The new clothes of WSS

If you are one of those enterprise developers who spend considerable time tuning for better application performance, you are probably aware of the strategy involving the following steps:

- Identify the performance bottleneck in the application.
- Rewrite/optimize the corresponding code to eliminate the bottleneck.

While this tuning strategy is usually effective, it depends on an important underlying assumption: in the second step, you must have reasonable control of the code corresponding to the performance "hot spot." Otherwise, if the most significant bottleneck contributor lies in a well-known class library of the JDK itself, you are likely to get stuck in a quagmire. Good examples of this problem are the real-world implementations of WSS.
At the conceptual level, WSS is a set of message-level specifications designed to ensure the authenticity, confidentiality, and integrity of SOAP messages. A WSS endpoint takes an incoming SOAP message and computes security tokens (essentially XML fragments), which are then inserted into the original SOAP message. Unfortunately, most existing WSS implementations have poor performance characteristics. Certain operations of WSS, such as WS signing and encryption, even have the reputation of being deadly slow.

While the computation of security tokens varies in complexity, a WSS application generally has to parse the entire SOAP message for two reasons:

- The values of the security tokens are computed from SOAP data, which can be anywhere in the SOAP message.
- The computed security tokens need to be inserted back into locations that can be anywhere in the SOAP message.

For those reasons, SAX is, generally speaking, not well suited for WSS implementations, because SAX parsers force developers either to buffer the events or to create their own custom object models, both of which require undue implementation effort. DOM, on the other hand, provides much-needed power and flexibility, since a DOM tree resides in memory.

Unfortunately, DOM parsing is known to be memory- and CPU-intensive. And this is not the only problem: inserting a security token, no matter how simple, requires that the entire incoming XML message be reserialized. But reserialization doesn't come cheap. It involves memory copying, buffer allocation, and character encoding.

Why are WSS implementations slow? DOM parsing is slow, and reserialization makes them a lot slower. Worse, both parsing and reserialization are unavoidable with DOM; there doesn't seem to be an easy way out. Facing a problem this obvious, I must ask: do you see the same problem that I see?

Change is coming

My last two JavaWorld articles focused on two key benefits of VTD-XML: high-performance parsing and incremental update.
Both are essential for a high-performance WSS implementation. Why? First, VTD-XML parses XML messages five to 10 times faster than DOM parsers, consumes just one-third to one-fifth of the memory, and, more importantly, exports a hierarchical view of the XML Information Set (Infoset) that one can navigate back, forth, and sideways. Second, VTD-XML internally keeps the XML message intact and undecoded, meaning that reserializing the parts of the SOAP message irrelevant to the security-token computation is no longer necessary. When the security tokens are generated, you just stick them anyplace you want in the message.

From a technical perspective, VTD-XML has raised the baseline WSS performance to a level close to VTD-XML's parsing performance. A three-year-old 1.7-GHz Pentium M processor gives you between 50 MB/second and 70 MB/second, roughly 10 to 15 times DOM's throughput for doing both parsing and reserialization. In other words, now is the time to raise expectations for WSS performance.

Incrementally update XML with XMLModifier

Before version 1.8, VTD-XML had three main classes that performed parsing, navigation, and XPath evaluation. The latest version of VTD-XML introduces XMLModifier, a new class that simplifies incremental updates of XML content. It does three things:

- Inserts bytes or strings into an XML file.
- Deletes portions of XML.
- Updates the original XML with newer content.

Sharing the basic concepts of the other classes, XMLModifier operates directly at the byte level, instead of at the node level as in DOM. To use XMLModifier, developers usually follow these steps:

- Instantiate an instance of XMLModifier. There are two constructors available: one takes an instance of VTDNav; the other is argument-less.
  If the second constructor is used, call bind() to attach an instance of VTDNav to the XMLModifier.
- Record various types of modification operations. As the code navigates to different parts of the document, call XMLModifier's various methods to insert, delete, or update parts of the XML document. Some of those methods take as input an integer corresponding to a VTD token index; other methods operate on the token at VTDNav's cursor.
- Generate the new, updated XML document by calling output().
- If reuse of the XMLModifier is necessary, call reset().

Let me put XMLModifier in action and demonstrate the process outlined above using test.xml:

```xml
<purchaseOrder orderDate="1999-10-20">
  <shipTo country="US">
    <name>Alice Smith</name>
    <street>123 Maple Street</street>
    <city>Mill Valley</city>
    <state>CA</state>
    <zip>90952</zip>
  </shipTo>
  <billTo country="US">
    <name> Robert Smith </name>
    <street>8 Oak Avenue</street>
    <city>Old Town</city>
    <state>PA</state>
    <zip>95819</zip>
  </billTo>
</purchaseOrder>
```

Shown below, our simple application inserts an attribute into the root element purchaseOrder, changes the name child of shipTo from "Alice Smith" to "Janice Smith," inserts a new element before shipTo, and deletes billTo entirely:

```java
import com.ximpleware.*;
import com.ximpleware.xpath.*;
import java.io.*;

public class app {
    public static void main(String args[]) {
        try {
            VTDGen vg = new VTDGen();
            if (vg.parseFile("test.xml", false)) {
                VTDNav vn = vg.getNav();
                XMLModifier xm = new XMLModifier(vn);
                // use vn to navigate; use xm to record the changes
                // the cursor is at the root element at the start;
                // heed the whitespace at the beginning of the string
                xm.insertAttribute(" shipDate='1-1-2000' ");
                vn.toElement(VTDNav.FIRST_CHILD);
                xm.insertBeforeElement("\t<test/>\n");
                vn.toElement(VTDNav.FIRST_CHILD, "name");
                xm.updateToken(vn.getText(), "Janice Smith");
                vn.toElement(VTDNav.PARENT);
                vn.toElement(VTDNav.NEXT_SIBLING);
                xm.remove();
                FileOutputStream fops =
                    new FileOutputStream(new File("newTest.xml"));
                xm.output(fops);
            }
        } catch (Exception e) {
            e.printStackTrace(); // the original sample swallowed the exception
        }
    }
}
```

Here is the actual output file, named newTest.xml:

```xml
<purchaseOrder shipDate='1-1-2000' orderDate="1999-10-20">
  <test/>
  <shipTo country="US">
    <name>Janice Smith</name>
    <street>123 Maple Street</street>
    <city>Mill Valley</city>
    <state>CA</state>
    <zip>90952</zip>
  </shipTo>
</purchaseOrder>
```

When output() is called, the XMLModifier instance performs a few checks internally. After deleting the billTo element in test.xml, our sample application could go on to delete billTo's attributes or child elements. But there is a semantic ambiguity: if the billTo element is removed, its attributes and children are all gone with it. What does it mean to delete the attributes, or children, of a "nonexistent" element? In this case, calling output() throws an XMLModifyException.

Note that while XMLModifier's methods are designed to avoid introducing errors into the XML, the well-formedness of the output's byte content is not guaranteed. For example, the current implementation of XMLModifier also forbids calling insertBeforeElement() or insertAfterElement() twice in a row at the same cursor location. The reason, again, is ambiguity. Say you insert <test/> the first time and <test2/> the second time. What should the output look like: <test/> <test2/>, or <test2/> <test/>? And if you want the output to look like the former, why not just insert <test/> <test2/> all at once? It is up to developers to use XMLModifier's methods in ways that produce well-formed XML output.

Benchmark results

The purpose of this section is to give readers a quantitative feel for the performance characteristics of the essential operations in WSS applications. The first part contains benchmark numbers for the baseline performance (parsing and reserialization) of WSS. The second part goes one step further and measures the combined latency of parsing, XPath evaluation, and outputting XML.
The benchmark code used in this article is available as part of the VTD-XML 1.9 release, which can be downloaded from Resources.

The benchmark environment has the following setup:

- Hardware: a Sony VAIO notebook featuring a 1.7-GHz Pentium M processor with 2 MB of integrated cache, 512 MB of DDR2 RAM, and a 400-MHz front-side bus.
- OS/JVM: the notebook runs Windows XP, and the test applications run on version 1.5.0_06 of the JDK/JVM.
- XML parsers: the benchmark tests Xerces DOM version 2.7.1 and VTD-XML version 1.8.5. The DOM tests are configured to use both deferred node expansion (the default) and full node expansion. The VTD-XML tests run in normal mode and in buffer-reuse mode.

To reduce timing variation due to disk I/O, the benchmark programs first read the XML files into memory buffers prior to the test runs and write the output XML into an in-memory byte-array output stream. The server JVM is used to obtain peak performance, and all input/output streams are reused whenever possible.

For the first part of the benchmark, a random collection of XML files is chosen and divided into three groups by file size: small files are less than 30 KB in size, mid-sized files are between 30 KB and 1 MB, and files larger than 1 MB are considered big. The benchmark code first parses an in-memory XML file, then immediately writes it back out into a byte-array output stream. The results include both the parsing-only performance and the round-trip performance (parsing plus reserialization).

For the second part of the benchmark, three XML purchase orders of similar structure but different sizes are chosen. The benchmark code parses an XML file in the buffer, evaluates a single precompiled XPath expression, removes the matching nodes from the document, and writes the output into a byte-array output stream.
The five chosen XPath expressions are:

- /*/*/*[position() mod 2 = 0]
- /purchaseOrder/items/item[USPrice<100]
- /*/*/*/quantity/text()
- //item/comment
- //item/comment/../quantity

To give you some idea of the XML file structure, below is the starting portion of the purchase order:

```xml
<?xml version="1.0"?>
<purchaseOrder orderDate="1999-10-20">
  <shipTo country="US">
    <name>Alice Smith</name>
    <street>123 Maple Street</street>
    <city>Mill Valley</city>
    <state>CA</state>
    <zip>90952</zip>
  </shipTo>
  <billTo country="US">
    <name> Robert Smith </name>
    <street>8 Oak Avenue</street>
    <city>Old Town</city>
    <state>PA</state>
    <zip>95819</zip>
  </billTo>
  <comment>Hurry, my lawn is going wild!</comment>
  <items>
    <item partNum="872-AA">
      <productName>Lawnmower</productName>
      <quantity></quantity>
      <USPrice>148.95</USPrice>
      <comment>Confirm this is electric</comment>
    </item>
    <item partNum="926-AA">
      <productName>Baby Monitor</productName>
      <quantity>1</quantity>
      <USPrice>39.98</USPrice>
      <shipDate>1999-05-21</shipDate>
    </item>
    <item partNum="872-AA">
      <productName>Lawnmower</productName>
      <quantity><![CDATA[1]]></quantity>
      <USPrice>148.95</USPrice>
      <comment>Confirm this is electric</comment>
    </item>
    <item partNum="926-AA">
      <productName>Baby Monitor</productName>
      <quantity>1</quantity>
      <USPrice>39.98</USPrice>
      <shipDate>1999-05-21</shipDate>
    </item>
    ...
  </items>
</purchaseOrder>
```

Part 1. Parsing vs. parsing and reserializing

Each cell of Table 1 contains two numbers: the left one is the parsing-only latency; the right one is the combined latency of parsing and reserialization.

Table 1. Latency comparisons of parsing vs. parsing and reserialization

| File (size) | DOM deferred (ms) | DOM full (ms) | VTD-XML (ms) | VTD-XML with buffer reuse (ms) |
|---|---|---|---|---|
| po_small.xml (6,780 bytes) | 0.547 / 1.604 | 0.388 / 0.934 | 0.134 / 0.134 | 0.121 / 0.121 |
| form.xml (15,845 bytes) | 1.071 / 2.951 | 0.946 / 1.845 | 0.234 / 0.238 | 0.217 / 0.223 |
| book.xml (22,996 bytes) | 3.711 / 11.444 | 3.238 / 7.312 | 0.381 / 0.392 | 0.361 / 0.373 |
| cd.xml (30,831 bytes) | 7.951 / 17.238 | 9.082 / 12.454 | 0.612 / 0.640 | 0.59 / 0.616 |
| bioInfo.xml (34,759 bytes) | 7.811 / 14.502 | 8.674 / 10.904 | 0.553 / 0.567 | 0.534 / 0.566 |
| po_medium.xml (112,238 bytes) | 13.095 / 28.552 | 18.268 / 26.766 | 2.069 / 2.199 | 2.023 / 2.081 |
| po_big.xml (1,060,823 bytes) | 104.688 / 237.903 | 144.821 / 266.94 | 21.956 / 26.826 | 21.556 / 23.408 |
| blog.xml (1,334,455 bytes) | 68.486 / 156.337 | 89.289 / 138.75 | 20.517 / 22.97 | 20.253 / 22.195 |
| soap.xml (2,716,834 bytes) | 313.26 / 808.86 | 480.989 / 835.1 | 61.69 / 72.01 | 55.58 / 61.69 |
| ORTCA.xml (8,029,319 bytes) | 749.88 / 1667.1 | 1056.32 / 1483.53 | 210.41 / 235.43 | 195.88 / 212.81 |
| address.xml (15,981,592 bytes) | 2790.92 / 4120.719 | 2217.79 / 4963.64 | 334.18 / 379.14 | 304.74 / 335.580 |

Part 2. Update performance comparison

Table 2. Combined update latency comparison for /*/*/*[position() mod 2 = 0]

| File (size) | DOM deferred (ms) | DOM full (ms) | VTD-XML (ms) | VTD-XML with buffer reuse (ms) |
|---|---|---|---|---|
| po_small.xml (6,780 bytes) | 6.645 | 3.827 | 0.184 | 0.171 |
| po_medium.xml (112,238 bytes) | 42.248 | 35.125 | 3.052 | 2.751 |
| po_big.xml (1,060,823 bytes) | 324.38 | 286.613 | 35.801 | 29.475 |

Table 3. Combined update latency comparison for /purchaseOrder/items/item[USPrice<100]

| File (size) | DOM deferred (ms) | DOM full (ms) | VTD-XML (ms) | VTD-XML with buffer reuse (ms) |
|---|---|---|---|---|
| po_small.xml (6,780 bytes) | 6.921 | 4.134 | 0.229 | 0.204 |
| po_medium.xml (112,238 bytes) | 42.724 | 37.591 | 3.052 | 3.392 |
| po_big.xml (1,060,823 bytes) | 314.585 | 354.175 | 41.743 | 37.035 |

Table 4. Combined update latency comparison for /*/*/*/quantity/text()

| File (size) | DOM deferred (ms) | DOM full (ms) | VTD-XML (ms) | VTD-XML with buffer reuse (ms) |
|---|---|---|---|---|
| po_small.xml (6,780 bytes) | 4.762 | 4.113 | 0.212 | 0.192 |
| po_medium.xml (112,238 bytes) | 41.912 | 38.848 | 3.542 | 3.132 |
| po_big.xml (1,060,823 bytes) | 376.808 | 367.495 | 39.991 | 35.635 |

Table 5. Combined update latency comparison for //item/comment

| File (size) | DOM deferred (ms) | DOM full (ms) | VTD-XML (ms) | VTD-XML with buffer reuse (ms) |
|---|---|---|---|---|
| po_small.xml (6,780 bytes) | 4.951 | 4.278 | 0.307 | 0.282 |
| po_medium.xml (112,238 bytes) | 47.06 | 41.344 | 4.809 | 4.373 |
| po_big.xml (1,060,823 bytes) | 358.783 | 395.401 | 56.898 | 52.058 |

Table 6. Combined update latency comparison for //item/comment/../quantity

| File (size) | DOM deferred (ms) | DOM full (ms) | VTD-XML (ms) | VTD-XML with buffer reuse (ms) |
|---|---|---|---|---|
| po_small.xml (6,780 bytes) | 4.978 | 4.326 | 0.306 | 0.291 |
| po_medium.xml (112,238 bytes) | 46.522 | 42.466 | 5.112 | 4.716 |
| po_big.xml (1,060,823 bytes) | 405.683 | 401.293 | 53.46 | 49.003 |

Figure 1. Normalized latency comparison for /*/*/*[position() mod 2 = 0].

Figure 2. Normalized latency comparison for /purchaseOrder/items/item[USPrice<100].

Figure 3. Normalized latency comparison for /*/*/*/quantity/text().

Figure 4. Normalized latency comparison for //item/comment.

Figure 5. Normalized latency comparison for //item/comment/../quantity.

Observations

It is clear from the data in Table 1 that DOM's reserialization is quite costly, particularly for small XML files, for which reserialization can take nearly twice as long as parsing. Both the parsing and reserialization performance of DOM drop precipitously with Xerces's default setting (deferred node expansion). As file sizes increase, DOM's reserialization cost declines but still accounts for roughly two-thirds of the total cost in a typical case. For big files, the default (deferred) node expansion helps parsing performance at the expense of increased reserialization cost. VTD-XML, with or without buffer reuse, consistently outperforms Xerces DOM by an order of magnitude regardless of file size. In some cases, VTD-XML with buffer reuse is nearly 30 times as fast as Xerces DOM with deferred node expansion.
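For readers who want to reproduce the shape of the part-2 workload (parse, evaluate a precompiled XPath, remove the matching nodes, reserialize) without the benchmark harness, the DOM side can be approximated using only the JDK's built-in APIs. The sketch below is for illustration and is not the actual benchmark code shipped with VTD-XML; the class name `DomUpdateBench` and the inline test document are made up:

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;

public class DomUpdateBench {
    // Parse an in-memory XML buffer, evaluate a precompiled XPath,
    // remove the matching nodes, and reserialize into a reusable
    // in-memory stream: the same four steps as part 2 of the benchmark.
    public static byte[] run(byte[] xml, XPathExpression expr,
                             ByteArrayOutputStream out) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml));           // parse
        NodeList hits = (NodeList) expr.evaluate(doc, XPathConstants.NODESET);
        for (int i = 0; i < hits.getLength(); i++) {             // remove matches
            Node n = hits.item(i);
            n.getParentNode().removeChild(n);
        }
        out.reset();                                             // reuse the stream
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.transform(new DOMSource(doc), new StreamResult(out));  // reserialize
        return out.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        byte[] po = ("<purchaseOrder><items>"
                + "<item><comment>electric?</comment><quantity>1</quantity></item>"
                + "<item><quantity>2</quantity></item>"
                + "</items></purchaseOrder>").getBytes("UTF-8");
        XPathExpression expr =
                XPathFactory.newInstance().newXPath().compile("//item/comment");
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        long start = System.nanoTime();
        byte[] result = run(po, expr, out);
        System.out.printf("latency: %.3f ms%n", (System.nanoTime() - start) / 1e6);
        System.out.println(new String(result, "UTF-8"));
    }
}
```

Run on a warmed-up server JVM, a loop around run() yields timings comparable in spirit (though not in exact numbers) to the DOM columns of Tables 2 through 6; the VTD-XML side of the comparison would instead use VTDGen for parsing, an XPath class for evaluation, and XMLModifier for removal and output.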
The second part of the benchmark shows that, regardless of file size, VTD-XML maintains its performance edge even with XPath expression evaluation added to the mix, outperforming Xerces by a factor of between seven and 38. Since both VTD-XML and DOM are capable of random access, XPath evaluation should perform roughly the same on both. Surprisingly, this is not the case, particularly for small XML files. For Xerces, XPath evaluation on po_small.xml takes twice as long as the combined latency of parsing and reserialization, dragging the combined throughput down to a mere 1.7 MB/second (around one-tenth of the parsing throughput). It seems to me that small XML files flowing through the network are the most likely to produce choke points within a DOM-based WSS infrastructure, a problem for which VTD-XML should provide a reasonable solution.

The issues of XML encryption and XML signing

But VTD-XML only offers a new starting point. Other design issues in the WSS family of specs become obvious only after the parsing/reserialization overhead is removed from WSS applications. One common complaint about XML signature and XML encryption is that they are deadly slow. Judging by the names of those two specs, you would think they are mostly cryptography related. Not so: strict crypto operations account for only a small percentage of the CPU cycles. The lion's share of the overhead is the result of performing parsing, reserialization, and XML canonicalization on SOAP messages. Among those, the most troubling part is XML canonicalization, which converts an XML Infoset into a unique byte pattern. The original goal of XML canonicalization was to check the logical equivalence of two documents.
To canonicalize an XML document, one must apply a transformation process consisting of 13 steps to the document (see "Performance of Web Services Security," by Hongbin Liu, Shrideep Pallickara, and Geoffrey Fox).

But in the context of WSS, XML canonicalization introduces too much processing overhead. Even worse, it introduces that overhead without accomplishing anything significantly necessary or useful. For one thing, signing or encrypting XML is quite different from checking the logical equivalence of two XML documents. For another, the values of an XML signature or cipher, as with signing and encrypting any other data type, should always have been computed from the byte content of the XML itself, not from the Infoset.

And don't forget that the technology world, along with its underlying assumptions, is relentlessly marching forward. The speed of networks has gone from 10 Mbits/second a decade ago to 10 Gbits/second today, with 100 Gbits/second on the horizon. Given that XML/SOAP traffic is rapidly increasing, due to the proliferation of SOA and Web services applications, what is the point of introducing artificial choke points in the network? The misery seems entirely self-imposed.

The challenge is for someone to go back to the drawing board and come up with a replacement for XML canonicalization: one that can be completely described in one to two pages and understood by anyone in five minutes. Performance-wise, it should strive not to get in the way; matching at least the parsing performance should not be that difficult. Of the 13 steps described in XML canonicalization, it seems that only the transcoding (to UTF-8) step should be retained.

Conclusion

This article investigated some of the practical implementation issues in a DOM-based WSS infrastructure. As the next-generation XML parser beyond DOM and SAX, VTD-XML fundamentally and completely solves DOM's wasteful parsing and reserialization. But more challenges lie ahead.
The XML canonicalization spec is unnecessarily complex and inefficient, making it ill-suited for a reasonably high-performance implementation of XML signature and encryption. We need something much simpler and faster.

Jimmy Zhang is the founder of XimpleWare, a provider of high-performance XML-processing solutions. He has experience in the fields of electronic design automation and voice over IP with numerous Silicon Valley technology companies. He graduated from the University of California, Berkeley, with both an MS and a BS from the department of EECS.