by Samu Paajanen

Implement complicated data transformations with SAX and XSLT

Sep 5, 2005

Standard Java API provides powerful tools for XML data transformations

I was once asked to help in a project that required a simple data transformer for converting raw bill data into different bill layouts. After receiving a brief introduction to the problem, I suggested using XSLT (Extensible Stylesheet Language Transformations).

When I dug deeper into the requirements, it turned out the problem was not as simple as I had first thought. The input data was manageable, but the data needed to perform the transformation simply could not be depicted with a set of static XSLT stylesheets. Part of the transformation data was dynamic and stored in two separate databases. In addition, to produce the bill layouts, the program had to perform relatively complex calculations on the input data using numbers fetched from the two databases. The XSLT solution was quietly forgotten.

The core problem in this case was the data needed for directing the transformation—it was dynamic. In a perfect world, you would never face this issue. Preparation of the input data and the transformation should be clearly separated, so all the information needed for the transformation could easily be included in a single XSLT template. Unfortunately, we don’t live in a perfect world, and the requirements of real-life projects are sometimes quite bizarre.

This article suggests one solution to the problem described above. I show by example how the power of SAX (Simple API for XML) can be harnessed to enhance the applicability of XSLT. In addition, I show how XSLT can be used even if neither the input data nor the desired output is XML.

Introduction to XSLT

XSLT is a programming language for transforming XML data. XSLT stylesheets can be applied to transform an XML document into another XML format or practically any other format. While XSLT may not be a simple language to learn—especially to those more familiar with Java-like languages—it is a powerful and flexible way to accomplish relatively complicated data transformations. If you are not familiar with XSLT, plenty of excellent tutorials are available. See, for instance, Chapter 17 of the XML Bible.

Though XSLT is a great language, some tasks are difficult, or nearly impossible, to accomplish with it. Transformations where you must calculate the combinations of data fields taken from several elements of the input XML are usually possible, but often extremely difficult to write. If the data directing the transformation is itself dynamic, XSLT alone is not enough. XSLT templates are static in nature, and, while it may be possible to dynamically regenerate the templates, I can’t imagine a situation when this would be feasible. (If you have a different opinion, feel free to send me feedback.)

After experimenting with various ideas, I concluded that the easiest way to accomplish complicated transformations using XSLT was to manipulate the input XML before feeding it to the XSLT transformer. This may sound ridiculously complicated and inefficient, but it turns out that, with SAX, manipulating the XML data on the fly is quite easy.

SAX is an event-driven interface for parsing an XML document. When the SAX parser parses XML data, it generates “callback” notifications about the XML elements that the parser recognizes. For instance, when the parser encounters an XML start tag, it invokes the startElement callback. The name of the tag and other relevant information are passed as parameters of the callback. SAX should be used when efficient XML parsing is needed. For more information about SAX, see Sun’s tutorial on JAXP. In this article, I use SAX to modify the flow of events before forwarding them to the XSLT transformer.

SAX and XSLT are both included in the JAXP (Java API for XML Processing) API, which has been a part of J2SE since version 1.4.
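To make the callback model concrete, here is a minimal sketch that parses a small XML string through JAXP and records the callbacks it receives. The class name SaxEventLog and the sample XML are illustrative assumptions, not part of the article's example package, and the code is written against a current JDK rather than J2SE 1.4.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical handler that records the SAX callbacks it receives. */
final class SaxEventLog extends DefaultHandler {
    final List<String> events = new ArrayList<String>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        events.add("start:" + qName);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // A parser may deliver text in several chunks; this sketch ignores that.
        String text = new String(ch, start, length).trim();
        if (!text.isEmpty()) {
            events.add("text:" + text);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        events.add("end:" + qName);
    }

    /** Parses the given XML and returns the recorded callbacks in order. */
    static List<String> parse(String xml) {
        SaxEventLog log = new SaxEventLog();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), log);
        } catch (Exception e) {
            throw new IllegalStateException("Parsing failed", e);
        }
        return log.events;
    }

    public static void main(String[] args) {
        System.out.println(parse("<ORDER><PRICE>100</PRICE></ORDER>"));
    }
}
```

Each tag and each run of text arrives as a separate method call, which is what makes it possible to intercept, drop, or rewrite individual events later on.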

Overview of the examples

This section introduces the examples included with this article and, through them, the possibilities of SAX and XSLT.

Running the examples

If you are not interested in running the examples, skip this section. This article, however, relies heavily on the code examples, so I advise you to at least look at the code. The examples have been tested with J2SE 1.4.2 on Windows. No other packages are needed to run the applications. The instructions, however, assume you use the Ant build tool. If you don’t want to use Ant, you can still build and run the examples, but that requires a bit more work.

Detailed instructions are in the README.txt file included in the zip file downloadable from Resources. Once you have unzipped the package and set the relevant environment variables (explained in README.txt), you can use the following Ant commands:

  • ant build: Erases the files created in the build and compiles the whole source code again
  • ant clean: Erases the files created in the build
  • ant example1: Runs Example 1
  • ant example2: Runs Example 2
  • ant example2b: Runs Example 2b (a variation of Example 2)
  • ant example3: Runs Example 3
  • ant example4: Runs Example 4

Overview of Example 1

Though Example 1 is a basic XSLT transformer, introducing it proves necessary because it represents the basis on which the following examples are built. Figure 1 shows Example 1’s conceptual picture.

The input data (1.1) is proprietary XML data, which models the report of the customers’ orders. The data looks like this:

 

<?xml version="1.0"?>
<ORDER_INFO>
   <CUSTOMER GROUP="exclusive">
      <ID>234</ID>
      <SERVICE_ORDERS>
         <ORDER>
            <PRODUCT_ID>1231</PRODUCT_ID>
            <PRICE>100</PRICE>
            <TIMESTAMP>2004-06-05:14:40:05</TIMESTAMP>
         </ORDER>
         <ORDER>
            <PRODUCT_ID>2001</PRODUCT_ID>
            <PRICE>20</PRICE>
            <TIMESTAMP>2004-06-12:15:00:44</TIMESTAMP>
         </ORDER>
      </SERVICE_ORDERS>
   </CUSTOMER>...

Example 1’s complete input data is in file /input/orderInfo_1.1.xml. All file paths in this article are relative to the directory into which you unzipped this article’s examples.

Figure 1’s XSLT template (1.2) is a regular XSLT stylesheet that transforms the input data into HTML. The XSLT template looks like this:

 

<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
   <xsl:output omit-xml-declaration="yes"/>

   <xsl:template match="/">
      <xsl:apply-templates select="ORDER_INFO"/>
   </xsl:template>

   <xsl:template match="ORDER_INFO">
      <HTML>
         <HEAD>
            <TITLE>Customers&apos; Order information</TITLE>
         </HEAD>
         <BODY>
            <H1>Customers&apos; Order information</H1>
            <xsl:apply-templates select="CUSTOMER"/>
            <xsl:apply-templates select="PRICE_SUMMARY"/>
         </BODY>
      </HTML>
   </xsl:template>...

The complete transformation template is in file /template/transform_1.2.xml. When you run Example 1, the program writes the output file /output/result_1.3.html.

To readers familiar with XSLT, Example 1 should look like a nice and easy programming exercise. The XSLT template only includes data required for formatting the output and does not include any complicated calculations. Let’s go to Example 2 for a more challenging case.

Overview of Example 2

Example 2 may be this article’s most interesting sample. The example actually consists of two different transformations (Transformation 2a and Transformation 2b) that we will consider separately, starting with Transformation 2a. Example 2’s conceptual picture is shown in Figure 2.

The input data (1.1) and the XSLT template (1.2) match those in Example 1. Before the XSLT transformation is applied, the input data goes through the preprocessor, which is actually a set of Java classes that manipulate the XML data using SAX’s event-filtering feature. The datasource is a set of classes that implement a sort of dummy datasource. In the real application, this could be a database interface, for example. This dummy database is included to show a simplistic pattern for enriching the XML data with the dynamic data fetched from an external datasource. I wanted to make the examples as simple as possible to install and run, so I did not implement any real database connections—the dummy implementation hopefully gives you the idea.

When you run the examples, the program (when the mode parameter is set to debug) echoes the XML data coming from the preprocessor to the standard output stream (System.out). This data resembles the preprocessor’s output data, which is now the new input data to the transformer. Echoing the preprocessor output is a handy way to debug the transformations completed during preprocessing. We’ll consider this feature’s implementation later in this article.

When you run Transformation 2a, the following data is echoed to the screen:

 <ORDER_INFO>
   <CUSTOMER GROUP="exclusive">
      <ID>
         Jill
      </ID>
      <SERVICE_ORDERS>
         <ORDER>
            <PRODUCT_ID>
               Doohickey
            </PRODUCT_ID>
            <PRICE>
               100
            </PRICE>
         </ORDER>...

If you compare this data to the original input data (/input/orderInfo_1.1.xml), you’ll notice the following differences at the beginning of the data:

  • The value of the first CUSTOMER/ID element has changed from 234 to Jill.
  • The value of the first CUSTOMER/SERVICE_ORDERS/ORDER/PRODUCT_ID element has changed from 1231 to Doohickey.
  • The TIMESTAMP element has been removed.

The preprocessor has replaced the values of CUSTOMER/ID and CUSTOMER/SERVICE_ORDERS/ORDER/PRODUCT_ID with the values received from its internal mock database. It has also filtered out the TIMESTAMP elements and their values. This modified data now represents the input to the XSLT transformer.

Transformation 2a’s output file is /output/result_2.1.html.

When you run Transformation 2b, the data echoed to the screen may at first appear similar to that of Transformation 2a. The difference is the PRICE_SUMMARY element at the end, inserted by the preprocessor:

  <PRICE_SUMMARY>
      <PRODUCT>
         <NAME>
            Doohickey
         </NAME>
         <SUM>
            110
         </SUM>
      </PRODUCT>
      <PRODUCT>
         <NAME>
            Nose Cleaner
         </NAME>
         <SUM>
            10
         </SUM>
      </PRODUCT>
      <PRODUCT>
         <NAME>
            Raccoon
         </NAME>
         <SUM>
            40
         </SUM>
      </PRODUCT>
   </PRICE_SUMMARY>
</ORDER_INFO>

The purpose of this example is to demonstrate that the preprocessor can also be applied to include new XML elements, values of which may be calculated directly from the input XML—or by using some supplementary data from an external source.

The output file of Example 2b is /output/result_2.1b.html.

Why are these examples interesting? At the conceptual level, this data transformation approach does not seem groundbreaking. The interesting point is that small, dynamic enhancements to the input XML are relatively easy to implement with SAX, yet they enable transformations that are impossible with plain XSLT. On the other hand, implementing the whole transformation using only SAX would be possible, but tedious. Using SAX and XSLT together opens almost endless possibilities in implementing complicated data transformations.

I discuss the pattern for extending the XSLT transformer with the preprocessor in the section entitled “A Deeper Look into Example 2.” You’ll see that, with minor changes, this article’s code examples can be applied to many different purposes.

Examples 3 and 4 enhance Example 2 with capabilities for reading and producing non-XML data. If you are not interested in these options, jump directly to the implementation details of Examples 1 and 2.

Overview of Example 3

Input data sometimes comes from several sources, and not always in XML format. Example 3 shows how SAX can be used to generate events even from non-XML data, thus making it possible to apply XSLT. Example 3’s conceptual picture is shown in Figure 3.

Example 3’s input data (3.1) looks like this:

 3
exclusive:234
2
Order:1231
Price:100
Timestamp:2004-06-05:14:40:05
Order:2001
Price:20
Timestamp:2004-06-12:15:00:44...

Example 3’s complete input data is in file /input/orderInfoAsText_3.1.txt.

The XML generator reads this data and produces SAX events that exactly match the input XML document of the previous examples (Examples 1 and 2). The data received by the preprocessor is thus similar to the input data of Example 2’s preprocessor. The rest of the transformation resembles Transformation 2b. The resulting output file (/output/result_3.2.html) is also similar to Transformation 2b’s output (file /output/result_2.1b.html).
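The idea of generating SAX events from plain text can be sketched as follows. This is not the article's XML generator: the class LineToSaxReader, the RECORDS wrapper element, and the simplified rule that every Key:value line becomes one element are assumptions made for illustration, and lines without a colon (such as the counts in the real input) are simply skipped. An identity transformer stands in for the XSLT stylesheet so that the generated events can be inspected as XML.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.util.Locale;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.XMLFilterImpl;

/** Hypothetical reader that turns "Key:value" lines into SAX events. */
final class LineToSaxReader extends XMLFilterImpl {
    // There is no parent parser, so configuration requests are accepted and ignored.
    @Override public void setFeature(String name, boolean value) { }
    @Override public void setProperty(String name, Object value) { }

    @Override
    public void parse(InputSource input) throws SAXException, IOException {
        BufferedReader lines = new BufferedReader(input.getCharacterStream());
        AttributesImpl none = new AttributesImpl();
        startDocument();
        startElement("", "RECORDS", "RECORDS", none);
        String line;
        while ((line = lines.readLine()) != null) {
            int colon = line.indexOf(':');
            if (colon < 1) {
                continue; // not a Key:value line
            }
            String name = line.substring(0, colon).trim().toUpperCase(Locale.ROOT);
            String value = line.substring(colon + 1).trim();
            startElement("", name, name, none);
            characters(value.toCharArray(), 0, value.length());
            endElement("", name, name);
        }
        endElement("", "RECORDS", "RECORDS");
        endDocument();
    }

    /** Feeds the generated events to an identity transformer and returns the XML. */
    static String toXml(String text) {
        try {
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            SAXSource source = new SAXSource(new LineToSaxReader(),
                    new InputSource(new StringReader(text)));
            identity.transform(source, new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Conversion failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(toXml("Order:1231\nPrice:100\nTimestamp:2004-06-05:14:40:05\n"));
    }
}
```

Because the transformer only sees SAX events, it cannot tell whether they came from an XML parser or from a reader like this one.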

Overview of Example 4

XML is not always the desired output format. This example resembles the previous one; however, it produces a text file instead of an XML file. See Figure 4 for a conceptual view.

The input data is the same as in Example 3 (/input/orderInfoAsText_3.1.txt). The XSLT template, however, differs. As readers already familiar with XSLT know, XSLT transformations can also be used to produce non-XML data. The XSLT template (4.1) is a regular XSLT stylesheet that transforms the input data into text format. The template looks like this:

 

<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
   <xsl:output method="text" omit-xml-declaration="yes"/>

   <xsl:template match="/">
      <xsl:apply-templates select="ORDER_INFO"/>
   </xsl:template>

   <xsl:template match="ORDER_INFO">
      Customers&apos; Order information
      <xsl:apply-templates select="CUSTOMER"/>
      <xsl:apply-templates select="PRICE_SUMMARY"/>
   </xsl:template>

   <xsl:template match="CUSTOMER">
      Customer id:<xsl:value-of select="ID"/>
      Customer group is &apos;<xsl:value-of select="@GROUP"/>&apos;
      <xsl:apply-templates select="SERVICE_ORDERS"/>
   </xsl:template>...

The complete transformation template is in file /template/transform_4.1.xml. When you run Example 4, the program writes the output file /output/result_4.2.txt.

Example 4’s Java code is the same as Example 3’s, with the only difference being the XSLT template.
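As a rough illustration of text output, the following sketch runs a tiny stylesheet with method="text" over an in-memory XML string. The stylesheet and the class name TextOutputSketch are hypothetical and much smaller than transform_4.1.xml. The sketch uses xsl:text for the literal parts, which is one way to keep control over whitespace in the generated text.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Hypothetical demo of an XSLT stylesheet that emits plain text. */
final class TextOutputSketch {
    // Illustrative stylesheet: one line of text per CUSTOMER element.
    static final String XSLT =
          "<xsl:stylesheet xmlns:xsl='http://www.w3.org/1999/XSL/Transform' version='1.0'>"
        + "<xsl:output method='text'/>"
        + "<xsl:template match='/ORDER_INFO'>"
        + "<xsl:for-each select='CUSTOMER'>"
        + "<xsl:text>Customer id:</xsl:text><xsl:value-of select='ID'/>"
        + "<xsl:text>&#10;</xsl:text>"
        + "</xsl:for-each>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    /** Applies the stylesheet to the XML and returns the text result. */
    static String transform(String xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(XSLT)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)),
                    new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.print(transform("<ORDER_INFO>"
                + "<CUSTOMER><ID>234</ID></CUSTOMER>"
                + "<CUSTOMER><ID>235</ID></CUSTOMER>"
                + "</ORDER_INFO>"));
    }
}
```

The Java side is identical to an XML-producing transformation; only the stylesheet decides what kind of output is written.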

A look into the code

Let’s look at how the examples are implemented.

A deeper look into Example 1

Example 1’s Transformer entity consists of classes Example1 and ExampleTester, and interface Example. Class ExampleTester just parses the input parameters, creates an instance of the class implementing interface Example (in this case, the Example1 class), and invokes the doTransform() method of the created Example object. Figure 5 shows a class diagram of Example 1. The classes in the green boxes model the classes of the standard J2SE library. For instance, StreamSource and StreamResult belong to the standard package javax.xml.transform.stream.

The doTransform() method in Example1 is interesting. Let’s take a closer look:

 1.  public void doTransform(String _inputFileName,
          String _transformerFileName, 
          String _outputFileName) {
2.     try {
3.        initTransformer(_transformerFileName);
4.     } catch (TransformerConfigurationException tce) {
5.         // omitted for clarity
6.     } 
7.     Source myXMLSource = getInputSourceObject(_inputFileName);
8.     Result myResult = getResultObject(_outputFileName);
9.     try {      
10.       myTransformer.transform(myXMLSource, myResult);  
11.    } catch (TransformerException te) {
12.        // omitted for clarity  
13.    }
14. }

The doTransform() method has three arguments:

  • _inputFileName: The name of the source file from which the original XML data is read
  • _transformerFileName: The name of the XSLT file
  • _outputFileName: The name of the result file to which the transformed XML data is written

In Line 3, the initTransformer() method is called with parameter _transformerFileName to create the XSLT transformer. The myXMLSource object is the input source of the data to be transformed. In Line 7, the getInputSourceObject() method is called with parameter _inputFileName to create the myXMLSource instance. The myResult object is the target to which the result (the transformed data) is sent. In Line 8, getResultObject() is called with parameter _outputFileName to create the file target. After these preparations, the actual transformation is done with one line of code in Line 10:

 myTransformer.transform(myXMLSource, myResult);

That’s basically it. Before going to the second example, let’s consider the Source instance created by getInputSourceObject():

 Source myXMLSource = getInputSourceObject(_inputFileName);

This is the XSLT transformer’s datasource. Three classes implement Source: StreamSource, SAXSource, and DOMSource. In this article, only StreamSource and SAXSource are needed (see Figure 6).

In Example 1, method getInputSourceObject() returns an instance of StreamSource. The real source of the data is obviously the input file (/input/orderInfo_1.1.xml) to which the stream source is opened.
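The stream-based setup can be sketched in a few lines. The following is a simplified stand-in for Example 1, not its actual code: in-memory strings replace the input, template, and output files, and the class name StreamTransformSketch and the small stylesheet are assumptions made for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Hypothetical stream-to-stream XSLT run; strings stand in for the files. */
final class StreamTransformSketch {
    // Illustrative stylesheet, far smaller than transform_1.2.xml.
    static final String XSLT =
          "<xsl:stylesheet xmlns:xsl='http://www.w3.org/1999/XSL/Transform' version='1.0'>"
        + "<xsl:output omit-xml-declaration='yes'/>"
        + "<xsl:template match='/ORDER_INFO'>"
        + "<UL><xsl:for-each select='CUSTOMER'>"
        + "<LI><xsl:value-of select='ID'/></LI>"
        + "</xsl:for-each></UL>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    /** Builds a transformer from the stylesheet and applies it to the XML. */
    static String transform(String xml, String xslt) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xslt)));
            StringWriter out = new StringWriter();
            // Both ends are plain character streams; no SAX filtering is involved.
            transformer.transform(new StreamSource(new StringReader(xml)),
                    new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(
                "<ORDER_INFO><CUSTOMER><ID>234</ID></CUSTOMER></ORDER_INFO>", XSLT));
    }
}
```

Replacing the StreamSource passed to transform() with a SAXSource is the only change needed to route the input through SAX filters.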

In the following examples, the input XML still comes from the file, but instead of a simple data stream, it is handled as a stream of SAX events. The fact that the transformer receives the input as a stream of events instead of a simple file stream would not matter much in a simple case like Example 1. In the next example, however, the whole idea is based on the ability to catch and manipulate the stream of SAX events.

A deeper look into Example 2

The implementation of Example 2’s preprocessor (in Transformation 2a) consists of class Example2 and a set of other classes in package myutil.

Class Example2 extends class Example1. The purpose of this class is to override method getInputSourceObject() and define the command factory’s name. Recall from the previous example that getInputSourceObject() returns the input data’s source. The functioning of the command factory will be described later. Let’s first investigate the getInputSourceObject() method:

 1.  protected Source getInputSourceObject(String _pathName) throws FileNotFoundException  {
2.     InputSource inputSource = getInputSourceFromFile(_pathName);
3.     XMLReader xmlFilter = getFilteringReader();
4.     SAXSource saxSource = new SAXSource(xmlFilter, inputSource);
5.     return saxSource;
6.  }

In Line 2, an InputSource instance is created using method getInputSourceFromFile(). This method creates an InputSource from a regular file input stream. An instance of XMLReader is created in Line 3 using getFilteringReader. Finally, in Line 4, a SAXSource is created from the InputSource and XMLReader instances. The creation of the XMLReader is the interesting part, so let’s look into the getFilteringReader method:

 1.  private XMLReader getFilteringReader() { 
2.     XMLReader myReader = getReader();
3.     XMLFilterImpl xmlFilter = new ModifyingXMLSource(myReader, FACTORY_NAME);
4.     if ((MODE != null) && (MODE.equals("debug"))) {
5.        xmlFilter = new XMLPrinter(xmlFilter);
6.     }
7.     return xmlFilter;
8.  }

In Line 2, an XMLReader is created with no parameters. This is the default reader that parses the raw data from the input source and sends the data as SAX events to the content handler, which, in this case, would be the Transformer object doing the XSLT transformation. In the next line (Line 3), we create an instance of ModifyingXMLSource, which is our own implementation of a SAX event filter. This class is really the core of this article. Notice that the default XMLReader implementation is passed as a parameter to ModifyingXMLSource’s constructor. In Line 5, a second event filter, an instance of XMLPrinter, is created. Figure 7 shows the resulting object structure.

Our SAXSource now works so that the SAX events first go to the ModifyingXMLSource instance, which, if it so decides, forwards the events to the next filter in the chain, the XMLPrinter instance. As you will see later, ModifyingXMLSource may also generate and forward new events. The XMLPrinter “filter” works so that it forwards all the events to the final destination: the XSLT transformer. The class diagram in Figure 8 shows the static class structure.
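The order in which events travel through such a chain can be checked with a small experiment. In this sketch, two pass-through filters (the hypothetical class Tap) are stacked on a JAXP parser and the outermost one is handed to a SAXSource. An identity transformer replaces the XSLT stylesheet, and the parser is made namespace-aware because XSLT processors generally expect namespace-aware events. None of these names come from the article's code.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLFilterImpl;

final class FilterChainSketch {
    /** Pass-through filter that notes each start tag before forwarding it. */
    static final class Tap extends XMLFilterImpl {
        private final String label;
        private final List<String> log;

        Tap(XMLReader parent, String label, List<String> log) {
            super(parent);
            this.label = label;
            this.log = log;
        }

        @Override
        public void startElement(String uri, String localName, String qName,
                Attributes atts) throws SAXException {
            log.add(label + ":" + qName);
            super.startElement(uri, localName, qName, atts);
        }
    }

    /** Runs parser -> first -> second -> transformer and returns the output. */
    static String run(String xml, List<String> log) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            XMLReader parser = factory.newSAXParser().getXMLReader();
            XMLReader first = new Tap(parser, "first", log);
            XMLReader second = new Tap(first, "second", log);
            SAXSource source = new SAXSource(second,
                    new InputSource(new StringReader(xml)));
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            identity.transform(source, new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Transformation failed", e);
        }
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<String>();
        System.out.println(run("<A><B/></A>", log));
        System.out.println(log);
    }
}
```

The filter that wraps the parser directly sees each event first; the filter handed to the SAXSource sees it last, just before the transformer.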

To clarify the functioning of the filters, let’s investigate the XMLPrinter first because it is the simpler of the two. The XMLPrinter class just echoes the XML data’s structure to the standard output stream and forwards the SAX events to its parent class. If you ran the examples according to the previous instructions, you have already seen how this works. The structure of the XMLPrinter class follows:

 

import java.io.CharArrayWriter;

import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLFilterImpl;

public class XMLPrinter extends XMLFilterImpl {
   // Buffers the character content of the current element
   private CharArrayWriter contents = new CharArrayWriter();

   private String indent = "";

   public XMLPrinter(XMLReader _reader) {
      super(_reader);
   }

   public void startElement(String _uri, String _localName, String _qName,
         Attributes _atts) throws SAXException {
      // Print the indentation
      // Print start tag
      // Increase the indent
      super.startElement(_uri, _localName, _qName, _atts);
   }

   public void characters(char[] _ch, int _start, int _length)
         throws SAXException {
      contents.write(_ch, _start, _length);
      super.characters(_ch, _start, _length);
   }

   public void endElement(String _uri, String _localName, String _qName)
         throws SAXException {
      // Decrease the indent
      // Print the indentation
      // Print the contents of the element and the end tag
      super.endElement(_uri, _localName, _qName);
   }
}

The XMLPrinter class extends class XMLFilterImpl, which provides default implementations for all the SAX callback methods. If you don’t override any methods, the stream of events goes through this filter unchanged. XMLPrinter includes methods startElement(), characters(), and endElement(), which are needed for printing the XML document’s essential contents to the standard output stream. startElement() is always called when the opening tag of an XML element is encountered (for example, <TIMESTAMP>). The characters() method is always called when the XML element’s contents are encountered (for the TIMESTAMP element, this content would be 2004-06-05:14:40:05). The endElement() method is called when an XML element’s end tag is encountered.

In XMLPrinter, for instance, the printable XML start tags are constructed and printed in method startElement(). Notice that the superclass’s corresponding method is always called at the end of each method. Thus, the whole event stream remains unchanged.
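One possible way to fill in the printing steps is sketched below. It is a guess at the behavior, not the article's XMLPrinter: the class EchoFilter writes into a StringBuilder rather than System.out so the result can be inspected, and the three-space indent step is an assumption based on the echoed output shown earlier.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLFilterImpl;

/** Hypothetical pass-through filter that builds an indented echo of the events. */
final class EchoFilter extends XMLFilterImpl {
    private static final String STEP = "   ";
    private final StringBuilder out = new StringBuilder();
    private final StringBuilder text = new StringBuilder();
    private String indent = "";

    EchoFilter(XMLReader parent) {
        super(parent);
    }

    @Override
    public void startElement(String uri, String localName, String qName,
            Attributes atts) throws SAXException {
        text.setLength(0);
        out.append(indent).append('<').append(qName);
        for (int i = 0; i < atts.getLength(); i++) {
            out.append(' ').append(atts.getQName(i))
               .append("=\"").append(atts.getValue(i)).append('"');
        }
        out.append(">\n");
        indent += STEP;
        super.startElement(uri, localName, qName, atts); // forward unchanged
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        text.append(ch, start, length);
        super.characters(ch, start, length); // forward unchanged
    }

    @Override
    public void endElement(String uri, String localName, String qName)
            throws SAXException {
        String content = text.toString().trim();
        if (!content.isEmpty()) {
            out.append(indent).append(content).append('\n');
        }
        text.setLength(0);
        indent = indent.substring(STEP.length());
        out.append(indent).append("</").append(qName).append(">\n");
        super.endElement(uri, localName, qName); // forward unchanged
    }

    /** Parses the XML through the filter and returns the echoed structure. */
    static String echo(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            EchoFilter filter = new EchoFilter(factory.newSAXParser().getXMLReader());
            filter.parse(new InputSource(new StringReader(xml)));
            return filter.out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Parsing failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.print(echo("<ORDER><ID>Jill</ID></ORDER>"));
    }
}
```

Every overridden method ends with a call to the superclass, so a content handler attached behind this filter would still receive the full, unmodified event stream.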

Our other event filter, ModifyingXMLSource, like XMLPrinter, overrides methods startElement(), characters(), and endElement(). This class handles the events so that every time it receives a SAX event (or more specifically, a call to one of the callback methods startElement(), characters(), or endElement()), it looks for the Command object to handle that event. The idea is to keep the ModifyingXMLSource class lean and simple, and implement most of the logic in the separate command classes.

At this point, let’s look at how the commands map into the XML elements. The command-to-element mapping is done in the factory classes. When ModifyingXMLSource is created, it initializes one of the command factories. In Example 2 (Transformation 2a), this factory is the Example2CommandFactory class. This class’s constructor follows:

 1.  public Example2CommandFactory() {
2.     CustomerIdCommand myCustomerIdCommand = new CustomerIdCommand();
3.     commands.put("/ORDER_INFO/CUSTOMER/ID", myCustomerIdCommand);
4.     FilterCommand myFilterCommand = new FilterCommand();
5.     commands.put(
          "/ORDER_INFO/CUSTOMER/SERVICE_ORDERS/ORDER/TIMESTAMP",
          myFilterCommand);
6.     ProductIdCommand myProductIdCommand = new ProductIdCommand();
7.     commands.put(
          "/ORDER_INFO/CUSTOMER/SERVICE_ORDERS/ORDER/PRODUCT_ID",
          myProductIdCommand);
8.  }

In Line 3, the ID element inside the CUSTOMER element maps to the command of type CustomerIdCommand. Line 5 maps the TIMESTAMP element to the FilterCommand command, and Line 7 maps the PRODUCT_ID element to the ProductIdCommand command. All the command classes implement the common Command interface, which looks like this:

 

 1.  public interface Command {
2.     public void reset();
3.     public Object getResult();
4.     public void startElement(String _uri,
          String _localName,
          String _qName,
          Attributes _atts,
          XMLFilterImpl _caller,
          DefaultHandlerInterface _default) throws SAXException;
5.     public void characters(char[] ch,
          int start,
          int length,
          XMLFilterImpl _caller,
          DefaultHandlerInterface _default) throws SAXException;
6.     public void endElement(String _uri,
          String _localName,
          String _qName,
          XMLFilterImpl _caller,
          DefaultHandlerInterface _default) throws SAXException;
7.  }

The Command interface has five method signatures. Let’s cover them briefly. Commands can be used for collecting and combining data, so the ability to reset the command’s state is obviously a requirement. The reset() method (Line 2) is for this purpose. The getResult() method (Line 3) is meant to be used by other commands to exchange information. Transformation 2b (which will be explained later) uses the getResult() method. Methods startElement(), characters(), and endElement() (Lines 4, 5, and 6) are basically copies of the corresponding methods in class ModifyingXMLSource. The ModifyingXMLSource can easily delegate SAX event processing to the command classes using these methods.

The ModifyingXMLSource gets the commands via the factory objects, as shown in its startElement() method:

 1.  public void startElement(String _uri,
          String _localName,
          String _qName,
          Attributes _atts) throws SAXException {
2.     tagIdentifier += ("/" + _localName);
3.     try {
4.        currentCommand = factory.getCommand(tagIdentifier);
5.        currentCommand.startElement(_uri, _localName, _qName, _atts,
          this, this);
6.     } catch (SAXException sax) {...

The characters() and endElement() methods follow the same principle. In Line 2, the tagIdentifier string tracks the path of the current element in the XML event stream. This tag identifier is used to get the right command from the factory (Line 4). What happens if the command factory does not find any command for the element? That means no element-to-command mapping is in the factory. In this case, the factory returns a NullCommand instance, which processes all the events by sending them back to ModifyingXMLSource’s default event handlers. The sending of the events is implemented so the NullCommand just calls ModifyingXMLSource’s corresponding default methods: defaultStartElementHandler(), defaultCharactersHandler(), and defaultEndElementHandler(). These methods forward the events unchanged to the default content handler, so the XML content remains unchanged. Simply put, NullCommand does not change anything in the XML data.

As shown earlier, three commands are mapped in Example2CommandFactory: CustomerIdCommand, FilterCommand, and ProductIdCommand. Let’s first examine the FilterCommand class, which is the simplest one. When the tagIdentifier (Line 2 in the previous code fragment) holds the value /ORDER_INFO/CUSTOMER/SERVICE_ORDERS/ORDER/TIMESTAMP, the factory object returns an instance of FilterCommand. (Recall that this string was mapped to the FilterCommand in Example2CommandFactory’s constructor.) FilterCommand’s startElement(), characters(), and endElement() methods do absolutely nothing. Since they don’t even call the default event handlers, as the NullCommand does, the <TIMESTAMP> element along with its content is filtered out.
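The combination of path tracking and silent dropping can be condensed into a single filter. The sketch below collapses the command lookup into a set of paths: events whose path is in the set are not forwarded (the FilterCommand behavior), and everything else passes through untouched (the NullCommand behavior). The class PathDropFilter is an illustrative assumption, and like the exact-match lookup it would not drop elements nested inside a dropped element.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLFilterImpl;

/** Hypothetical filter that drops elements found at the given paths. */
final class PathDropFilter extends XMLFilterImpl {
    private final Set<String> droppedPaths;
    private String path = "";

    PathDropFilter(XMLReader parent, Set<String> droppedPaths) {
        super(parent);
        this.droppedPaths = droppedPaths;
    }

    @Override
    public void startElement(String uri, String localName, String qName,
            Attributes atts) throws SAXException {
        path += "/" + qName; // extend the path on the way down
        if (!droppedPaths.contains(path)) {
            super.startElement(uri, localName, qName, atts);
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        if (!droppedPaths.contains(path)) {
            super.characters(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName)
            throws SAXException {
        if (!droppedPaths.contains(path)) {
            super.endElement(uri, localName, qName);
        }
        path = path.substring(0, path.lastIndexOf('/')); // shorten on the way up
    }

    /** Serializes the filtered event stream with an identity transformer. */
    static String filter(String xml, String... dropped) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            XMLReader reader = new PathDropFilter(
                    factory.newSAXParser().getXMLReader(),
                    new HashSet<String>(Arrays.asList(dropped)));
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            identity.transform(
                    new SAXSource(reader, new InputSource(new StringReader(xml))),
                    new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Filtering failed", e);
        }
    }
}
```

Calling PathDropFilter.filter(xml, "/ORDER/TIMESTAMP") on an order document would pass every element except the TIMESTAMP children of ORDER to the transformer.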

Before going into the details of the two remaining commands, CustomerIdCommand and ProductIdCommand, consider the class structure shown in Figure 9 (notice that CustomerIdCommand is omitted from the picture).

The CustomerIdCommand and ProductIdCommand commands fetch data from the datasource (see Figure 2), which consists of classes in package myutil.dataAccess. In Example2CommandFactory’s constructor, the ProductIdCommand maps to XML element /ORDER_INFO/CUSTOMER/SERVICE_ORDERS/ORDER/PRODUCT_ID. Thus, the name of the product (if the name was found) replaces the numeric product identifier. Recall that in Transformation 2a, identifier 1231 changed to Doohickey. The following code fragment from ProductIdCommand shows how this is done:

 1.  public void characters(char[] ch,
          int start,
          int length,
          XMLFilterImpl _caller,
          DefaultHandlerInterface _default) throws SAXException {
2.     contents.write(ch, start, length);
3.  };
4.  public void endElement(String _uri,
          String _localName,
          String _qName,
          XMLFilterImpl _caller,
          DefaultHandlerInterface _default) throws SAXException {
5.     idString = contents.toString();
6.     DataAccessorFactory myFactory =
          DataAccessorFactory.getFactory();
7.     DataAccessor myAccessor = myFactory.getDataAccessor();
8.     idString = myAccessor.getProductName(idString);      
9.     _default.defaultCharactersHandler(idString.toCharArray(), 0,
         idString.length());
10.    _default.defaultEndElementHandler(_uri, _localName, _qName);
11. }; 

The trick is that the characters() method does not call the defaultCharactersHandler immediately, so the original XML content is not forwarded. In the endElement() method, both the defaultCharactersHandler() and defaultEndElementHandler() methods are called. defaultCharactersHandler() is called with a modified value received from DataAccessor. A DataAccessor handle is acquired from the DataAccessorFactory class. The DataAccessor interface looks like this:

 1.  package myutil.dataAccess;
   
2.  public interface DataAccessor {
3.     public String getCustomerName(String _customerId);
4.     public String getProductName(String _productId);
5.  }

The interface includes method getProductName(), which returns the product’s name when given the identifier as a parameter. In our example, this parameter was string 1231.

The DataAccessorFactory actually returns a handle to the dummy data accessor, called DummyDataAccessor. If you look into the code, you’ll see that this class just returns the hard-coded values. That’s okay for this example. In real-life applications you can use the same pattern—just make the factory class return a real data-access class, for instance, a class that queries a SQL database.

Our final command, CustomerIdCommand, works similarly to ProductIdCommand, except that it calls getCustomerName() instead of getProductName().
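The buffer-then-replace technique used by these two commands can be tried out in isolation. In the sketch below, a plain Map stands in for the DataAccessor, and the class LookupFilter, its constructor parameters, and the fallback to the original identifier when no name is found are assumptions made for illustration rather than the article's implementation.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLFilterImpl;

/** Hypothetical filter that swaps an identifier for a looked-up name. */
final class LookupFilter extends XMLFilterImpl {
    private final String targetPath;
    private final Map<String, String> names; // stands in for the DataAccessor
    private final StringBuilder buffer = new StringBuilder();
    private String path = "";

    LookupFilter(XMLReader parent, String targetPath, Map<String, String> names) {
        super(parent);
        this.targetPath = targetPath;
        this.names = names;
    }

    @Override
    public void startElement(String uri, String localName, String qName,
            Attributes atts) throws SAXException {
        path += "/" + qName;
        buffer.setLength(0);
        super.startElement(uri, localName, qName, atts);
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        if (path.equals(targetPath)) {
            buffer.append(ch, start, length); // hold back the original text
        } else {
            super.characters(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName)
            throws SAXException {
        if (path.equals(targetPath)) {
            String id = buffer.toString().trim();
            String name = names.containsKey(id) ? names.get(id) : id;
            super.characters(name.toCharArray(), 0, name.length());
        }
        super.endElement(uri, localName, qName);
        path = path.substring(0, path.lastIndexOf('/'));
    }

    /** Runs a small order document through the filter and serializes the result. */
    static String demo() {
        try {
            Map<String, String> names = new HashMap<String, String>();
            names.put("1231", "Doohickey");
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            XMLReader reader = new LookupFilter(factory.newSAXParser().getXMLReader(),
                    "/ORDER/PRODUCT_ID", names);
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            String xml = "<ORDER><PRODUCT_ID>1231</PRODUCT_ID><PRICE>100</PRICE></ORDER>";
            identity.transform(
                    new SAXSource(reader, new InputSource(new StringReader(xml))),
                    new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Lookup failed", e);
        }
    }
}
```

Holding the text back until the end tag arrives matters because a parser is allowed to deliver an element's content in several characters() calls.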

The second part of Example 2 is Transformation 2b, which, as you may recall, adds the price summary information (the PRICE_SUMMARY element) to the XML data. The main class Example2b is listed in its complete form below:

 1.  public class Example2b extends Example2 {
2.     public Example2b() {
3.        super();
4.        FACTORY_NAME = "Example2b";
5.     }
6.  }

As you can see, the beauty of this design is that the only extension needed to class Example2 is the command factory’s new name.

The factory name Example2b maps to an instance of Example2bCommandFactory in CommandFactory’s getInstance() method. Example2bCommandFactory resembles the Example2CommandFactory used in Transformation 2a. The difference is that the Example2bCommandFactory’s constructor includes the following new lines:

 1.  PriceCollectorCommand myPriceCollectorCommand = 
       new PriceCollectorCommand(myProductIdCommand);
2.  commands.put("/ORDER_INFO/CUSTOMER/SERVICE_ORDERS/ORDER/PRICE",
       myPriceCollectorCommand);
3.  commands.put("/ORDER_INFO", 
       new PriceSummaryPrintingCommand(myPriceCollectorCommand));

The PriceCollectorCommand collects the prices from all the PRICE elements of the input XML. PriceSummaryPrintingCommand requests the collected price information from the PriceCollectorCommand and prints this information in XML format. This is why PriceSummaryPrintingCommand receives a reference to the PriceCollectorCommand in its constructor (Line 3). Figure 10 shows Example 2b’s class diagram.

Notice that only the new Command classes are shown in Figure 10—in addition to those shown, the Example2bCommandFactory class has all the same Command dependencies as the Example2CommandFactory class.

Let’s look deeper into one of the Command classes. The interesting part of PriceCollectorCommand appears in the following code fragment:

 public void endElement(String _uri,
          String _localName,
          String _qName,
          XMLFilterImpl _caller,
          DefaultHandlerInterface _default) throws SAXException {
    // The product ID collected by ProductIdCommand serves as the key
    String productIdAsKey = (String)myProductIdCommand.getResult();
    String priceAsString = contents.toString();
    Integer price = priceToNumber(priceAsString);
    if (price != null) {
       addPriceToHashMap(productIdAsKey, price);
    }
    // Pass the original end-element event on unchanged
    _default.defaultEndElementHandler(_uri, _localName, _qName);
 }

In the endElement() method, the addPriceToHashMap() method is called with the product identifier and the price value as its parameters. This method accumulates the total price per product. For the details, see PriceCollectorCommand.java’s source code.
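The per-product accumulation can be sketched as follows. The method names priceToNumber(), addPriceToHashMap(), and getResult() come from the article, but their bodies here are assumptions, as is the class name PriceTotalsSketch. The real PriceCollectorCommand may parse and store prices differently.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

class PriceTotalsSketch {
    // Running total per product ID, in first-seen order.
    private final Map<String, Integer> totals = new LinkedHashMap<>();

    // Assumed behavior: returns null when the text is not a whole number.
    static Integer priceToNumber(String priceAsString) {
        if (priceAsString == null) {
            return null;
        }
        try {
            return Integer.valueOf(priceAsString.trim());
        } catch (NumberFormatException e) {
            return null;
        }
    }

    // Adds the price to whatever has been collected for this product so far.
    void addPriceToHashMap(String productId, Integer price) {
        totals.merge(productId, price, Integer::sum);
    }

    Map<String, Integer> getResult() {
        return Collections.unmodifiableMap(totals);
    }

    public static void main(String[] args) {
        PriceTotalsSketch collector = new PriceTotalsSketch();
        collector.addPriceToHashMap("A", priceToNumber("20"));
        collector.addPriceToHashMap("B", priceToNumber("5"));
        collector.addPriceToHashMap("A", priceToNumber("10"));
        System.out.println(collector.getResult());
    }
}
```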

As in the PriceCollectorCommand, the interesting part of PriceSummaryPrintingCommand is the endElement() method:

 public void endElement(String _uri,
          String _localName,
          String _qName,
          XMLFilterImpl _caller,
          DefaultHandlerInterface _default) throws SAXException {
    // Emit the summary first, then close the original element
    insertSummary(_default, _uri);
    _default.defaultEndElementHandler(_uri, _localName, _qName);
 }

In the insertSummary() method, the PriceSummaryPrintingCommand inserts its own XML content before it calls the default end element event handler. insertSummary() receives the price summary information from the PriceCollectorCommand via the Command interface’s getResult() method. Then it emits the SAX events corresponding to the hashmap’s contents. The Command class uses ModifyingXMLSource’s default methods (defaultStartElementHandler(), defaultCharactersHandler(), and defaultEndElementHandler()) to create new SAX events. For the details, see the source code of PriceSummaryPrintingCommand.java.
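The same idea, emitting extra SAX events just before an element closes, can be sketched with the standard XMLFilterImpl class alone. This is not the article's ModifyingXMLSource or Command code. The class SummaryFilterSketch, the TOTAL element, and its key=value text layout are assumptions, and the map passed to the constructor stands in for the result of PriceCollectorCommand.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

class SummaryFilterSketch extends XMLFilterImpl {
    private final Map<String, Integer> totals;

    SummaryFilterSketch(Map<String, Integer> totals) {
        this.totals = totals;
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        if ("ORDER_INFO".equals(localName)) {
            insertSummary(); // new content goes in before the root closes
        }
        super.endElement(uri, localName, qName);
    }

    // Turns each map entry into start/characters/end events.
    private void insertSummary() throws SAXException {
        super.startElement("", "PRICE_SUMMARY", "PRICE_SUMMARY", new AttributesImpl());
        for (Map.Entry<String, Integer> entry : totals.entrySet()) {
            char[] text = (entry.getKey() + "=" + entry.getValue()).toCharArray();
            super.startElement("", "TOTAL", "TOTAL", new AttributesImpl());
            super.characters(text, 0, text.length);
            super.endElement("", "TOTAL", "TOTAL");
        }
        super.endElement("", "PRICE_SUMMARY", "PRICE_SUMMARY");
    }

    // Runs the filter over the XML and renders the resulting events as text.
    static String run(String xml, Map<String, Integer> totals) {
        final StringBuilder out = new StringBuilder();
        DefaultHandler printer = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                out.append('<').append(localName).append('>');
            }
            @Override
            public void characters(char[] ch, int start, int length) {
                out.append(ch, start, length);
            }
            @Override
            public void endElement(String uri, String localName, String qName) {
                out.append("</").append(localName).append('>');
            }
        };
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            SummaryFilterSketch filter = new SummaryFilterSketch(totals);
            filter.setParent(factory.newSAXParser().getXMLReader());
            filter.setContentHandler(printer);
            filter.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, Integer> totals = new LinkedHashMap<>();
        totals.put("P1", 20);
        System.out.println(run("<ORDER_INFO><PRICE>20</PRICE></ORDER_INFO>", totals));
    }
}
```

Downstream consumers, including an XSLT transformer, cannot tell the inserted events from those produced by the parser.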

A deeper look into Example 3

Since it is possible to generate new XML elements, it is a natural next step to generate virtually the whole XML document from non-XML data. Example 3 does just that. The trick is to implement our own XML reader (see Figure 11).

Figure 11. Class diagram of Example 3

The source code of the Example3 class is very simple:

 

  1.  import org.xml.sax.XMLReader;
  2.  import org.xml.sax.SAXException;
  3.  import org.xml.sax.helpers.XMLReaderFactory;

  4.  public class Example3 extends Example2 {
  5.     public Example3() {
  6.        super();
  7.        FACTORY_NAME = "Example2b";
  8.     }

  9.     protected XMLReader getReader() {
 10.        XMLReader myReader = new Example3Reader();
 11.        return myReader;
 12.     }
 13.  }

The overridden getReader() method is the only new part of this class. The previous examples used the default XMLReader created with the XMLReaderFactory.createXMLReader() method call. Because the input data is not XML, we need to build our own custom reader, which is created in Line 10.

The Example3Reader must provide implementations for all methods defined in XMLReader. Most of these methods can, however, be just dummy implementations. The methods interesting to us are parse(), setContentHandler(), and getContentHandler(). Class Example3Reader includes a member variable of type ContentHandler called myHandler. The setContentHandler() method sets this handler:

 public void setContentHandler(ContentHandler _handler) {
    myHandler = _handler;
 }

The ContentHandler provides the callback methods for generating the SAX events. The Transformer calls the setContentHandler() method right before the XSLT transformation starts. In our example, this occurs in Example1’s doTransform() method, more specifically, where Transformer’s transform() method is called.

After setting ContentHandler, the Transformer calls Example3Reader’s parse() method. This method implements our own parser for our proprietary text format. The parser constructs the XML document by calling the ContentHandler methods that correspond to the desired XML content. In our simple example, only the following five callback methods are needed:

  • startDocument(): Start of the XML document
  • startElement(String, String, String, Attributes): Start of the XML element
  • endElement(String, String, String): End of the XML element
  • characters(char[], int, int): Contents of the XML element
  • endDocument(): End of the XML document

For instance, the following calls:

 // Element names are passed without angle brackets
 myHandler.startElement("", "PRICE", "PRICE", new AttributesImpl());
 // Convert the String "20" to a char array
 char[] myChArray = "20".toCharArray();
 myHandler.characters(myChArray, 0, myChArray.length);
 myHandler.endElement("", "PRICE", "PRICE");

correspond to an XML element:

 <PRICE>20</PRICE>

If you are interested in the parser’s details, please see Example3Reader’s source code.
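To make the approach concrete, here is a minimal sketch of a custom XMLReader that turns non-XML text into SAX events. It is not the article's Example3Reader: the input format (one key=value pair per line), the DATA root element, and the class name KeyValueReaderSketch are assumptions chosen for illustration. Instead of a Transformer, a simple handler receives the events and renders them as text.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.DTDHandler;
import org.xml.sax.EntityResolver;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

class KeyValueReaderSketch implements XMLReader {
    private ContentHandler myHandler;

    public void setContentHandler(ContentHandler handler) { myHandler = handler; }
    public ContentHandler getContentHandler() { return myHandler; }

    // Reads "name=value" lines and reports each one as <name>value</name>.
    public void parse(InputSource input) throws IOException, SAXException {
        BufferedReader in = new BufferedReader(input.getCharacterStream());
        myHandler.startDocument();
        myHandler.startElement("", "DATA", "DATA", new AttributesImpl());
        String line;
        while ((line = in.readLine()) != null) {
            int eq = line.indexOf('=');
            if (eq <= 0) {
                continue; // skip lines that are not key=value pairs
            }
            String name = line.substring(0, eq).trim();
            char[] text = line.substring(eq + 1).trim().toCharArray();
            myHandler.startElement("", name, name, new AttributesImpl());
            myHandler.characters(text, 0, text.length);
            myHandler.endElement("", name, name);
        }
        myHandler.endElement("", "DATA", "DATA");
        myHandler.endDocument();
    }

    public void parse(String systemId) throws SAXException {
        throw new SAXException("This sketch only supports character-stream input");
    }

    // The remaining XMLReader methods are dummy implementations.
    public boolean getFeature(String name) { return false; }
    public void setFeature(String name, boolean value) { }
    public Object getProperty(String name) { return null; }
    public void setProperty(String name, Object value) { }
    public void setEntityResolver(EntityResolver resolver) { }
    public EntityResolver getEntityResolver() { return null; }
    public void setDTDHandler(DTDHandler handler) { }
    public DTDHandler getDTDHandler() { return null; }
    public void setErrorHandler(ErrorHandler handler) { }
    public ErrorHandler getErrorHandler() { return null; }

    // Feeds the text through the reader and renders the events as XML text.
    static String toXml(String text) {
        final StringBuilder out = new StringBuilder();
        KeyValueReaderSketch reader = new KeyValueReaderSketch();
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                out.append('<').append(localName).append('>');
            }
            @Override
            public void characters(char[] ch, int start, int length) {
                out.append(ch, start, length);
            }
            @Override
            public void endElement(String uri, String localName, String qName) {
                out.append("</").append(localName).append('>');
            }
        });
        try {
            reader.parse(new InputSource(new StringReader(text)));
        } catch (IOException | SAXException e) {
            throw new IllegalStateException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(toXml("PRICE=20\nNAME=Widget\n"));
    }
}
```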

Summary

This article showed how to perform complicated data transformations by using SAX to preprocess the XML data before the XSLT transformation. Example 1 introduced a basic XSLT transformer. Example 2 showed how to manipulate the XML data before feeding it to the XSLT transformer. Example 3 showed how to generate XML from non-XML data so that XSLT can be applied. And Example 4 showed how to generate non-XML data with XSLT.

The combination of SAX and XSLT is a powerful mechanism for completing complicated data transformations. It also allows you to change the transformation rules dynamically. On the other hand, this technique can be abused. It is easy to hide business logic in the transformation engine even in cases when it should not be there. For instance, it is rarely advisable to include any business logic in the integration layer.

Because the transformation logic no longer lives entirely in the XSLT template, maintenance becomes more difficult. One could code the command classes so that their data content is read from property files, thus avoiding recompilation every time the transformation rules change. Still, the best way to keep things under control is to document what the preprocessor does and keep this documentation up to date.

Samu Paajanen has been working with Java since 1997 and with XML technologies since 2000. He has worked on many different Java/J2EE projects and has extensive knowledge of different Java tools and technologies. Paajanen holds a master of science degree in computer science from the University of Helsinki. When he’s not working, which is not too often, he likes to spend time with his one-year-old daughter. Paajanen works as a consultant at Capgemini Finland.