by Laurent Bovet

XML merging made easy

how-to
Jul 10, 2007 | 14 mins

Manipulate XML files using XPath declarations

Sometimes it seems you spend more time manipulating XML files than you do writing Java code, so it makes sense to have one or two XML wranglers in your toolbox. In this article, Laurent Bovet gets you started with XmlMerge, an open source tool that lets you use XPath declarations to merge and manipulate XML data from different sources.

As a Java developer you use XML every day in your build scripts, deployment descriptors, configuration files, object-relational mapping files and more. Creating all these XML files can be tedious, but it’s not especially challenging. Manipulating or merging the data contained in such disparate files, however, can be difficult and time-consuming. You might prefer to use several files split into different modules, but find yourself limited to one large file because that is the only format the XML’s intended consumer can understand. You might want to override particular elements in a large file, but find yourself replicating the file’s entire contents instead. Maybe you just lack the time to create the XSL transformations (XSLT) that would make it easier to manipulate XML elements in your documents. Whatever the case, it seems nothing is ever as easy as it should be when it comes to merging the elements in your XML files.

In this article, I present an open source tool I created to resolve many of the common problems associated with merging and manipulating data from different XML documents. EL4J XmlMerge is a Java library under the LGPL license that makes it easier to merge elements from different XML sources. While XmlMerge is part of the EL4J framework, you can use it independently of EL4J. All you need to run the XmlMerge utility from your command line is JDK 1.5 or greater.

In the discussion that follows, you will learn how to use XmlMerge for a variety of common XML merging scenarios, including merging two XML files, merging XML file data from different sources to create a Spring Resource bean at runtime and combining XmlMerge and Ant to create an automated deployment descriptor at build time. I’ll also show you how to use XPath declarations and built-in actions and matchers to specify the treatment of specific elements during an XML merge. I’ll conclude with a look at XmlMerge’s simple merging algorithm and suggest ways it could be extended for more specialized XML merging operations.

You can download XmlMerge now if you want to follow along with the examples.

Merging XML files

Listing 1 shows a very common (and greatly simplified) example: two XML files that need to be merged.

Listing 1. Two XML files that need to be merged

File1.xml:

<root>
<a>
 <b/>
</a>
</root>

File2.xml:

<root>
<a>
 <c/>
</a>
<d/>
</root>

Listing 2 shows the command-line input to merge these two files using the XmlMerge utility, followed by the resulting output.

Listing 2. The two XML files merged using XmlMerge

~ $ java -jar xmlmerge-full.jar file1.xml file2.xml

<?xml version="1.0" encoding="UTF-8"?>
<root>
<a>
 <b />
 <c />
</a>
<d />
</root>
~ $

This first example is very simple, but the order in which the files are merged matters: if you switch the order, you can get different results. (Later in the article you’ll see an example of what happens when you switch the order of two files to be merged.) To keep the roles straight, XmlMerge calls the first document the original and the second one the patch. This is easy to remember because the patch is always merged into the original.

Merging XML files from different sources

You can call XmlMerge from anywhere in your Java code and use it to merge data from different sources into a new, useful document. In Listing 3, I’ve used it to merge a file from my application’s filesystem with the contents of a servlet request and parse the result into a single document object model (DOM).

Listing 3. Merging client and server XML into a DOM

XmlMerge xmlMerge = new DefaultXmlMerge();
org.w3c.dom.Document doc = documentBuilder.parse(
                             xmlMerge.merge(
                                new FileInputStream("file1.xml"),
                                servletRequest.getInputStream()));
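
Listing 3 relies on a documentBuilder variable that is not shown. A minimal sketch of how such a builder might be obtained with the JDK’s JAXP API follows. The class name ParseMergedStream is hypothetical, and the in-memory stream only stands in for whatever stream the merge returns.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

final class ParseMergedStream {

    /** Parses any XML stream (for example, a merged one) into a W3C DOM. */
    static Document parse(InputStream xml) throws Exception {
        DocumentBuilder documentBuilder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return documentBuilder.parse(xml);
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for the stream produced by the merge in Listing 3.
        String merged = "<root><a><b/><c/></a><d/></root>";
        InputStream in =
                new ByteArrayInputStream(merged.getBytes(StandardCharsets.UTF_8));
        Document doc = parse(in);
        System.out.println(doc.getDocumentElement().getTagName()); // prints "root"
    }
}
```

In a servlet you would typically create the builder once and reuse it per request thread, since DocumentBuilder instances are not guaranteed to be thread-safe.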

Creating Spring Framework resources at runtime

In some cases it is useful to combine XmlMerge and the Spring Framework. For example, the Spring Resource bean shown in Listing 4 is created at runtime by merging separate XML files into a single XML stream. You could then use the Resource bean to configure other resources for object-relational mapping, document generation and more.

Listing 4. A Spring Resource bean

<bean name="mergedResource"
      class="ch.elca.el4j.services.xmlmerge.springframework.XmlMergeResource">
  <property name="resources">
    <list>
      <bean class="org.springframework.core.io.ClassPathResource">
        <constructor-arg>
          <value>ch/elca/el4j/tests/xmlmerge/r1.xml</value>
        </constructor-arg>
      </bean>
      <bean class="org.springframework.core.io.ClassPathResource">
        <constructor-arg>
          <value>ch/elca/el4j/tests/xmlmerge/r2.xml</value>
        </constructor-arg>
      </bean>
    </list>            
  </property>
  <property name="properties">
    <map>
      <entry key="action.default" value="COMPLETE"/>
      <entry key="XPath.path1" value="/root/a"/>
      <entry key="action.path1" value="MERGE"/>
    </map>
  </property>
</bean>

Generating an automated deployment descriptor at build time

You’ve probably used Ant to automate your builds. How about combining it with XmlMerge to generate an XML deployment descriptor at build time? Listing 5 shows the XmlMergeTask at work.

Listing 5. XmlMergeTask generates a deployment descriptor

<target name="test-task">
  <taskdef name="xmlmerge"
           classname="ch.elca.el4j.services.xmlmerge.anttask.XmlMergeTask"
           classpath="xmlmerge-full.jar"/>

  <xmlmerge dest="out.xml" conf="test.properties">
     <fileset dir="test">
        <include name="source*.xml"/>
     </fileset>
  </xmlmerge>
</target>

Using XPath declarations with XmlMerge

You’ve seen a few examples of applying XmlMerge to common Java enterprise development scenarios. I’ll spend the remainder of this article explaining how the tool works. By default, you use XPath declarations to specify how XmlMerge handles your XML sources. A sample configuration is shown in Listing 6.

Listing 6. xmlmerge.properties

# By default, only add elements that do not
# already exist in the first file
action.default=COMPLETE

# Define an XPath named "a" that matches
# all <a> elements under <root>
XPath.a=/root/a

# Merge the children of <a>
action.a=MERGE
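
One detail is worth checking when you write such a file. The article does not say how XmlMerge reads it, but assuming it is loaded with java.util.Properties, a # starts a comment only at the beginning of a line. A # that follows a value on the same line becomes part of the value. The sketch below, with the hypothetical class name InlineCommentCheck, shows the difference using only the JDK.

```java
import java.io.StringReader;
import java.util.Properties;

final class InlineCommentCheck {

    /** Loads the given properties text and returns the action.default value. */
    static String actionDefault(String fileContent) throws Exception {
        Properties props = new Properties();
        props.load(new StringReader(fileContent));
        return props.getProperty("action.default");
    }

    public static void main(String[] args) throws Exception {
        String inline = "action.default=COMPLETE   # only add missing elements\n";
        String ownLine = "# only add missing elements\naction.default=COMPLETE\n";

        // The trailing text is kept as part of the value.
        System.out.println("[" + actionDefault(inline) + "]");
        // A comment on its own line is ignored, leaving the bare value.
        System.out.println("[" + actionDefault(ownLine) + "]");
    }
}
```

Keeping each comment on its own line avoids the problem regardless of which loader is used.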

Listing 7 shows two more files that need to be merged, this time as specified by the above XPath declarations.

Listing 7. Two XML files waiting to be merged

Original:

<root>
<a/>
<c/>
</root>

Patch:

<root>
<a>
 <b/>
</a>
<c>
 <d/>
</c>
</root>

Listing 8 shows the files merged as specified by the XPath declarations.

Listing 8. Two files merged as specified

<root>
<a>
 <b/>   <!-- merged the content of the element <a> -->
</a>
<c/>    <!-- by default, do not modify existing elements -->
</root>

Using XPath declarations within the XmlMerge utility lets you specify how each element in your XML files will be handled during a merge. In the next section I’ll explain the actions you may have noticed in Listing 6, as well as the use of matching functions in XmlMerge.
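
To check which elements a declaration such as /root/a selects before you put it in a configuration, you can evaluate it with the JDK’s own XPath API. This is a minimal sketch, not part of XmlMerge: the class name XPathSelection and the select helper are hypothetical, and the sample document is the patch from Listing 7.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class XPathSelection {

    /** Returns the node names that the XPath expression selects in the XML. */
    static List<String> select(String xml, String expression) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList) xpath.evaluate(
                expression,
                new InputSource(new StringReader(xml)),
                XPathConstants.NODESET);
        List<String> names = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            names.add(nodes.item(i).getNodeName());
        }
        return names;
    }

    public static void main(String[] args) throws Exception {
        String patch = "<root><a><b/></a><c><d/></c></root>";
        // Only <a> is selected, so only <a> would receive the declared action;
        // <c> falls back to whatever action.default specifies.
        System.out.println(select(patch, "/root/a"));
    }
}
```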

Actions and matching functions

XmlMerge provides many built-in actions, some of them extending its functions well beyond simple merging. Consider the following actions and the various ways you could use XPath declarations to apply them to elements in your XML documents.

Table 1. Built-in actions for XmlMerge

MERGE
Description: Traverses in parallel the original and patch elements, determines matching pairs between documents in the order of traversal, and merges children recursively. MERGE is the default action and is sufficient for most common uses where the original and patch documents present elements in the same order.
Result:
<root>
<a>
 <b/>
 <c/>
</a>
<d/>
<e/>
</root>

REPLACE
Description: Replaces original elements with patch elements. Can also be used to add new (patch) elements to a file.
Result:
<root>
<a>
 <c/>
</a>
<d/>
<e/>
</root>

OVERRIDE
Description: Replaces an original element with the matching patch element. Unlike REPLACE, it does not add patch elements that have no counterpart in the original.
Result:
<root>
<a>
 <c/>
</a>
<d/>
</root>

COMPLETE
Description: Selectively adds in patch elements that did not exist in the original, using patch elements to complete the original ones.
Result:
<root>
<a>
 <b/>
</a>
<d/>
<e/>
</root>

DELETE
Description: Copies the original element only if it does not exist in the patch. If it exists in the patch, then nothing is added to the result (the presence of patch elements actually deletes the matching elements from the original).
Result:
<root>
<d/>
</root>

PRESERVE
Description: Always copies the original element, regardless of the existence of the patch element (it drops the patch element).
Result:
<root>
<a>
 <b/>
</a>
<d/>
</root>
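
Table 1 does not show the documents that produced its results. The results are consistent with an original of <root><a><b/></a><d/></root> and a patch of <root><a><c/></a><e/></root>; that pair is inferred here, not taken from the XmlMerge documentation. Under that assumption, the following sketch models what each action decides for one level of elements. It is a simplified illustration with hypothetical names (ActionSketch, apply, union) and does not use or reproduce the XmlMerge API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class ActionSketch {

    enum Action { MERGE, REPLACE, OVERRIDE, COMPLETE, DELETE, PRESERVE }

    /**
     * Applies one action to every child of the root. Each map goes from a
     * child's tag name to the tag names of that child's own children.
     */
    static Map<String, List<String>> apply(Action action,
                                           Map<String, List<String>> original,
                                           Map<String, List<String>> patch) {
        Set<String> names = new LinkedHashSet<>(original.keySet());
        names.addAll(patch.keySet()); // original order first, then patch-only names
        Map<String, List<String>> result = new LinkedHashMap<>();
        for (String name : names) {
            List<String> o = original.get(name); // null when absent
            List<String> p = patch.get(name);    // null when absent
            List<String> out;
            switch (action) {
                case MERGE:    out = (o == null) ? p : (p == null) ? o : union(o, p); break;
                case REPLACE:  out = (p != null) ? p : o; break;
                case OVERRIDE: out = (o == null) ? null : (p != null) ? p : o; break;
                case COMPLETE: out = (o != null) ? o : p; break;
                case DELETE:   out = (p == null) ? o : null; break;
                default:       out = o; break; // PRESERVE
            }
            if (out != null) {
                result.put(name, out);
            }
        }
        return result;
    }

    /** Children of the original followed by patch children not already present. */
    static List<String> union(List<String> a, List<String> b) {
        List<String> all = new ArrayList<>(a);
        for (String s : b) {
            if (!all.contains(s)) {
                all.add(s);
            }
        }
        return all;
    }

    public static void main(String[] args) {
        Map<String, List<String>> original = new LinkedHashMap<>();
        original.put("a", Arrays.asList("b"));
        original.put("d", new ArrayList<String>());
        Map<String, List<String>> patch = new LinkedHashMap<>();
        patch.put("a", Arrays.asList("c"));
        patch.put("e", new ArrayList<String>());
        for (Action action : Action.values()) {
            System.out.println(action + " -> " + apply(action, original, patch));
        }
    }
}
```

With the inferred inputs, DELETE leaves only d, and OVERRIDE yields a (with child c) and d, which agrees with the corresponding results in Table 1.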

It is also possible to tell XmlMerge to pair elements from the original and the patch by criteria other than the element name. For this you would use matching functions, or matchers.

Table 2. Built-in matchers for XmlMerge

TAG
Description: This default matcher says the original and patch elements match if the tag name is the same.

ID
Description: The original and patch elements match if the tag names and the id attribute values are the same.
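
The two matchers in Table 2 boil down to simple predicates over a pair of elements. The sketch below expresses them with the JDK’s W3C DOM types purely for illustration. The names MatcherSketch, TAG, ID and parse are hypothetical, and XmlMerge’s own matcher interface is not shown in this article, so nothing here should be read as its actual signature.

```java
import java.io.StringReader;
import java.util.function.BiPredicate;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

final class MatcherSketch {

    /** Matches when both elements have the same tag name. */
    static final BiPredicate<Element, Element> TAG =
            (o, p) -> o.getTagName().equals(p.getTagName());

    /**
     * Matches when tag names and id attribute values are equal.
     * getAttribute returns "" for a missing attribute, so in this sketch
     * two elements without an id also count as a match.
     */
    static final BiPredicate<Element, Element> ID =
            (o, p) -> TAG.test(o, p)
                    && o.getAttribute("id").equals(p.getAttribute("id"));

    /** Parses a one-element XML snippet and returns its root element. */
    static Element parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
    }

    public static void main(String[] args) throws Exception {
        Element first = parse("<e id='1'/>");
        Element second = parse("<e id='2'/>");
        System.out.println(TAG.test(first, second)); // prints "true"
        System.out.println(ID.test(first, second));  // prints "false"
    }
}
```

The ID matcher is the one to reach for when a document contains repeated sibling elements that differ only by an identifier.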

Customizing XmlMerge

You can easily add your own actions and matchers to those built into XmlMerge. You can also create mapping functions to transform elements before they are written to a merged file (you might want to modify element attributes, for example). For this, you just have to implement an interface, such as the Action interface shown in Listing 9.

Listing 9. The Action interface

public interface Action {

    /**
     * Out of an original element and a second element provided by the patch
     * DOM, applies an operation and modifies the parent node of the result DOM.
     *
     * @param originalElement
     * @param patchElement
     * @param outputParentElement
     */
    void perform(Element originalElement, Element patchElement, Element outputParentElement);

}

You do not need to recompile the XmlMerge library to extend it. Simply add your action and matcher implementations to the classpath. Note that you should be familiar with DOM4J if you want to extend XmlMerge, because it is the foundation of the XmlMerge library.

Alternatives to XPath configuration

XPath configuration isn’t your only option for customizing XmlMerge. You can also place inline attributes in the patch file to specify how elements will be treated (shown below), or use the Configurer interface to specify your own configuration model.

Listing 10 shows the results of an inline configuration, where attributes are placed in the patch document rather than specified in an external properties file.

Listing 10. Inline configuration

Original:

<root>
<a>
 <b/>
</a>
<d/>
<e id='1'/>
<e id='2'/>
</root>

Patch:

<root xmlns:merge='http://xmlmerge.el4j.elca.ch'>
<a merge:action='REPLACE'>hello</a>
<c/>
<d merge:action='DELETE'/>
<e id='2' newAttr='3' merge:matcher='ID'/>
</root>

Result:

<root>
<a>hello</a>
<c/>
<e id="1" />
<e id="2" newAttr="3" />
</root>
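
The inline attributes in Listing 10 live in the http://xmlmerge.el4j.elca.ch namespace, so any code that inspects them has to parse the patch with namespace awareness switched on. The following JDK-only sketch shows how such attributes can be read. The class name InlineActionReader and the actionOf helper are hypothetical, and this is not how XmlMerge itself is implemented.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

final class InlineActionReader {

    static final String MERGE_NS = "http://xmlmerge.el4j.elca.ch";

    /**
     * Returns the merge:action value of the first element with the given
     * tag name, or "" if that element carries no such attribute.
     * Assumes at least one element with that tag exists in the patch.
     */
    static String actionOf(String patchXml, String tag) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // needed for getAttributeNS to work
        Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(patchXml)));
        Element element = (Element) doc.getElementsByTagName(tag).item(0);
        return element.getAttributeNS(MERGE_NS, "action");
    }

    public static void main(String[] args) throws Exception {
        String patch =
                "<root xmlns:merge='http://xmlmerge.el4j.elca.ch'>"
              + "<a merge:action='REPLACE'>hello</a>"
              + "<c/>"
              + "<d merge:action='DELETE'/>"
              + "</root>";
        System.out.println(actionOf(patch, "a")); // prints "REPLACE"
        System.out.println(actionOf(patch, "d")); // prints "DELETE"
    }
}
```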

Using and extending XmlMerge

Since developing XmlMerge, I have used it in many kinds of projects and combined it with various other tools and frameworks. As demonstrated in this article, I’ve leveraged Spring Framework resources to merge iBatis configuration files on the fly. I have also combined XmlMerge with Ant tasks to merge web.xml deployment descriptors at build time. And I’ve used it to prepare variants of a base XML document for the purpose of unit-testing a semantic validation tool.

XmlMerge is meant to be useful out of the box for many of the common tasks required to merge data from XML documents. You can also extend it for other purposes, but for that you may need to extend the merging algorithm. Although the algorithm below is described for two XML documents, XmlMerge can be used to merge any number of documents.

A simple algorithm for merging XML files

The hardest thing about merging XML files is specifying the expected results. For example, what would you expect as a default result of merging the two files shown in Listing 11?

Listing 11. Two files waiting to be merged

Original:

<a x="1"/>
<b x="2"/>
<a x="3"/>

Patch:

<b y="4"/>
<a y="5"/>
<b y="6"/>

Which of the merged elements would you expect the resulting XML file to begin and end with? Would you want to keep the elements in the same order or not? XmlMerge is based on a straightforward algorithm that traverses each element list only once and returns all elements in the order in which they first appeared. The following rules guarantee a predictable result every time you merge two XML files using XmlMerge:

  • Only one element in the result corresponds to each original element.
  • Only one element in the result corresponds to each patch element.
  • All elements corresponding to original elements appear in the same order in the result as they did in the original.
  • All elements corresponding to the patch elements appear in the same order in the result as they did in the patch.

Cursors traverse both the original document and the patch to find matching pairs. All elements encountered before a matching pair is found are added to the result as is. In Listing 11 the patch cursor advances first, because the patch’s leading <b> element comes before the first matching pair of <a> elements, so the result of merging the two files is as shown in Listing 12.

Listing 12. Patch first

<b y="4" />
<a x="1" y="5" />
<b x="2" y="6" />
<a x="3" />

If you switched the order of the original and patch shown in Listing 11, you would obtain the following result instead:

Listing 13. Result of switching the order of the merge

<a x="1" />
<b y="4" x="2" />
<a y="5" x="3" />
<b y="6" />
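
The cursor-based pairing described above can be sketched in a few lines of plain Java. This is a simplified model for flat element lists matched by tag name, not XmlMerge’s actual code: the names SimpleMergeSketch, El and merge are hypothetical, and letting the patch win on conflicting attributes is an assumption (no conflicts occur in Listing 11).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class SimpleMergeSketch {

    /** A flat element: a tag name plus attributes in insertion order. */
    static final class El {
        final String name;
        final Map<String, String> attrs = new LinkedHashMap<>();

        El(String name, String... keyValues) {
            this.name = name;
            for (int i = 0; i + 1 < keyValues.length; i += 2) {
                attrs.put(keyValues[i], keyValues[i + 1]);
            }
        }

        @Override
        public String toString() {
            StringBuilder sb = new StringBuilder("<" + name);
            for (Map.Entry<String, String> e : attrs.entrySet()) {
                sb.append(' ').append(e.getKey())
                  .append("=\"").append(e.getValue()).append('"');
            }
            return sb.append(" />").toString();
        }
    }

    /** Merges two element lists in a single pass over each of them. */
    static List<String> merge(List<El> original, List<El> patch) {
        List<String> result = new ArrayList<>();
        int p = 0; // patch cursor
        for (El o : original) {
            // Look ahead in the patch for the next element with the same tag.
            int match = -1;
            for (int i = p; i < patch.size(); i++) {
                if (patch.get(i).name.equals(o.name)) {
                    match = i;
                    break;
                }
            }
            if (match < 0) {
                result.add(o.toString()); // no partner: copy the original as is
                continue;
            }
            // Patch elements passed on the way to the match are copied as is.
            for (; p < match; p++) {
                result.add(patch.get(p).toString());
            }
            El merged = new El(o.name);
            merged.attrs.putAll(o.attrs);
            merged.attrs.putAll(patch.get(match).attrs); // assumption: patch wins
            result.add(merged.toString());
            p = match + 1;
        }
        for (; p < patch.size(); p++) {
            result.add(patch.get(p).toString()); // leftover patch elements
        }
        return result;
    }

    public static void main(String[] args) {
        List<El> original = Arrays.asList(
                new El("a", "x", "1"), new El("b", "x", "2"), new El("a", "x", "3"));
        List<El> patch = Arrays.asList(
                new El("b", "y", "4"), new El("a", "y", "5"), new El("b", "y", "6"));
        merge(original, patch).forEach(System.out::println); // lines of Listing 12
        merge(patch, original).forEach(System.out::println); // lines of Listing 13
    }
}
```

Run on the inputs from Listing 11, the sketch reproduces Listing 12, and with the arguments swapped it reproduces Listing 13. Each list is scanned once and relative order within each source is preserved, which is what the four rules above require.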

Although it is broadly applicable, the default merging algorithm used by XmlMerge will not work for all use cases. For instance, you may want to extend the existing algorithm to handle advanced XML tree merging.

In conclusion

XmlMerge is not a cure-all for XML merging needs. It is a relatively simple tool that leverages DOM4J and XPath declarations to ease the process of merging data from different XML files. It is easily combined with other development tools and frameworks (such as the Spring Framework and Ant) and can be used out of the box or customized for use in specialized projects. Because it’s based on DOM4J rather than SAX, XmlMerge is not optimized for performance or memory, and either factor may rule out its use in some development projects. XmlMerge is also a work in progress. Its built-in handling of attributes is currently not as rich as its handling of elements: attributes are simply merged or replaced. XmlMerge is intended to provide a structurally sound, fully extensible framework for merging and manipulating data from a wide variety of sources. See the Resources section to download XmlMerge and learn more about it.

Laurent Bovet is a software architect at ELCA, the leading Swiss IT company responsible for Java enterprise development frameworks, such as LEAF and EL4J. He has worked on numerous Java-based distributed systems and is the creator of the EL4J XmlMerge library.