Java XML and JSON: Document processing for Java SE, Part 1: SAXON and Jackson

Transforming and converting XML and JSON documents with SAXON and Jackson


XML and JSON are important to me, and I’m grateful to Apress for letting me write an entire book about them. In this article I will briefly introduce the second edition of my new book, Java XML and JSON. I’ll also present two useful demos that I would have liked to include in the book if I’d had space for them.

First, I’ll show you how to override Xalan, which is the standard XSLT implementation for Java 11, with an XSLT 2.0+ and XPath 2.0+-compatible alternative, in this case SAXON. Using SAXON for XSLT/XPath makes it much easier to access features such as grouping, which I’ll also demonstrate. Next, I’ll show you two ways to convert XML to JSON with Jackson: the first technique is data binding, the second is tree traversal.

Why XML and JSON?

Before XML arrived, I wrote software to import data stored in an undocumented binary format. I used a debugger to identify data field types, file offsets, and lengths. When XML came along, and then JSON, the technology greatly simplified my life.

The first edition of Java XML and JSON (June 2016) introduces XML and JSON, explores Java SE’s own XML-oriented APIs, and explores external JSON-oriented APIs for Java SE. The second edition, recently published by Apress, offers new content, and (hopefully) answers more questions about XML, JSON, Java SE’s XML APIs, and various JSON APIs, including JSON-P. It’s also updated for Java SE 11.

After writing the book I wrote two additional sections introducing useful features of SAXON and Jackson, respectively. I’ll present those sections in this article. First, I’ll take a minute to introduce the book and its contents.

Java XML and JSON, second edition

Ideally, you should read the second edition of Java XML and JSON before studying the additional content in this article. Even if you haven’t read the book yet, you should know what it covers, because that information puts the additional sections in context.

The second edition of Java XML and JSON is organized into three parts, consisting of 12 chapters and an appendix:

Part 1: Exploring XML
- Chapter 1: Introducing XML
- Chapter 2: Parsing XML Documents with SAX
- Chapter 3: Parsing and Creating XML Documents with DOM
- Chapter 4: Parsing and Creating XML Documents with StAX
- Chapter 5: Selecting Nodes with XPath
- Chapter 6: Transforming XML Documents with XSLT
Part 2: Exploring JSON
- Chapter 7: Introducing JSON
- Chapter 8: Parsing and Creating JSON Objects with mJson
- Chapter 9: Parsing and Creating JSON Objects with Gson
- Chapter 10: Extracting JSON Values with JsonPath
- Chapter 11: Processing JSON with Jackson
- Chapter 12: Processing JSON with JSON-P
Part 3: Appendices
- Appendix A: Answers to Exercises

Part 1 focuses on XML. Chapter 1 defines key terminology, presents XML language features (XML declaration, elements and attributes, character references and CDATA sections, namespaces, and comments and processing instructions), and covers XML document validation (via Document Type Definitions and schemas). The remaining five chapters explore Java SE’s SAX, DOM, StAX, XPath, and XSLT APIs.

Part 2 focuses on JSON. Chapter 7 defines key terminology, tours JSON syntax, demonstrates JSON in a JavaScript context (because Java SE has yet to officially support JSON), and shows how to validate JSON objects (via the JSON Schema Validator online tool). The remaining five chapters explore the third-party mJson, Gson, JsonPath, and Jackson APIs, as well as Oracle’s Java EE-oriented JSON-P API, which is also unofficially available for use in a Java SE context.

Each chapter ends with a set of exercises, including programming exercises, which are designed to reinforce the reader’s understanding of the material. Answers are revealed in the book’s appendix.

The new edition differs from its predecessor in some significant ways:

- Chapter 2 shows the proper way to obtain an XML reader. The previous edition’s approach is deprecated.
- Chapter 3 also introduces the DOM’s Load and Save, Range, and Traversal APIs.
- Chapter 6 shows how to work with SAXON to move beyond XSLT/XPath 1.0.
- Chapter 11 is a new (lengthy) chapter that explores Jackson.
- Chapter 12 is a new (lengthy) chapter that explores JSON-P.

This edition also corrects minor errors in the previous edition’s content, updates various figures, and adds numerous new exercises.

While I didn’t have room for it in the second edition, a future edition of Java XML and JSON may cover YAML.

Addendum to Chapter 6: Transforming XML documents with XSLT

Move beyond XSLT/XPath 1.0 with SAXON

Java 11’s XSLT implementation is based on the Apache Xalan Project, which supports only XSLT 1.0 and XPath 1.0. To access the later XSLT 2.0+ and XPath 2.0+ features, you need to override the Xalan implementation with an alternative such as SAXON.

Java XML and JSON, Chapter 6 shows how to override Xalan with SAXON, then verify that SAXON is being used. In the demo, I recommend inserting the following line at the beginning of an application’s main() method, in order to use SAXON:

System.setProperty("javax.xml.transform.TransformerFactory",
                   "net.sf.saxon.TransformerFactoryImpl");

You don’t actually need this method call because SAXON’s TransformerFactory implementation is provided in a JAR file as a service that’s loaded automatically when the JAR file is accessible via the classpath. However, if there were multiple TransformerFactory implementation JAR files on the classpath, and if the Java runtime chose a non-SAXON service as the transformer implementation, there could be a problem. Including the aforementioned method call would override that choice with SAXON.
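To check which processor you actually got, you can ask both the factory and the stylesheet. The following is a minimal sketch, not taken from the book: the WhichXslt class and its describe() method are names I made up for illustration. It relies only on the XSLT 1.0 system-property() function, so it should run with or without SAXON on the classpath.

```java
import java.io.StringReader;
import java.io.StringWriter;

import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class WhichXslt
{
   // Stylesheet that asks the active XSLT processor to describe itself.
   private static final String XSL =
      "<xsl:stylesheet version='1.0'" +
      " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>" +
      "<xsl:output method='text'/>" +
      "<xsl:template match='/'>" +
      "<xsl:value-of select=\"system-property('xsl:vendor')\"/>" +
      "<xsl:text> / XSLT </xsl:text>" +
      "<xsl:value-of select=\"system-property('xsl:version')\"/>" +
      "</xsl:template>" +
      "</xsl:stylesheet>";

   // Returns "factory class -> vendor / XSLT version".
   public static String describe()
   {
      try
      {
         TransformerFactory tf = TransformerFactory.newInstance();
         Transformer t =
            tf.newTransformer(new StreamSource(new StringReader(XSL)));
         StringWriter sw = new StringWriter();
         t.transform(new StreamSource(new StringReader("<dummy/>")),
                     new StreamResult(sw));
         return tf.getClass().getName() + " -> " + sw;
      }
      catch (TransformerException te)
      {
         throw new IllegalStateException(te);
      }
   }

   public static void main(String[] args)
   {
      System.out.println(describe());
   }
}
```

With only the JDK on the classpath, I would expect the reported XSLT version to be 1.0. With the SAXON JAR present, both the factory class name and the reported version should change.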

XSLT/XPath features: A demo

Chapter 6 presents two XSLTDemo applications, and a third application is available in the book’s code archive. Listing 1, below, presents a fourth XSLTDemo application that highlights XSLT/XPath features.

Listing 1. XSLTDemo.java

import java.io.FileReader;
import java.io.IOException;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.FactoryConfigurationError;
import javax.xml.parsers.ParserConfigurationException;

import javax.xml.transform.Result;
import javax.xml.transform.Source;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.TransformerFactoryConfigurationError;

import javax.xml.transform.dom.DOMSource;

import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

import org.w3c.dom.Document;

import org.xml.sax.SAXException;

import static java.lang.System.*;

public class XSLTDemo
{
   public static void main(String[] args)
   {
      if (args.length != 2)
      {
         err.println("usage: java XSLTDemo xmlfile xslfile");
         return;
      }

      try
      {
         DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
         DocumentBuilder db = dbf.newDocumentBuilder();
         Document doc = db.parse(args[0]);
         TransformerFactory tf = TransformerFactory.newInstance();
         out.printf("TransformerFactory: %s%n", tf);
         FileReader fr = new FileReader(args[1]);
         StreamSource ssStyleSheet = new StreamSource(fr);
         Transformer t = tf.newTransformer(ssStyleSheet);
         Source source = new DOMSource(doc);
         Result result = new StreamResult(out);
         t.transform(source, result);
      }
      catch (IOException ioe)
      {
         err.printf("IOE: %s%n", ioe.toString());
      }
      catch (FactoryConfigurationError fce)
      {
         err.printf("FCE: %s%n", fce.toString());
      }
      catch (ParserConfigurationException pce)
      {
         err.printf("PCE: %s%n", pce.toString());
      }
      catch (SAXException saxe)
      {
         err.printf("SAXE: %s%n", saxe.toString());
      }
      catch (TransformerConfigurationException tce)
      {
         err.printf("TCE: %s%n", tce.toString());
      }
      catch (TransformerException te)
      {
         err.printf("TE: %s%n", te.toString());
      }
      catch (TransformerFactoryConfigurationError tfce)
      {
         err.printf("TFCE: %s%n", tfce.toString());
      }
   }
}

The code in Listing 1 is similar to Chapter 6’s Listing 6-2, but there are some differences. First, Listing 1’s main() method must be called with two command-line arguments: the first argument names the XML file; the second argument names the XSL file.

The second difference is that I don’t set any output properties on the transformer. Specifically, I don’t specify the output method or whether indentation is used. These tasks can be accomplished in the XSL file.
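For comparison, here is roughly what setting those properties in Java looks like. This is a hedged sketch rather than the book’s Listing 6-2: the OutputPropsSketch class name and the inline sample document are my own, and the transformer is a plain identity transformer with no stylesheet.

```java
import java.io.StringReader;
import java.io.StringWriter;

import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class OutputPropsSketch
{
   // Copies the XML through an identity transformer, configuring
   // serialization in Java code instead of via xsl:output.
   public static String serialize(String xml)
   {
      try
      {
         Transformer t = TransformerFactory.newInstance().newTransformer();
         t.setOutputProperty(OutputKeys.METHOD, "xml");
         t.setOutputProperty(OutputKeys.INDENT, "yes");
         t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
         StringWriter sw = new StringWriter();
         t.transform(new StreamSource(new StringReader(xml)),
                     new StreamResult(sw));
         return sw.toString();
      }
      catch (TransformerException te)
      {
         throw new IllegalStateException(te);
      }
   }

   public static void main(String[] args)
   {
      System.out.println(
         serialize("<planet><name>Earth</name></planet>"));
   }
}
```

Moving these settings into an xsl:output element, as Listing 3 does, keeps the Java code generic: the same XSLTDemo class can then drive stylesheets with different output requirements.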

Compile Listing 1 as follows:

javac XSLTDemo.java

XSLT 2.0 example: Grouping nodes

XSLT 1.0 doesn’t offer built-in support for grouping nodes. For example, you might want to transform the following XML document, which lists books with their authors:

<book title="Book 1">
  <author name="Author 1" />
  <author name="Author 2" />
</book>
<book title="Book 2">
  <author name="Author 1" />
</book>
<book title="Book 3">
  <author name="Author 2" />
  <author name="Author 3" />
</book>

into the following XML, which lists authors with their books:

<author name="Author 1">
  <book title="Book 1" />
  <book title="Book 2" />
</author>
<author name="Author 2">
  <book title="Book 1" />
  <book title="Book 3" />
</author>
<author name="Author 3">
  <book title="Book 3" />
</author>

While this transformation is possible in XSLT 1.0, it’s awkward. XSLT 2.0’s xsl:for-each-group element, by contrast, lets you take a set of nodes, group it by some criterion, and process each created group.
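To give a sense of the awkwardness, the sketch below uses the common XSLT 1.0 workaround known as Muenchian grouping (xsl:key plus generate-id()), which runs on the JDK’s built-in processor. It is not from the book. The MuenchianSketch class name, the by-name key, and the books and authors wrapper elements are assumptions I added so that the fragments above form well-formed documents.

```java
import java.io.StringReader;
import java.io.StringWriter;

import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class MuenchianSketch
{
   // The books-with-authors input, wrapped in a hypothetical root.
   private static final String XML =
      "<books>" +
      "<book title='Book 1'><author name='Author 1'/>" +
      "<author name='Author 2'/></book>" +
      "<book title='Book 2'><author name='Author 1'/></book>" +
      "<book title='Book 3'><author name='Author 2'/>" +
      "<author name='Author 3'/></book>" +
      "</books>";

   // XSLT 1.0 only: the key indexes authors by name, and the
   // generate-id() test keeps just the first author of each name.
   private static final String XSL =
      "<xsl:stylesheet version='1.0'" +
      " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>" +
      "<xsl:output method='xml' omit-xml-declaration='yes'/>" +
      "<xsl:key name='by-name' match='author' use='@name'/>" +
      "<xsl:template match='/books'>" +
      "<authors>" +
      "<xsl:for-each select=\"book/author[generate-id() =" +
      " generate-id(key('by-name', @name)[1])]\">" +
      "<xsl:sort select='@name'/>" +
      "<author name='{@name}'>" +
      "<xsl:for-each select=\"key('by-name', @name)\">" +
      "<xsl:sort select='../@title'/>" +
      "<book title='{../@title}'/>" +
      "</xsl:for-each>" +
      "</author>" +
      "</xsl:for-each>" +
      "</authors>" +
      "</xsl:template>" +
      "</xsl:stylesheet>";

   public static String transform()
   {
      try
      {
         Transformer t = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(XSL)));
         StringWriter sw = new StringWriter();
         t.transform(new StreamSource(new StringReader(XML)),
                     new StreamResult(sw));
         return sw.toString();
      }
      catch (TransformerException te)
      {
         throw new IllegalStateException(te);
      }
   }

   public static void main(String[] args)
   {
      System.out.println(transform());
   }
}
```

The grouping intent is buried in the key definition and the generate-id() comparison, which is the kind of indirection that a dedicated grouping element removes.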

Let’s explore this capability, starting with an XML document to process. Listing 2 presents the contents of a books.xml file that groups author names by book title.

Listing 2. books.xml (grouping by book title)

<?xml version="1.0"?>
<books>
   <book title="Securing Office 365: Masterminding MDM and Compliance in the Cloud">
     <author name="Matthew Katzer"/>
     <publisher name="Apress" isbn="978-1484242292" pubyear="2019"/>
   </book>
   <book title="Office 2019 For Dummies">
     <author name="Wallace Wang"/>
     <publisher name="For Dummies" isbn="978-1119513988" pubyear="2018"/>
   </book>
   <book title="Office 365: Migrating and Managing Your Business in the Cloud">
     <author name="Matthew Katzer"/>
     <author name="Don Crawford"/>
     <publisher name="Apress" isbn="978-1430265269" pubyear="2014"/>
   </book>
</books>

Listing 3 presents the contents of a books.xsl file that provides the XSL transformation to turn this document into one that groups book titles according to author names.

Listing 3. books.xsl (grouping by author name)

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="2.0">
  <xsl:output method="html" indent="yes"/>
  <xsl:template match="/books">
<html>
<head>
</head>
<body>
      <xsl:for-each-group select="book/author" group-by="@name">
      <xsl:sort select="@name"/>
<author name="{@name}">
          <xsl:for-each select="current-group()">
          <xsl:sort select="../@title"/>
<book title="{../@title}" />
          </xsl:for-each>
</author>
      </xsl:for-each-group>
</body>
</html>
  </xsl:template>
</xsl:stylesheet>

The xsl:output element indicates that indented HTML output is required. The xsl:template element’s match attribute matches the single books root element.

The xsl:for-each-group element selects a sequence of nodes and organizes them into groups. The select attribute is an XPath expression that identifies the elements to group. Here, it’s told to select all author elements that belong to book elements. The group-by attribute groups together all elements having the same value for a grouping key, which happens to be the @name attribute of the author element. In essence, you end up with the following groups:

Group 1

Matthew Katzer
Matthew Katzer

Group 2

Wallace Wang

Group 3

Don Crawford

Groups are processed in order of first appearance, not alphabetically, so without sorting the author elements would be output with Matthew Katzer first and Don Crawford last. The xsl:sort select="@name" element ensures that author elements are output in sorted order.

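The <author name="{@name}"> construct outputs an <author> tag whose name attribute takes its value from the first author element in the group, which is the context item inside xsl:for-each-group. Because every member of a group shares the same name, this value is the group’s author name.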

Continuing, xsl:for-each select="current-group()" iterates over the author elements in the current for-each-group iteration’s group. The xsl:sort select="../@title" construct sorts the output book elements, which the subsequent <book title="{../@title}" /> construct emits, according to the book titles.
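If you think more readily in Java collections than in XSLT, the same group-then-sort logic can be sketched as follows. This is only an analogy, not how SAXON works internally: the GroupBySketch class and CREDITS array are hypothetical names, and the data is copied by hand from Listing 2. Java’s natural String ordering happens to agree with xsl:sort for this data, but the two can differ in general.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class GroupBySketch
{
   // One { title, author } pair per author element in Listing 2.
   private static final String[][] CREDITS =
   {
      { "Securing Office 365: Masterminding MDM and Compliance in the Cloud",
        "Matthew Katzer" },
      { "Office 2019 For Dummies", "Wallace Wang" },
      { "Office 365: Migrating and Managing Your Business in the Cloud",
        "Matthew Katzer" },
      { "Office 365: Migrating and Managing Your Business in the Cloud",
        "Don Crawford" }
   };

   public static Map<String, List<String>> titlesByAuthor()
   {
      // TreeMap keeps the keys sorted, like xsl:sort select="@name".
      Map<String, List<String>> groups = new TreeMap<>();
      for (String[] credit : CREDITS)
      {
         // Like group-by="@name": one list per distinct author name.
         groups.computeIfAbsent(credit[1], k -> new ArrayList<>())
               .add(credit[0]);
      }
      // Like xsl:sort select="../@title" inside current-group().
      for (List<String> titles : groups.values())
         Collections.sort(titles);
      return groups;
   }

   public static void main(String[] args)
   {
      titlesByAuthor().forEach((author, titles) ->
         System.out.println(author + ": " + titles));
   }
}
```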

Transformation

Now let’s attempt the transformation. Execute the following command:

java XSLTDemo books.xml books.xsl

Unfortunately, this transformation fails: you should observe output that identifies Apache Xalan as the transformer factory and an error message stating that xsl:for-each-group is not supported.

Let’s try again. Assuming that saxon9he.jar and XSLTDemo.class are located in the current directory, execute the following command. (The ; classpath separator shown here is for Windows; on Linux or macOS, use : instead.)

java -cp saxon9he.jar;. XSLTDemo books.xml books.xsl

This time, you should observe the following sorted and properly grouped output:

<html>
   <head>
      <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
   </head>
   <body>
      <author name="Don Crawford">
         <book title="Office 365: Migrating and Managing Your Business in the Cloud"></book>
      </author>
      <author name="Matthew Katzer">
         <book title="Office 365: Migrating and Managing Your Business in the Cloud"></book>
         <book title="Securing Office 365: Masterminding MDM and Compliance in the Cloud"></book>
      </author>
      <author name="Wallace Wang">
         <book title="Office 2019 For Dummies"></book>
      </author>
   </body>
</html>

Addendum to Chapter 11: Processing JSON with Jackson

Converting XML to JSON with Jackson

Java XML and JSON, Chapter 11, introduces Jackson, which provides APIs for parsing and creating JSON objects. It’s also possible to use Jackson to convert XML documents to JSON documents.

In this section, I’ll show you two ways to convert XML to JSON, first with data binding and then with tree traversal. I’ll assume that you’ve read Chapter 11 and are familiar with Jackson. In order to follow these demos, you should have downloaded the following JAR files from the Maven repository:

jackson-annotations-2.9.7.jar
jackson-core-2.9.7.jar
jackson-databind-2.9.7.jar

You’ll need a few additional JAR files, as well; most are common to both conversion techniques. I’ll provide information on obtaining these JAR files shortly.

Convert XML to JSON with data binding

Data binding lets you map serialized data to a Java object. For example, suppose you have a small XML document that describes a single planet. Listing 4 presents this document.

Listing 4. planet.xml

<?xml version="1.0" encoding="UTF-8"?>
<planet>
    <name>Earth</name>
    <planet_from_sun>3</planet_from_sun>
    <moons>9</moons>
</planet>

Listing 5 presents an equivalent Java Planet class whose objects map to planet.xml’s content.

Listing 5. Planet.java

public class Planet
{
   public String name;
   public Integer planet_from_sun;
   public Integer moons;
}

The conversion process requires that you first parse the XML into a Planet object. You can accomplish this task by working with the com.fasterxml.jackson.dataformat.xml.XmlMapper class, as follows:

XmlMapper xmlMapper = new XmlMapper();
XMLInputFactory xmlif = XMLInputFactory.newFactory();
FileReader fr = new FileReader("planet.xml");
XMLStreamReader xmlsr = xmlif.createXMLStreamReader(fr);
Planet planet = xmlMapper.readValue(xmlsr, Planet.class);

XmlMapper is a customized com.fasterxml.jackson.databind.ObjectMapper that reads and writes XML. It provides several readValue() methods for reading a single XML value from an XML-specific input source; for example:

<T> T readValue(XMLStreamReader r, Class<T> valueType)

Each of these XML-specific readValue() methods takes a javax.xml.stream.XMLStreamReader object as its first argument. This object is a StAX stream-based parser that reads XML efficiently in a forward-only manner.

The second argument is a java.lang.Class object for the target type. The method instantiates this type, populates the instance with XML data, and returns the instance.

The bottom line of this code fragment is that Listing 4’s content is read into a Planet object that readValue() returns to its caller.
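If you haven’t used XMLStreamReader directly, the following JDK-only sketch shows the forward-only cursor that XmlMapper consumes. It is an illustration of StAX, not of Jackson’s internals. The StaxSketch class and readFields() method are names I invented, and the inline document is merely shaped like Listing 4.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;

import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class StaxSketch
{
   // Collects each child element of the root as a name/text pair.
   public static Map<String, String> readFields(String xml)
   {
      Map<String, String> fields = new LinkedHashMap<>();
      try
      {
         XMLStreamReader xmlsr = XMLInputFactory.newFactory()
            .createXMLStreamReader(new StringReader(xml));
         boolean rootSeen = false;
         while (xmlsr.hasNext())
         {
            // The cursor only moves forward; there is no going back.
            if (xmlsr.next() == XMLStreamConstants.START_ELEMENT)
            {
               if (!rootSeen)
                  rootSeen = true; // skip the root element itself
               else
                  fields.put(xmlsr.getLocalName(),
                             xmlsr.getElementText());
            }
         }
         xmlsr.close();
      }
      catch (XMLStreamException xse)
      {
         throw new IllegalStateException(xse);
      }
      return fields;
   }

   public static void main(String[] args)
   {
      String xml = "<planet>" +
                   "<name>Earth</name>" +
                   "<planet_from_sun>3</planet_from_sun>" +
                   "<moons>1</moons>" +
                   "</planet>";
      System.out.println(readFields(xml));
   }
}
```

Note that every value arrives as text. It is the target class, here Planet with its Integer fields, that tells the data binder how to convert that text.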

Once the object has been created, it’s easy to write it out as JSON by working with ObjectMapper and its String writeValueAsString(Object value) method:

ObjectMapper jsonMapper = new ObjectMapper();
String json = jsonMapper.writeValueAsString(planet);

I excerpted these code fragments from an XML2JSON application whose complete source code appears in Listing 6.

Listing 6. XML2JSON.java (Version 1)

import java.io.FileReader;

import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamReader;

import com.fasterxml.jackson.databind.ObjectMapper;

import com.fasterxml.jackson.dataformat.xml.XmlMapper;

import static java.lang.System.*;

public class XML2JSON
{
   public static void main(String[] args) throws Exception
   {
      XmlMapper xmlMapper = new XmlMapper();
      XMLInputFactory xmlif = XMLInputFactory.newFactory();
      FileReader fr = new FileReader("planet.xml");
      XMLStreamReader xmlsr = xmlif.createXMLStreamReader(fr);
      Planet planet = xmlMapper.readValue(xmlsr, Planet.class);
      ObjectMapper jsonMapper = new ObjectMapper();
      String json = jsonMapper.writeValueAsString(planet);
      out.println(json);
   }
}

Before you can compile Listings 5 and 6, you’ll need to download Jackson Dataformat XML, which implements XmlMapper. I downloaded version 2.9.7, which matches the versions of the other three Jackson packages.

Assuming that you’ve successfully downloaded jackson-dataformat-xml-2.9.7.jar, execute the following command (spread over two lines for readability; the ; classpath separator is Windows-specific, so use : on Linux or macOS) to compile the source code:

javac -cp jackson-core-2.9.7.jar;jackson-databind-2.9.7.jar;jackson-dataformat-xml-2.9.7.jar;.
      XML2JSON.java

Before you can run the resulting application, you’ll need to download Jackson Module: JAXB Annotations, and also download StAX 2 API. I downloaded JAXB Annotations version 2.9.7 and StAX 2 API version 3.1.3.

Assuming that you’ve successfully downloaded jackson-module-jaxb-annotations-2.9.7.jar and stax2-api-3.1.3.jar, execute the following command (spread across three lines for readability) to run the application:

java -cp jackson-annotations-2.9.7.jar;jackson-core-2.9.7.jar;jackson-databind-2.9.7.jar;
     jackson-dataformat-xml-2.9.7.jar;jackson-module-jaxb-annotations-2.9.7.jar;stax2-api-3.1.3.jar;.
     XML2JSON

If all goes well, you should observe the following output:

{"name":"Earth","planet_from_sun":3,"moons":9}

Convert XML to JSON with tree traversal

Another way to convert from XML to JSON is to first parse the XML into a tree of JSON nodes and then write this tree to a JSON document. You can accomplish the first task by calling one of XmlMapper’s inherited readTree() methods:

XmlMapper xmlMapper = new XmlMapper();
JsonNode node = xmlMapper.readTree(xml.getBytes());

ObjectMapper’s JsonNode readTree(byte[] content) method deserializes JSON content into a tree of com.fasterxml.jackson.databind.JsonNode objects and returns the root JsonNode object of this tree. In an XmlMapper context, this method deserializes XML content into the tree. In either case, the JSON or XML content is passed to this method as an array of bytes.

The second task — converting the tree of objects to JSON — is accomplished in a similar manner to what I previously showed. This time, it’s the JsonNode root object that’s passed to writeValueAsString():

ObjectMapper jsonMapper = new ObjectMapper();
String json = jsonMapper.writeValueAsString(node);

I excerpted these code fragments from an XML2JSON application whose complete source code appears in Listing 7.

Listing 7. XML2JSON.java (version 2)

import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

import com.fasterxml.jackson.dataformat.xml.XmlMapper;

import static java.lang.System.*;

public class XML2JSON
{
   public static void main(String[] args) throws Exception
   {
      String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n" +
                   "<planet>\n" +
                   "    <name>Earth</name>\n" +
                   "    <planet_from_sun>3</planet_from_sun>\n" +
                   "    <moons>1</moons>\n" +
                   "</planet>\n";

      XmlMapper xmlMapper = new XmlMapper();
      JsonNode node = xmlMapper.readTree(xml.getBytes());
      ObjectMapper jsonMapper = new ObjectMapper();
      String json = jsonMapper.writeValueAsString(node);
      out.println(json);
   }
}

Execute the following command (spread over two lines for readability) to compile Listing 7:

javac -cp jackson-core-2.9.7.jar;jackson-databind-2.9.7.jar;jackson-dataformat-xml-2.9.7.jar
      XML2JSON.java

Before you can run the resulting application, you’ll need to download Woodstox, which is a high-performance XML processor that implements StAX, SAX2, and StAX2. I downloaded Woodstox 5.2.0. Then execute the following command (spread across three lines for readability) to run the application:

java -cp jackson-annotations-2.9.7.jar;jackson-core-2.9.7.jar;jackson-databind-2.9.7.jar;
     jackson-dataformat-xml-2.9.7.jar;stax2-api-3.1.3.jar;woodstox-core-5.2.0.jar;.
     XML2JSON

If all goes well, you should observe the following output:

{"name":"Earth","planet_from_sun":"3","moons":"1"}

Notice that the numbers assigned to the planet_from_sun and moons XML elements are serialized to JSON strings instead of numbers. The readTree() method doesn’t infer the data type in the absence of an explicit type definition.
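The underlying reason is that XML element content is just text. To illustrate, here is a hand-rolled, JDK-only sketch that walks a DOM tree and emits JSON. It is emphatically not how Jackson is implemented, and the UntypedTreeSketch class is a name I made up. It handles only a flat document like the one in Listing 7 and performs no JSON escaping.

```java
import java.io.StringReader;

import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

import org.xml.sax.InputSource;

public class UntypedTreeSketch
{
   // Walks the root's child elements and emits a flat JSON object.
   // Every value is quoted because element text is all the tree offers.
   public static String toJson(String xml)
   {
      try
      {
         Element root = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)))
            .getDocumentElement();
         StringBuilder json = new StringBuilder("{");
         NodeList children = root.getChildNodes();
         for (int i = 0; i < children.getLength(); i++)
         {
            Node child = children.item(i);
            if (child.getNodeType() != Node.ELEMENT_NODE)
               continue; // skip whitespace-only text nodes
            if (json.length() > 1)
               json.append(',');
            // No JSON escaping here; fine for this sample data only.
            json.append('"').append(child.getNodeName())
                .append("\":\"").append(child.getTextContent())
                .append('"');
         }
         return json.append('}').toString();
      }
      catch (Exception e)
      {
         throw new IllegalStateException(e);
      }
   }

   public static void main(String[] args)
   {
      String xml = "<planet>\n" +
                   "    <name>Earth</name>\n" +
                   "    <planet_from_sun>3</planet_from_sun>\n" +
                   "    <moons>1</moons>\n" +
                   "</planet>\n";
      System.out.println(toJson(xml));
   }
}
```

Without a target type such as Planet, nothing in the tree says that 3 should be a number rather than a string, so quoting everything is the only safe choice.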

Jackson’s support for XML tree traversal has additional limitations:

- Jackson is unable to differentiate between objects and arrays. Because XML provides no means to differentiate an object from a list (array) of objects, Jackson collates repeated elements into a single value.
- Jackson doesn’t support mixed content (textual content and elements as children of an element). Instead, it maps each XML element to a JsonNode object. Any text is lost.

Given these limitations, it’s not surprising that the official Jackson documentation recommends against parsing XML into JsonNode-based trees. You’re better off using the data binding conversion technique.

Conclusion

The material presented in this article should be considered an addendum to Chapters 6 and 11 in the second edition of Java XML and JSON. In contrast, my next article will be related to the book but will present entirely new material. Keep your eye out for my upcoming article about binding Java objects to JSON documents with JSON-B.
