by Scott Hiles

XML DOM-lite parser and writer

Sep 6, 2004

Write and parse XML

Applications frequently need to store persistent data, and the developer can choose any format imaginable for saving it. However, each time a new application is written, the tools for saving and retrieving that information must be rewritten. Binary object files are simple to write in Java, but they are neither portable to other languages nor human readable. Character-delimited files are highly portable, but they are difficult to read and require an escaping scheme whenever the data itself contains the delimiter. As an application evolves, the data it manages often grows more complex, and each new version of the application must migrate existing users to the new format.

A simple, flexible data format is needed, along with a set of methods for manipulating it. The format should have the following characteristics:

  • Human-readable, for simplicity
  • Self-describing, so that the data's context is stored in the file
  • Extensible, so that applications can grow without rewriting the file format

Fortunately, both the format and the tools already exist, in Java and in most other languages, in the form of XML. XML is well suited to storing and retrieving persistent data; however, reading and writing XML is not so simple. J2SE provides a SAX (Simple API for XML) parser, and JDOM is a freely available DOM (Document Object Model) parser, but both introduce complexities. Incorporating SAX into an application is challenging, and SAX offers no methods for creating an XML file, leaving the programmer to find some other way to generate XML. JDOM provides both a writer and a parser, but it requires either a 3-megabyte download, burdening the user with download and installation, or an application distribution that bundles the full JDOM package.

For large applications, these complexities are little more than a challenge for the developer. But small applications that do not need the full XML standard call for a small, simple XML class. A quick Web search for a simple XML parser turned up JavaWorld’s “Java Tip 128: Create a Quick-and-Dirty XML Parser” by Steven Brandt, which provides a compact method for reading XML in SAX style. Incorporating this code has two drawbacks, however. First, it still requires writing the callback methods inherent to a SAX parser. Second, it provides no means of generating valid XML output.

With about a day of work, Brandt’s SAX-style parser can be converted into a simple DOM-style parser that keeps the data in memory and can both read and write it in XML format. The parser I introduce in this article reads input the same way Brandt’s parser does and supports the same limited subset of XML.

The implementation

The goals of the XML reader/writer class are as follows:

  • It must be small and contained in a single class (including the exception classes)
  • It must maintain data in memory for simple manipulation with a set of methods similar to those of a vector
  • It must store each element as a separate object capable of recursion
  • Each object must independently implement attributes
  • It must provide the basic tags needed to store a simple configuration

XML element representation

Before converting Brandt’s parser, we must create the object that represents an XML element. Each XML object holds the element’s tag, attributes, and data. The data may be a single value or a list of child elements. The Java code for the element is shown below:

import java.util.Vector;

public class myXML {
  String tag;                            // The element name
  Object element = null;                 // A String value, a Vector of child myXML elements, or null
  Attribute Attribute = new Attribute(); // Attributes for the element
...
  public class Attribute {
    private Vector attributes = new Vector(); // List of name/value attribute objects
...
    private class attribute {
      public String name = null;         // Attribute name
      public String value = null;        // Attribute data
    }
  }
}

Because the XML data is stored in a tree of objects, the application can manage it with methods similar to those used to manipulate vectors. Basic methods such as add(), find(), contains(), remove(), isEmpty(), and get() round out the class. They are straightforward and can be found in the source code downloadable from Resources.
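The vector-like methods are simple enough to sketch. The class below is a hypothetical, minimal XmlNode, not the myXML source: it keeps children in a List<XmlNode> and the value in a separate String field instead of a single Object field, and its method names only echo those listed above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical minimal element node; a sketch of the idea, not the myXML source.
class XmlNode {
    final String tag;
    String value;                                   // leaf data, or null for a branch
    final List<XmlNode> children = new ArrayList<>();
    final Map<String, String> attributes = new LinkedHashMap<>(); // keeps insertion order

    XmlNode(String tag) { this.tag = tag; }
    XmlNode(String tag, String value) { this.tag = tag; this.value = value; }

    // add(): append a child and return it so the caller can keep filling it in.
    XmlNode add(XmlNode child) { children.add(child); return child; }

    // get(): the child at the given index.
    XmlNode get(int index) { return children.get(index); }

    // find(): the first direct child with the given tag, or null if none exists.
    XmlNode find(String childTag) {
        for (XmlNode c : children) {
            if (c.tag.equals(childTag)) return c;
        }
        return null;
    }

    // contains(): true if some direct child has the given tag.
    boolean contains(String childTag) { return find(childTag) != null; }

    // remove(): drop the first direct child with the given tag; false if none matched.
    boolean remove(String childTag) {
        XmlNode c = find(childTag);
        return c != null && children.remove(c);
    }

    // isEmpty(): true if the element has no child elements.
    boolean isEmpty() { return children.isEmpty(); }
}
```

Returning the added child from add() lets a caller create a branch and populate it in one step, in the same spirit as the addElement() calls shown later in the article.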

Writing the XML elements

Once the tree is stored in memory, writing it to disk with recursion is almost trivial. The most complex part of the task is formatting the output so that it is easy to read, and recursion solves that problem too. Starting at the top of the tree, write the tag and attributes. Then, if the element object is non-null, either write its value if it is not a Vector, or recursively repeat the writing task for each object in the Vector. Here’s the approach:

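void serialize(String indent, PrintWriter out) {
  // dumpattributes() is assumed to return ' name="value"' pairs, each with a
  // leading space, or "" when the element has no attributes
  out.print(indent + "<" + tag + dumpattributes());
  if (element == null) {                 // Case 1: empty element, written as <tag/>
    out.println("/>");
    return;
  }
  if (element instanceof Vector) {       // Case 2: branch holding child elements
    out.println(">");
    Vector children = (Vector) element;
    for (int i = 0; i < children.size(); i++) {
      // Recurse into each child, indenting it two more spaces
      ((myXML) children.get(i)).serialize(indent + "  ", out);
    }
    out.println(indent + "</" + tag + ">");
    return;
  }
  out.println(">" + (String) element + "</" + tag + ">");  // Case 3: leaf value
}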

As the code illustrates, only three cases must be handled. In the first case, the element has no content and is written as an empty-element tag terminated by "/>", which XML allows. In the second case, the element has subelements, and each one is processed recursively with its indent increased by two spaces. In the final case, the element contains only data, which ends the recursion.
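To experiment with the three cases outside the myXML class, here is a self-contained sketch. WritableElement is a hypothetical class that stores children in a List and attributes in a LinkedHashMap rather than the Object/Vector pair used by myXML; its serialize() follows the same three cases. Like the listing above, it does not escape characters such as < or & in values or attributes.

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical element type used only to illustrate the three serialize() cases.
class WritableElement {
    final String tag;
    String value;                                     // null for a branch or an empty element
    final List<WritableElement> children = new ArrayList<>();
    final Map<String, String> attributes = new LinkedHashMap<>();

    WritableElement(String tag, String value) { this.tag = tag; this.value = value; }

    // Adds a child element and returns it.
    WritableElement add(String childTag, String childValue) {
        WritableElement child = new WritableElement(childTag, childValue);
        children.add(child);
        return child;
    }

    void serialize(String indent, PrintWriter out) {
        StringBuilder open = new StringBuilder(indent).append('<').append(tag);
        for (Map.Entry<String, String> a : attributes.entrySet()) {
            open.append(' ').append(a.getKey()).append("=\"").append(a.getValue()).append('"');
        }
        out.print(open.toString());
        if (children.isEmpty() && value == null) {    // Case 1: empty element
            out.println("/>");
        } else if (!children.isEmpty()) {             // Case 2: branch, recurse with deeper indent
            out.println(">");
            for (WritableElement c : children) {
                c.serialize(indent + "  ", out);
            }
            out.println(indent + "</" + tag + ">");
        } else {                                      // Case 3: leaf value, ends the recursion
            out.println(">" + value + "</" + tag + ">");
        }
    }
}
```

Writing into a StringWriter, as the test for this sketch does, is a convenient way to check the output before pointing the PrintWriter at a file.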

Reading the XML elements

Reading the data is not quite so simple. Brandt’s reader provided the basis for the parser, but it had to be adapted to store the data in the object tree and to load that data recursively. For a recursive loader to handle end tags properly, it must be able to read a character and then push it back. A private helper class inside the XML class hides this complexity: its read() method behaves like BufferedReader.read(), except that it first checks a stack for pushed-back characters before reading from the file, and its unread() method pushes a character back onto the stack:

  private class myFileReader {
    private final Stack<Integer> stack = new Stack<>();  // Characters pushed back by unread()
    private final BufferedReader f;                      // Underlying input

    myFileReader(BufferedReader in) {
      f = in;
    }

    // Push a character back; the next read() returns it before any file data
    void unread(int c) {
      stack.push(c);
    }

    // Return the most recently pushed-back character, or the next one from the file
    int read() throws IOException {
      if (stack.empty()) {
        return f.read();
      }
      return stack.pop();
    }
  }
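The standard library offers the same push-back behavior in java.io.PushbackReader, whose unread(int) makes the pushed character the next one returned by read(). One difference: myFileReader’s Stack grows without limit, whereas PushbackReader has a fixed buffer size set in its constructor and throws an IOException if more characters are pushed back than fit. The sketch below (PushbackDemo and readTagName() are hypothetical names) reads a tag name and pushes back the character that ended it.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.PushbackReader;
import java.io.StringReader;

class PushbackDemo {
    // Reads '<' and the tag name that follows, then pushes back the character
    // that ended the name (' ', '>' or '/') so the caller sees it on its next read().
    static String readTagName(PushbackReader in) throws IOException {
        int c = in.read();
        if (c != '<') throw new IOException("expected '<'");
        StringBuilder name = new StringBuilder();
        while ((c = in.read()) != -1
                && (Character.isLetterOrDigit(c) || c == '_' || c == '-' || c == '.' || c == ':')) {
            name.append((char) c);
        }
        if (c != -1) in.unread(c);   // give back the delimiter
        return name.toString();
    }
}
```

Like the Stack in myFileReader, PushbackReader returns pushed-back characters in last-in, first-out order.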

With this class and one more modification, the parser is complete. To make Brandt’s parser recursive, the parsing method must call itself whenever it finds the start of an element: as soon as it reads < followed by any character other than / or ?, it pushes that character back into the file reader and recurses.
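The recursion described above can be sketched as a small, self-contained parser. This is not the myXML source: ParsedElement and RecursiveParser are hypothetical names, java.io.PushbackReader stands in for myFileReader, and, like Brandt’s parser, it handles only a limited subset of XML (no comments, CDATA, entity decoding, or end-tag name checking). The key step is in parseElement(): after reading <, it looks at the next character and either closes the current element on / or pushes the character back and recurses.

```java
import java.io.IOException;
import java.io.PushbackReader;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical node type filled in by the parser below.
class ParsedElement {
    String tag;
    String value;                                     // text of a leaf element, else null
    final Map<String, String> attributes = new LinkedHashMap<>();
    final List<ParsedElement> children = new ArrayList<>();
}

// Hypothetical recursive parser; a sketch of the technique, not the myXML code.
class RecursiveParser {
    private final PushbackReader in;

    RecursiveParser(Reader r) { in = new PushbackReader(r, 2); }

    // Returns the root element, skipping any <?...?> declaration before it.
    ParsedElement parseDocument() throws IOException {
        while (true) {
            int c = skipWhitespace();
            if (c != '<') throw new IOException("expected '<'");
            int next = in.read();
            if (next == '?') { skipPast('>'); continue; }  // XML declaration
            in.unread(next);                              // an element starts here
            return parseElement();
        }
    }

    // Called just after '<' has been consumed; reads one element and its subtree.
    private ParsedElement parseElement() throws IOException {
        ParsedElement e = new ParsedElement();
        e.tag = readName();
        int c;
        while ((c = skipWhitespace()) != '>' && c != '/') {   // attributes: name="value"
            if (c == -1) throw new IOException("unexpected end of input");
            in.unread(c);
            String name = readName();
            if (skipWhitespace() != '=' || skipWhitespace() != '"') throw new IOException("bad attribute");
            e.attributes.put(name, readUntil('"'));
        }
        if (c == '/') { skipPast('>'); return e; }           // <tag ... />
        StringBuilder text = new StringBuilder();
        while (true) {
            c = in.read();
            if (c == -1) throw new IOException("unexpected end of input in <" + e.tag + ">");
            if (c != '<') { text.append((char) c); continue; }
            int next = in.read();
            if (next == '/') {                               // end tag closes this element
                skipPast('>');
                if (e.children.isEmpty()) e.value = text.toString();
                return e;
            }
            in.unread(next);                                 // start of a child element: recurse
            e.children.add(parseElement());
        }
    }

    // Reads a tag or attribute name, pushing back the character that ended it.
    private String readName() throws IOException {
        StringBuilder sb = new StringBuilder();
        int c;
        while ((c = in.read()) != -1 && !Character.isWhitespace(c) && "=/>".indexOf(c) < 0) {
            sb.append((char) c);
        }
        if (c != -1) in.unread(c);
        return sb.toString();
    }

    // Reads up to (and consumes) the given character, returning what came before it.
    private String readUntil(char end) throws IOException {
        StringBuilder sb = new StringBuilder();
        int c;
        while ((c = in.read()) != end) {
            if (c == -1) throw new IOException("unexpected end of input");
            sb.append((char) c);
        }
        return sb.toString();
    }

    private void skipPast(char end) throws IOException { readUntil(end); }

    // Returns the first non-whitespace character (consumed), or -1 at end of input.
    private int skipWhitespace() throws IOException {
        int c;
        do { c = in.read(); } while (c != -1 && Character.isWhitespace(c));
        return c;
    }
}
```

Whitespace between child elements is collected but discarded once a child is found, so only leaf elements end up with a value.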

That’s all it takes. The entire class is about 600 lines of actual code, plus about 300 lines of comments added for Javadoc. Having the code is all very good, but the final piece of the project is making sure the class is simple to use and easy to integrate with an application.

An example

How to create an XML tree in memory and write it to disk:

  1. Create the root node with a name (myXML root = new myXML("roottag");)
  2. Create subelements
  3. If a subelement will have subelements of its own (a branch), use the form without a value (myXML subelement = root.addElement("subelement");)
  4. If a subelement will not have subelements (a leaf), add the element with a value (myXML subelement = root.addElement("element",value);)
  5. For each element (branch or leaf, including the root), add attributes (subelement.Attribute.add("name",value);) or (root.Attribute.add("name",value);)
  6. Repeat the previous steps until the tree is complete
  7. Write the tree to a PrintWriter stream to generate XML

Putting that into code:

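PrintWriter pw = new PrintWriter(System.out);           // any PrintWriter works, e.g. one wrapping a FileWriter
myXML root = new myXML("root");
myXML sub1 = root.addElement("element1", "value1");     // leaf: has a value
sub1.Attribute.add("attribute1", "att1");
sub1.Attribute.add("attribute2", "att2");
myXML sub2 = root.addElement("element2");               // branch: no value
sub2.Attribute.add("attribute20", "att20");
myXML sub21 = sub2.addElement("element21", "value21");  // child of element2
sub21.Attribute.add("attribute21", "att21");
root.serialize(pw);                                     // writes the XML declaration, then the tree
pw.flush();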

This produces the following output:

<?xml version="1.0" standalone="yes"?>
<root>
  <element1 attribute1="att1" attribute2="att2">value1</element1>
  <element2 attribute20="att20">
    <element21 attribute21="att21">value21</element21>
  </element2>
</root>

How to load an XML tree from disk:

  1. Open a BufferedReader
  2. Create a new myXML object, passing the BufferedReader to its constructor

Putting that into code:

BufferedReader in = new BufferedReader(file);  // file: any open Reader, such as a FileReader
myXML xmlroot = new myXML(in);                 // parse the document into a tree
in.close();
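The two steps above leave closing the file to the caller. A minimal sketch, assuming the myXML(BufferedReader) constructor throws at most an IOException (the article does not say which exceptions it throws), wraps them in try-with-resources and opens the file with an explicit UTF-8 charset through Files.newBufferedReader. XmlLoader and its Parser interface are hypothetical names.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class XmlLoader {
    // Hypothetical stand-in for a constructor such as myXML(BufferedReader).
    interface Parser<T> {
        T parse(BufferedReader in) throws IOException;
    }

    // Opens the file as UTF-8, hands the reader to the parser, and always closes it.
    static <T> T load(Path file, Parser<T> parser) throws IOException {
        try (BufferedReader in = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            return parser.parse(in);
        }
    }
}
```

Under that assumption, a call could look like myXML xmlroot = XmlLoader.load(Paths.get("config.xml"), myXML::new); where config.xml is a hypothetical file name. If the constructor throws a different checked exception, Parser.parse() would need to declare it as well.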

How to walk through the XML tree:

  • myXML.getTag(): Returns the tag name of the myXML object
  • myXML.getValue(): Returns the value of the myXML object
  • myXML.getElement(int): Returns the myXML subobject at the specified index
  • myXML.findElement(String tag): Returns the first myXML object with the given tag
  • myXML.findElement(String tag, Attribute name): Returns the first myXML object with the given tag and attribute name
  • myXML.Attribute.find(String name): Returns the attribute value associated with name
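Accessors like these are enough to walk a tree recursively. The sketch below uses a hypothetical TreeNode class whose getTag(), getValue(), and getElement(int) mirror the myXML methods of the same names; size(), findDeep(), and collectLeaves() are assumptions added for the example. findDeep() searches the entire subtree depth first, which may differ from how myXML.findElement() behaves.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical node with accessors modelled on getTag(), getValue(), and getElement(int).
class TreeNode {
    private final String tag;
    private final String value;
    private final List<TreeNode> children = new ArrayList<>();

    TreeNode(String tag, String value) { this.tag = tag; this.value = value; }

    TreeNode add(TreeNode child) { children.add(child); return child; }
    String getTag() { return tag; }
    String getValue() { return value; }
    int size() { return children.size(); }
    TreeNode getElement(int i) { return children.get(i); }

    // Depth-first, pre-order search of this node and everything below it.
    TreeNode findDeep(String wanted) {
        if (tag.equals(wanted)) return this;
        for (int i = 0; i < size(); i++) {
            TreeNode hit = getElement(i).findDeep(wanted);
            if (hit != null) return hit;
        }
        return null;
    }

    // Collects "path=value" for every leaf, e.g. "root/element1=value1".
    void collectLeaves(String prefix, List<String> out) {
        String path = prefix.isEmpty() ? tag : prefix + "/" + tag;
        if (size() == 0) {
            out.add(path + "=" + value);
            return;
        }
        for (int i = 0; i < size(); i++) {
            getElement(i).collectLeaves(path, out);
        }
    }
}
```

Flattening a configuration tree into path=value pairs this way can be handy for logging or comparing two configurations.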

Many other methods exist to manage the XML data tree. The source code has been commented so that documentation can be created by running the source through Javadoc.

A live example

There’s nothing like real code to drive home the usefulness of a class. Project WiSH is an applet and application that uses the myXML parser/writer to manage the configuration of an X10 home automation system.

With this simple XML class, developers can quickly incorporate the use of XML to store and retrieve data that can change and grow with the application’s evolution.

Scott Hiles has a master of science degree in electrical engineering and is the author of the SourceForge project WiSH for X10 home automation.