Write a reusable ContentHandler for reading objects from XML

Application data should be held in a database or in configuration files, but never hardcoded in the application itself. Developers don't need to deliberate over a configuration file format: XML is the preferred choice. The SAX and DOM API implementations help us read XML files. DOM builds a tree-based data structure from an XML file. SAX presents the document as a sequence of events, reporting each start tag and end tag as it encounters it. To handle those events, we implement a ContentHandler and register it with the org.xml.sax.XMLReader, which reads the XML file. In this article, we will design a component that handles the repetitive task of creating objects from configuration files.

Mapping objects to XML

In this section, we will see how objects must be mapped to XML so that a generic ContentHandler can create them. First we will look at how field values are defined. Then we will see how associations between objects are defined. Finally, I will present the class model, which provides the infrastructure for creating and initializing objects from XML easily.

Field values

In XML, you can express a value as an attribute or as an element, so an object's field values can be expressed in several ways. Here are four suggestions for mapping a Person object, with the fields firstName and lastName, to XML:

Suggestion 1:

```xml
<person id="1" firstName="Nick" lastName="Kassem"/>
```

Suggestion 2:

```xml
<person id="1">
  <fields firstName="Nick" lastName="Kassem"/>
</person>
```

Suggestion 3:

```xml
<person type="test.person" id="1">
  <firstName>Nick</firstName>
  <lastName>Kassem</lastName>
</person>
```

Suggestion 4:

```xml
<object type="test.person" id="1">
  <field name="firstName">Nick</field>
  <field name="lastName">Kassem</field>
</object>
```

Besides the field values, I added an id attribute to each suggestion and a type attribute to suggestions 3 and 4.
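The third suggestion keeps system data (id, type) in attributes and user data in child elements named after the fields, so a reader can split an object element into those two parts without knowing its class. The following minimal sketch uses only the JDK's DOM API; it is not part of the component described in this article, and the class FieldMapSketch and its methods are hypothetical names chosen for illustration:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

// Hypothetical helper: reads a Suggestion 3 style object element generically.
class FieldMapSketch {

    /** Parses an XML string and returns its root element. */
    static Element parse(String anXml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(anXml.getBytes(StandardCharsets.UTF_8)))
                .getDocumentElement();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("cannot parse XML", e);
        }
    }

    /** System data: the element's attributes, such as id and type. */
    static Map<String, String> readAttributes(Element anObject) {
        Map<String, String> theAttrs = new LinkedHashMap<>();
        NamedNodeMap theMap = anObject.getAttributes();
        for (int i = 0; i < theMap.getLength(); i++) {
            Node theAttr = theMap.item(i);
            theAttrs.put(theAttr.getNodeName(), theAttr.getNodeValue());
        }
        return theAttrs;
    }

    /** User data: each child element's name mapped to its trimmed text. */
    static Map<String, String> readFields(Element anObject) {
        Map<String, String> theFields = new LinkedHashMap<>();
        NodeList theChildren = anObject.getChildNodes();
        for (int i = 0; i < theChildren.getLength(); i++) {
            Node theChild = theChildren.item(i);
            if (theChild.getNodeType() == Node.ELEMENT_NODE) {
                theFields.put(theChild.getNodeName(), theChild.getTextContent().trim());
            }
        }
        return theFields;
    }

    public static void main(String[] args) {
        Element thePerson = parse("<person type=\"test.person\" id=\"1\">"
            + "<firstName> Nick </firstName><lastName>Kassem</lastName></person>");
        System.out.println(readAttributes(thePerson));
        System.out.println(readFields(thePerson)); // {firstName=Nick, lastName=Kassem}
    }
}
```

Because the field names come from the element names, the same two methods would read a Communication or PersonList element unchanged.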
The id attribute represents an object's identity; type represents the object's type. Each representation would probably work fine, but I prefer the third approach, which is the format used in the rest of this article, because of a rule I found in Brett McLaughlin's Java and XML (see Resources for a link):

"Although there is no specification or widely accepted standard for determining when to use an attribute and when to use an element, there is a good rule of thumb: use elements for presentable data and attributes for system data."

Applying that rule separates system data from user data, which improves the readability of XML files: when we deal with user data, we can concentrate on the elements and ignore the attribute values.

Associations

Objects contain not only fields but also associations that link them to other objects. To express an association, you can follow the approach DTDs take with the IDREFS attribute type:

```xml
<!ELEMENT person (firstName, lastName)>
<!ELEMENT personList (listName)>
<!ELEMENT firstName (#PCDATA)>
<!ELEMENT lastName (#PCDATA)>
<!ELEMENT listName (#PCDATA)>
<!ATTLIST person id ID #REQUIRED>
<!ATTLIST personList refPersons IDREFS #IMPLIED>

<person id="p1">
  <firstName>firstName1</firstName>
  <lastName>lastName1</lastName>
</person>
<person id="p2">
  <firstName>firstName2</firstName>
  <lastName>lastName2</lastName>
</person>
<personList refPersons="p1 p2">
  <listName>listName1</listName>
</personList>
```

(Values of type ID and IDREFS must be valid XML names, which cannot begin with a digit; that is why this DTD example uses p1 and p2 rather than 1 and 2.)

By listing all referenced object ids in one whitespace-separated string, we can express 1:n and n:m associations. You can express the ids for associations in either attributes or elements; for readability's sake, I suggest always using the same approach. (I will use elements here.)
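Resolving an IDREFS-style value comes down to splitting the whitespace-separated string and looking up each id among the objects already created. The sketch below is illustrative only: IdRefsSketch, its methods, and the decision to reject unknown ids with an IllegalStateException are assumptions, not part of the component described in this article.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: turns an id list such as "1 2 " into object references.
class IdRefsSketch {

    /** Splits an IDREFS-style value into individual ids; blank input yields no ids. */
    static List<String> splitIds(String anIdRefs) {
        List<String> theIds = new ArrayList<>();
        String theTrimmed = anIdRefs.trim();
        if (theTrimmed.isEmpty()) {
            return theIds;                      // no associations
        }
        for (String theId : theTrimmed.split("\\s+")) {
            theIds.add(theId);
        }
        return theIds;
    }

    /** Looks each id up in a registry of already-created objects. */
    static <T> List<T> resolve(String anIdRefs, Map<String, T> aRegistry) {
        List<T> theTargets = new ArrayList<>();
        for (String theId : splitIds(anIdRefs)) {
            T theTarget = aRegistry.get(theId);
            if (theTarget == null) {
                throw new IllegalStateException("dangling reference: " + theId);
            }
            theTargets.add(theTarget);
        }
        return theTargets;
    }

    public static void main(String[] args) {
        Map<String, String> thePersons = new HashMap<>();
        thePersons.put("1", "Nick Kassem");
        thePersons.put("2", "Phill Harris");
        System.out.println(resolve("1 2 ", thePersons)); // [Nick Kassem, Phill Harris]
        System.out.println(resolve("2", thePersons));    // [Phill Harris]
    }
}
```

Because each id is looked up in a registry of created objects, resolution only succeeds once every referenced object exists; building associations after the whole document has been read avoids depending on the order of the elements.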
I will use two more objects, Communication and PersonList, to illustrate how this works:

```xml
<person id="1" type="test.person">
  <firstName>Nick</firstName>
  <lastName>Kassem</lastName>
</person>
<person id="2" type="test.person">
  <firstName>Phill</firstName>
  <lastName>Harris</lastName>
</person>
<communication id="10" type="test.communication">
  <medium>telephone</medium>
  <address>222 222 222</address>
  <assoc_personID>1</assoc_personID>
</communication>
<communication id="11" type="test.communication">
  <medium>email</medium>
  <address>nick@assem.com</address>
  <assoc_personID>1</assoc_personID>
</communication>
<personList id="21" type="test.personList">
  <listName>loves skiing</listName>
  <assoc_personIDS>1 2</assoc_personIDS>
</personList>
<personList id="22" type="test.personList">
  <listName>loves swimming</listName>
  <assoc_personIDS>2</assoc_personIDS>
</personList>
```

In the XML example above, we have a Person, "Nick Kassem," with two Communication objects. We have a PersonList, "loves skiing," that contains "Nick Kassem" and "Phill Harris," and a PersonList, "loves swimming," that contains only "Phill Harris."

Now that we know how to map an object structure to XML, we can start to think about how to map the XML back to objects. The example above already gives us a hint: it contains no embedded object structures such as this one:

```xml
<person type="test.person">
  <firstName>hhh</firstName>
  <communicationList>
    <communication type="test.communication">
      <medium>telephone</medium>
      <number>222 222 222</number>
    </communication>
  </communicationList>
  <lastName>uuu</lastName>
</person>
```

In such an embedded structure, it would be much more difficult to collect the data for Person and Communication in a generic way. Because we don't allow embedded object structures, collecting the data for a specific object becomes very easy: the object's type is specified by the type attribute, and the object's data is enclosed in the element that carries that attribute, such as <person>.
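The collecting process for a flat document can be sketched with the JDK's own SAX parser and org.xml.sax.helpers.DefaultHandler. This is a minimal illustration, not the article's XMLReader: FlatObjectCollector is a hypothetical name, it stores each object's values in a plain Map instead of an XMLGenericObject, it rejects embedded objects by throwing instead of calling an error handler, and it wraps the objects in a hypothetical <objects> root element because a well-formed XML document needs a single root.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: collects one map of values per flat object element.
class FlatObjectCollector extends DefaultHandler {
    private final List<Map<String, String>> objects = new ArrayList<>();
    private final StringBuilder text = new StringBuilder();
    private Map<String, String> current;   // values of the object being collected
    private String currentObjectElement;   // name of the element carrying "type"

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs)
            throws SAXException {
        text.setLength(0);                  // drop indentation before this tag
        if (attrs.getValue("type") == null) {
            return;                         // a field element, not a new object
        }
        if (currentObjectElement != null) { // object inside an object: not allowed
            throw new SAXException("embedded object <" + qName + "> in <"
                + currentObjectElement + ">");
        }
        currentObjectElement = qName;
        current = new LinkedHashMap<>();
        for (int i = 0; i < attrs.getLength(); i++) {
            current.put(attrs.getQName(i), attrs.getValue(i)); // system data
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (qName.equals(currentObjectElement)) {
            objects.add(current);           // the object's data is complete
            current = null;
            currentObjectElement = null;
        } else if (current != null) {
            current.put(qName, text.toString().trim()); // user data
        }
        text.setLength(0);
    }

    /** Parses a flat document and returns the collected objects in document order. */
    static List<Map<String, String>> collect(String anXml) {
        FlatObjectCollector theHandler = new FlatObjectCollector();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(anXml)), theHandler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("cannot collect objects: " + e.getMessage(), e);
        }
        return theHandler.objects;
    }

    public static void main(String[] args) {
        String theXml = "<objects>"
            + "<person id=\"1\" type=\"test.person\">"
            + "<firstName>Nick</firstName><lastName>Kassem</lastName></person>"
            + "<personList id=\"21\" type=\"test.personList\">"
            + "<listName>loves skiing</listName><assoc_personIDS>1 2</assoc_personIDS>"
            + "</personList></objects>";
        for (Map<String, String> theObject : collect(theXml)) {
            System.out.println(theObject);
        }
    }
}
```

Each printed map holds the object's attributes (id and type) followed by its element values; a full implementation would hand such a collection to the created object instead of printing it.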
Whenever the ContentHandler's startElement() method encounters an element with a type attribute, we start the collecting process. That process lasts until the endElement() callback reports the end of that same element. Text content reaches currentElementValue through the ContentHandler's characters() callback, which is not shown here. The following code illustrates this process:

```java
public void startElement(String namespaceURI, String aLocalName,
                         String aRawName, Attributes anAttrs) throws SAXException {
    // discard any text (such as indentation) accumulated before this tag
    currentElementValue.setLength(0);
    for (int i = 0; i < anAttrs.getLength(); i++) {
        // an element with a "type" attribute holds the values for a new object
        if (A_TYPE.equals(anAttrs.getLocalName(i))) {
            if (currentObjectElementName == null)
                currentObjectElementName = aLocalName;
            else // embedded (cascading) object encountered
                xmlReaderErrorHandler.onObjectCascading(xmlStorage);
        }
        if (currentGenericObject.containsAttribute(anAttrs.getLocalName(i)))
            xmlReaderErrorHandler.onContainsAttribute(anAttrs.getLocalName(i),
                anAttrs.getValue(i), currentGenericObject);
        else
            currentGenericObject.setAttributeValue(anAttrs.getLocalName(i), anAttrs.getValue(i));
    }
}

public void endElement(String sNamespaceURI, String aLocalName, String aRawName)
        throws SAXException {
    // add the element's trimmed text value to currentGenericObject
    String theValue = currentElementValue.toString().trim();
    if (theValue.length() > 0) {
        if (currentGenericObject.containsElement(aLocalName))
            xmlReaderErrorHandler.onContainsElement(aLocalName, theValue, currentGenericObject);
        else
            currentGenericObject.setElementValue(aLocalName, theValue);
    }
    currentElementValue.setLength(0);

    // does aLocalName close the element that encloses the values for an object?
    if (aLocalName.equals(currentObjectElementName)) {
        String theTypename = (String) currentGenericObject.getAttributeValue(A_TYPE);
        // create the object
        Object theObj = handleObjectCreationFor(theTypename);
        if (theObj instanceof I_XMLReadable) {
            ((I_XMLReadable) theObj).fromXML(currentGenericObject, this);
            xmlReadables.add(theObj);
        } else {
            xmlReaderErrorHandler.onNotSupportedType(theObj, currentGenericObject, xmlStorage);
        }
        // reset for the next object
        currentGenericObject = new XMLGenericObject();
        currentObjectElementName = null;
    }
}
```

Class model

The accompanying figure illustrates our class model. The following list explains its classes:

XMLReader implements ContentHandler. It reads the XML file, creates the objects defined by the type attribute, and stores those objects in the I_XMLStorage.

XMLGenericObject stores the element and attribute values during the reading process. Once the values for a particular object have been read, the XMLGenericObject is passed as an argument to the fromXML() method of an object implementing the I_XMLReadable interface.

I_XMLReadable is implemented by any object that is read from an XML file. It has two methods: fromXML() and onEndDocument(). fromXML() is called just after the values for that object are read. It receives the XMLGenericObject and the XMLReader itself; through the XMLReader, the object can reach the I_XMLStorage (getXmlStorage()) and the error handler (getXmlReaderErrorHandler()). The XMLGenericObject contains all the values needed to initialize the object. This is comparable to initializing an object with a data record coming from a database, and that process could be reused in fromXML(): the XMLGenericObject need only be transformed into the record type that the persistence framework expects when initializing an object from a database.

The onEndDocument() method is called from the ContentHandler's endDocument() method. You can use onEndDocument() to build the associations: because it runs only after the whole document has been read, associations can be created regardless of the order in which the objects are read. (The exact order in which objects are written to XML cannot always be guaranteed.)

I_XMLStorage is where our XMLReader stores the created objects. The user can also store objects of their own in the I_XMLStorage during the reading process.
The I_XMLStorage lets the user retrieve any object created from XML once reading has finished.

I_ObjectFactory: Objects are created by a factory or, when no factory is found, by reflection. To use a factory, the value of the type attribute has to be mapped to an I_ObjectFactory in a Map, and that Map has to be registered with our XMLReader. If the XMLReader cannot find a factory for a specific type, it tries to create the object by reflection. This works only if the type attribute holds the fully qualified class name and the class has a no-argument constructor. (So the user decides how object creation is managed.)

I_XMLReaderErrorHandler handles error-prone situations, giving the XMLReader's user the flexibility to react appropriately. In pattern terms, this is the Strategy pattern: the user chooses the right error-handling strategy for an XMLReader. Unlike org.xml.sax.ErrorHandler, which is registered with an org.xml.sax.XMLReader and whose methods may throw SAXException, our error handler doesn't throw any checked exceptions. I don't think it makes sense for an error handler to throw checked exceptions, because its methods already are the handlers for those error situations; once a problem has been handled there, throwing a checked exception again only hands it back to the caller. So the error-handler method either solves the problem or it doesn't. If it doesn't, it must throw a RuntimeException to signal that an application error has really occurred.

XMLSerializer: For XML serialization, a DOM object is built, because the Apache Xerces tool already supports serializing DOM objects. The XMLSerializer encapsulates the process of building that DOM object. Because the XMLGenericObject is used in the reading process, it can also be used in the writing process: the easiest way to serialize an object is to transform it into an XMLGenericObject.
That XMLGenericObject should hold the same element and attribute values the object had during reading, along with the root element name for the object. Then its toXML() method can be called to serialize the object.

Consolidation example

Now I will walk you through the code for the complete XML component. First, the code creates the factories. In our example, only one factory is created, and it can create three different types of objects. This is expressed by mapping each object type to the same factory object:

```java
// theFactories maps each value of the type attribute to an I_ObjectFactory
java.util.Map<String, I_ObjectFactory> theFactories = new java.util.HashMap<>();
I_ObjectFactory theFactory = new DataObjectFactory();
theFactories.put("test.person", theFactory);
theFactories.put("test.communication", theFactory);
theFactories.put("test.personList", theFactory);
```

The code then creates the error handler, which handles error-prone situations during the reading process; its purpose is to replace checked exceptions:

```java
DefaultXMLReaderErrorHandler theErrorHandler = new DefaultXMLReaderErrorHandler();
```

Next, it creates the storage:

```java
XMLStorage theXMLStorage = new XMLStorage();
```

Then it creates the XMLReader and initializes it with the factories and the error handler:

```java
XMLReader theXMLReader = new XMLReader();
theXMLReader.setXmlReaderErrorHandler(theErrorHandler);
theXMLReader.setFactories(theFactories);
```

It then reads two XML files through the XMLReader:

```java
// try-with-resources closes both input streams when parsing is done
try (java.io.FileInputStream theIn1 = new java.io.FileInputStream(aFilename1);
     java.io.FileInputStream theIn2 = new java.io.FileInputStream(aFilename2)) {
    theXMLReader.parse(theIn1, theXMLStorage);
    // note that the same I_XMLStorage can be used for several reads
    theXMLReader.parse(theIn2, theXMLStorage);
}
```

The following code shows how a class such as Person can implement fromXML() to initialize itself with the values from the XML file.
The values were collected into the XMLGenericObject during the reading process:

```java
public void fromXML(XMLGenericObject aXMLGenericObject, XMLReader aXMLReader) {
    super.fromXML(aXMLGenericObject, aXMLReader);
    // set the field values
    firstName = (String) aXMLGenericObject.getElementValue("firstName");
    lastName = (String) aXMLGenericObject.getElementValue("lastName");
    // retrieve the list of persons kept as a user object in the storage
    ArrayList thePersons =
        (ArrayList) aXMLReader.getXmlStorage().getUserObjectFor("test.person");
    if (thePersons == null) {
        thePersons = new ArrayList();
        // add the list to the XMLStorage as a user object
        aXMLReader.getXmlStorage().addUserObject("test.person", thePersons,
            aXMLReader.getXmlReaderErrorHandler());
    }
    thePersons.add(this);
}
```

Next, the code creates an XMLSerializer, a helper class that serializes objects to XML:

```java
// create the XMLSerializer with its factory method, passing
// the document's root element name as the argument
XMLSerializer theXMLSerializer = XMLSerializer.createSerializer("personList");
```

Each object is then written to the XMLSerializer. As the following code shows, an object does not write its values directly to the XMLSerializer; instead, it uses an XMLGenericObject, which encapsulates the work of writing the values to the XMLSerializer.
In other words, the XMLSerializer provides a low-level interface and the XMLGenericObject provides a high-level one:

```java
public void toXML(XMLSerializer aXMLSerializer) {
    XMLGenericObject theXMLGenericObject = new XMLGenericObject();
    // set the root element name of the object
    theXMLGenericObject.setRootElementName("person");
    // set the type and id attributes
    theXMLGenericObject.setAttributeValue("type", "test.person");
    theXMLGenericObject.setAttributeValue("id", getID());
    // set the field values
    theXMLGenericObject.setElementValue("firstName", firstName);
    theXMLGenericObject.setElementValue("lastName", lastName);
    // let the XMLGenericObject add the object to the XMLSerializer
    theXMLGenericObject.toXML(aXMLSerializer);
}
```

Finally, the code prints the XML result:

```java
System.out.println(theXMLSerializer.toXMLString());
```

For a serializer holding the Person "Nick Kassem," the result would be:

```xml
<?xml version="1.0" encoding="iso-8859-1"?>
<personList>
  <person id="1" type="test.person">
    <firstName>Nick</firstName>
    <lastName>Kassem</lastName>
  </person>
</personList>
```

Conclusion

You have seen how applying some simple rules when mapping objects to XML simplified the implementation of our XMLReader. Finding rules that have no negative impact on a component's functionality can be difficult. Before establishing the rules, it is best to ask yourself how you can simplify overly complex implementations. In the case of the XMLReader, I found it extremely difficult to read an XML file with embedded objects in a generic way. So I asked myself whether such complex XML structures were really required. I then found that any embedded object structure could be mapped to a flat XML structure, modeling the links with ids and id references. This is nothing new: it is very similar to the way an object model is mapped to a relational database model.

Markus Dorn is the principal Java consultant at Marwin Consulting AG. He has a degree in computer science and is a Sun-certified Java programmer.
Dorn is currently working on the development of a distributed system for an insurance company in Zurich, Switzerland.