Explore JAXB and Castor from the ground up

Let’s face it: XML by itself is just another data format that is annoying to access from your Java programs. Don’t get me wrong; I appreciate XML for its portability, its separation of data and presentation, and its human and computer readability. However, I just can’t be bothered writing DOM (Document Object Model), SAX (Simple API for XML), or even JDOM code to programmatically work with XML data. I have better things to do with my precious programming time.

What I need is a summer intern: someone who can get me lunch and do my dirty work for me, like writing Java classes that correspond to every XML document type I work with. These Java classes could turn XML documents into my program’s objects. Each XML tag would map to an object attribute, and a tag’s contents would be the attribute value. Then, for a real challenge, I’d ask my intern to provide a marshal method in each Java class.

“Who’s Marshal?” the intern might ask.

“A military officer who marshals things, arranges things in methodical order,” I’d explain smugly.

Marshaling a Java object means converting it to XML format for storage or for sending. It’s like when you fold your socks to put them neatly away (which, come to think of it, is something else my intern could do). Later, when you or someone else wears those socks again, it’s like turning an XML document back into usable Java objects: unmarshaling. And you’ll agree that, when you wake up in the morning, it’s nice to find a basket full of neatly folded socks.

This is the classic marshal and unmarshal diagram, but for fun, this one illustrates my socks analogy.

The ability to work with unmarshaled XML documents would be great because I could use and maintain regular Java objects much more easily and naturally than I could with a bunch of XML parsing code. My unmarshaled Java objects could even validate attributes based on the original XML Schema constraints.
This validation would include type checking and verifying range values. I would have a way to programmatically construct a Java object that could save itself as an XML document valid against a certain XML Schema. Now we’re talking!

But what about performance? Working with these unmarshaled objects, my application could read its data faster than with hand-written SAX handling and hold less in memory than a full DOM tree.

Do you want to work as my intern this summer? I know what you’re thinking: writing the marshal, unmarshal, and validating-accessor methods based on schema constraints would be a long and tedious task. In addition, every time I changed my XML Schema, you’d have to update this code. Doing my laundry wouldn’t be much fun either. Doesn’t matter; I don’t want to babysit and entertain an intern all summer anyway. I’ve got better things to do!

What if I told you that there already are XML data-binding frameworks that can generate this type of marshaling and unmarshaling code for you? Just feed in a DTD (document type definition) or an XML Schema and, presto, you have Java classes that can marshal, unmarshal, and check data constraints. And like many Java XML tools, these frameworks are mostly free. In this article, we’ll examine two such frameworks: Sun’s Java Architecture for XML Binding (JAXB) and Castor from the Exolab Group.

XML data constraints: DTD versus XML Schema

A valuable XML concept is the ability to define your own XML vocabulary. An XML vocabulary is an industry-specific XML information model or document type that you define for XML data sharing. In other words, you define constraints that specify what a particular group of XML documents should always look like. Document creators, programmers, graphic designers, and database specialists use a constrained document type as the basis for creating compatible application pieces.
This parallel collaboration around a document type is easy because everyone knows ahead of time what the constrained XML documents will look like.

You can define an XML vocabulary by constraining XML in two different ways. The original method from the XML specification uses a DTD. The new and improved approach uses the recently formalized W3C (World Wide Web Consortium) Recommendation, XML Schema.

For example, in this article, I will define an XML vocabulary for describing socks. I’m a sock expert, since I wear socks almost every day, and so I know exactly what information is needed to fully describe a sock collection. Every sock in my definition has the following descriptive properties: number, name, image, color, price, and smell. Thus, I can create the following DTD to formally describe a sock collection:

Listing 1. socks.dtd

<!ELEMENT socks ( sock* ) >
<!ELEMENT sock (name, image, color, price, smell) >
<!ATTLIST sock number CDATA #REQUIRED >
<!ELEMENT name (#PCDATA) >
<!ELEMENT image (#PCDATA) >
<!ELEMENT color EMPTY >
<!ELEMENT price (#PCDATA) >
<!ELEMENT smell (#PCDATA) >
<!ATTLIST color value (white|black) #REQUIRED >

Listing 1 says simply that an XML document conforming to my socks.dtd constraints must have zero or more socks. Each sock has exactly one name, image, color, price, and smell, in that order. The color can only have the values white or black. Each sock has a required attribute called number. Do you see why we say DTDs constrain conformant XML documents? (For more on DTDs, see Resources.)

An XML document that is valid against this DTD (that is, one that follows the constraints correctly) might look like this:

Listing 2.
socks.xml

<?xml version="1.0"?>
<!DOCTYPE socks SYSTEM "socks.dtd">
<socks>
  <sock number="1">
    <name>black socks</name>
    <image>blacksocks.jpg</image>
    <color value="black"/>
    <price>9.99</price>
    <smell>7</smell>
  </sock>
  <sock number="2">
    <name>white socks</name>
    <image>whitesocks.jpg</image>
    <color value="white"/>
    <price>5.34</price>
    <smell>2</smell>
  </sock>
  <sock number="3">
    <name>old white socks</name>
    <image>oldwhitesocks.jpg</image>
    <color value="white"/>
    <price>2.20</price>
    <smell>9</smell>
  </sock>
</socks>

DTDs have garnered some complaints, especially from programmers. The problem: DTDs really constrain only the document’s structure, not the data it contains. All the elements and attributes are strings, and you can’t specify allowed value ranges. The best data constraining you can do with a DTD is to require that attributes be strings from a constant list. Furthermore, DTDs are not in XML format themselves, so they don’t seem to fit in too well.

The answer to these deficiencies: XML Schema. XML Schemas are in XML format; they allow data typing, user-defined types, and range value constraints. XML Schema’s popularity and software support are growing because it is now a final W3C Recommendation. Here is an example of constraining the same document type using XML Schema:

Listing 3.
socks.xsd

<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="socks">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="sock" minOccurs="0" maxOccurs="unbounded">
          <xsd:complexType>
            <xsd:sequence>
              <xsd:element name="name" type="xsd:string"/>
              <xsd:element name="image" type="imageType"/>
              <xsd:element name="color" type="colorType"/>
              <xsd:element name="price" type="money"/>
              <xsd:element name="smell" type="smellType"/>
            </xsd:sequence>
            <xsd:attribute name="number" type="xsd:string" use="required"/>
          </xsd:complexType>
        </xsd:element>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
  <xsd:simpleType name="imageType">
    <xsd:restriction base="xsd:string">
      <xsd:pattern value="(.)+.(gif|jpg|jpeg|bmp)"/>
    </xsd:restriction>
  </xsd:simpleType>
  <xsd:simpleType name="colorType">
    <xsd:restriction base="xsd:string">
      <xsd:enumeration value="black"/>
      <xsd:enumeration value="white"/>
    </xsd:restriction>
  </xsd:simpleType>
  <xsd:simpleType name="money">
    <xsd:restriction base="xsd:decimal">
      <xsd:fractionDigits value="2"/>
    </xsd:restriction>
  </xsd:simpleType>
  <xsd:simpleType name="smellType">
    <xsd:annotation>
      <xsd:documentation>0=clean and 10=smells terrible</xsd:documentation>
    </xsd:annotation>
    <xsd:restriction base="xsd:nonNegativeInteger">
      <xsd:minInclusive value="0"/>
      <xsd:maxInclusive value="10"/>
    </xsd:restriction>
  </xsd:simpleType>
</xsd:schema>

That looks a little more complicated, but believe me, the complexity is worth it. In addition to the constraints we specified in the DTD, XML Schema lets us add the following:

- The contents of the <image> tag must end with an image type extension (like .gif).
- The <price> must be a number with two fractional digits (like 5.34).
- The <smell> must be an integer between 0 and 10.
There is also a documentation comment stating 0=clean and 10=smells terrible.

For more on XML Schemas, see the schema tutorial in Resources.

To associate socks.xml with our XML Schema, we’ll change only some attributes in the root (socks) tag:

Listing 4. socks.xml for use with the XML Schema

<?xml version="1.0"?>
<socks xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:noNamespaceSchemaLocation="socks.xsd">
  <sock number="1">
    <name>black socks</name>
    ...

How do we check to see if an XML document is valid against a DTD or XML Schema? We use an XML parser like Apache Xerces to verify that an XML document conforms to the socks.dtd or socks.xsd constraints. I’ve provided a Windows batch file in my sample code called validate.bat to do that:

validate.bat <path_to_socks>socks.xml

This invokes the Xerces parser in validation mode, which asks it to please check socks.xml against its stated document type. Xerces now supports both DTD and XML Schema validation. I’m a teacher, so I always like to include programming exercises. As an exercise, run validate on socks.xml, change socks.xml to make it invalid, and run validate again. Xerces should produce helpful error messages.

Onward to the data-binding frameworks!

Now we’ll use JAXB and Castor to generate Java classes based on DTDs and XML Schemas. Again, schema-based code generation enables us to represent conformant XML documents as objects in our programs. JAXB and Castor perform essentially the same task, but we’ll see that Castor is a more mature and full-featured package. Two other similar, young frameworks worth noting are Enhydra Zeus and Arborealis from Beautiful Code BV, but we won’t examine them here (see Resources for more information).

Use JAXB to turn your socks black

Let’s first look at the Sun framework for XML data binding, JAXB. The JAXB API automates the mapping between XML documents and Java objects. It is currently an early access release, meaning that we can download and use a bare-bones working version.
JAXB currently supports only Java class creation from DTDs; future releases will add XML Schema support. Let’s begin our tour of generating classes with JAXB based on socks.dtd with the following summary of steps:

1. Start with our DTD, socks.dtd, and define a JAXB binding schema
2. Invoke the JAXB schema compiler
3. Compile our newly generated classes
4. Examine a test program using these classes; the test program follows these steps:
   a. Unmarshal an existing XML document
   b. Change the content tree
   c. Validate
   d. Marshal
5. Compile and run the test program

Step 1. Start with our DTD, socks.dtd, and define a JAXB binding schema

To begin, we’ll need to write one more thing: a JAXB binding schema, that is, a JAXB-specific document that helps JAXB convert the DTD to Java classes. It makes up for the DTD data typing deficiencies and separates the programming-specific information from the schema information. For our example, it looks like this:

Listing 5. socks.xjs

<?xml version="1.0"?>
<xml-java-binding-schema version="1.0ea">
  <!-- Register a type. This specifies that we want to use this type
       instead of String somewhere in our document. -->
  <conversion name="BigDecimal" type="java.math.BigDecimal" />
  <element name="socks" type="class" root="true" />
  <element name="price" type="value" convert="BigDecimal"/>
  <!-- To restrict the sock color to white or black we create an
       enumeration with the allowed values and make the color
       attribute the new enumeration type -->
  <element name="color" type="class">
    <attribute name="value" convert="SockColor"/>
  </element>
  <enumeration name="SockColor" members="white black"/>
</xml-java-binding-schema>

In the conversion and price entries, we ask JAXB to make the price element a BigDecimal, an exact decimal type that suits money better than a binary floating-point double. Our binding schema also restricts the color values to white or black. Yeah, I know, it’s tough to learn yet another syntax for this binding schema, but Sun specifications have a lot of staying power, so the spec should be around for a while.
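As a side note on the price conversion, here is a minimal sketch of why a decimal type is a reasonable choice for prices. The class and method names (PriceTotal, total) are hypothetical and not part of JAXB; the sketch uses only java.math.BigDecimal from the standard library and modern Java syntax, and the prices are the ones from socks.xml.

```java
import java.math.BigDecimal;

// Hypothetical helper, not part of JAXB: totals sock prices exactly.
class PriceTotal {

    // Adds decimal strings such as "9.99" without binary rounding error.
    static BigDecimal total(String... prices) {
        BigDecimal sum = BigDecimal.ZERO;
        for (String price : prices) {
            sum = sum.add(new BigDecimal(price));
        }
        return sum;
    }

    public static void main(String[] args) {
        // Prices taken from socks.xml
        System.out.println(total("9.99", "5.34", "2.20")); // prints 17.53

        // A double cannot represent most decimal fractions exactly.
        System.out.println(0.1 + 0.2 == 0.3); // prints false
    }
}
```

Constructing each BigDecimal from a String, as the test program later does with new BigDecimal("5.55"), keeps the value exactly as written in the XML.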
Besides, maybe Sun will make a GUI to help in creating the JAXB binding schema (fingers crossed). For more information about JAXB binding schemas, see Resources.

Step 2. Invoke the JAXB schema compiler

Now we invoke the JAXB schema compiler. First, we need to download JAXB. We surf to https://java.sun.com/xml/jaxb/ and download the JAXB Implementation 1.0, Early Access Release.

We will end up with the following jar files:

- jaxb-rt-1.0-ea.jar: Runtime library (the binding framework)
- jaxb-xjc-1.0-ea.jar: Schema compiler

Next, we invoke the JAXB schema compiler:

java -jar %JAXB_HOME%\lib\jaxb-xjc-1.0-ea.jar socks.dtd socks.xjs -d destination_directory

So, the DTD and the binding schema go in, and Java classes come out. The Java files produced are Socks.java, SockColor.java, Sock.java, and Color.java. Examine the files and take pleasure in knowing that you won’t have to maintain this code yourself. When your DTD or binding schema changes, you simply regenerate the classes. Notice in particular the validate(), marshal(), and unmarshal() methods. If you’re thinking about adding custom code, you’d better subclass these generated classes. That way, regenerating replaces only the superclasses and leaves your custom code untouched. I’ll leave that as another exercise for you.

Step 3. Compile our newly generated classes

We ensure the JAXB runtime jar (jaxb-rt-1.0-ea.jar) is in our CLASSPATH, then we compile the generated classes.

Step 4. Examine a test program using these classes

Okay, deep breath. Now we’ll examine a test program that uses our newly generated classes.

I recently moved from America to Europe with a wardrobe that included only white socks. I quickly discovered that Europeans only wear dark-colored socks. My program turns all my white socks black and leaves the smell alone because you don’t need to wash black socks:

Listing 6.
JAXBSocksTest.java

import java.io.*;
import java.util.*;
import java.math.BigDecimal;

public class JAXBSocksTest {

    public static void main(String[] args) {
        // Building the content tree (unmarshal the socks.xml file)
        Socks socks = new Socks();
        try {
            // Pass in socks.xml as a command-line argument
            File socksFile = new File(args[0]);
            InputStream fin = new FileInputStream(socksFile);
            socks = socks.unmarshal(fin);
            List sockList = socks.getSock();

            // Print in-memory document
            printSocks(sockList);
            fin.close();

            // Turn all socks black
            Sock sock;
            Color black; // My Color class generated by JAXB
            System.out.println("Turn all socks black for my new European look!");
            for (Iterator i = sockList.iterator(); i.hasNext();) {
                sock = (Sock) i.next();
                if (sock.getColor().getValue().equals(SockColor.WHITE)) {
                    // If you don't make a new Color object every time,
                    // JAXB gets confused!
                    black = new Color();
                    black.setValue(SockColor.BLACK);
                    sock.setColor(black);
                }
            }

            // Create a new sock (I only buy black ones now!)
            sock = new Sock();
            sock.setNumber("4");
            sock.setName("new sock");
            sock.setImage("newsock.jpg");
            Color c = new Color();
            c.setValue(SockColor.BLACK);
            sock.setColor(c);
            sock.setSmell("0");
            sock.setPrice(new BigDecimal("5.55"));
            sockList.add(sock);

            // Print in-memory document
            printSocks(sockList);

            // Validate. You must do this before marshaling
            socks.validate();

            // Marshal
            System.out.println("Marshaling socks to file blackSocks.xml");
            File socks_new = new File("blackSocks.xml");
            FileOutputStream fout = new FileOutputStream(socks_new);
            socks.marshal(fout);
            fout.close();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    // Helper method to print out the socks
    static void printSocks(List sockList) {
        System.out.println("Printing in memory socks...");
        Sock currentSock;
        for (Iterator i = sockList.iterator(); i.hasNext();) {
            currentSock = (Sock) i.next();
            System.out.println("Sock number " + currentSock.getNumber()
                + " | Name: " + currentSock.getName()
                + " | Color: " + currentSock.getColor()
                + " | Price: " + currentSock.getPrice()
                + " | Smell: " + currentSock.getSmell());
        }
        System.out.println("");
    }
}

Steps 4a and 4b. Unmarshal an existing XML document and change the content tree

There are two ways to build a content tree, that is, an in-memory representation of your XML document as objects. One approach is to unmarshal an XML document, which I do using the unmarshal() method. I can then extract the Sock objects as a List and modify any attributes I’d like. Here I’m turning all my white socks black. The other way to add content is to simply use the JAXB-generated classes to construct new Sock objects. I add a new Sock object to sockList, and I have a new Sock in my in-memory content tree.

Step 4c. Validate

To marshal my Socks content tree (i.e., turn it back into XML), I must first call the validate() method on my Socks object. This validation is only necessary if I’ve changed the originally unmarshaled document tree. Validation ensures that my Socks object content tree conforms to my DTD constraints.

Step 4d. Marshal

Marshaling is easy: I just create a FileOutputStream and pass it to the Socks object’s marshal() method. See, I told you: simple! I end up with an XML document on disk containing all black socks plus an extra sock.
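To make the validate-then-marshal idea concrete, here is a minimal hand-written sketch. It is not the code JAXB generates: MiniSock is a hypothetical, cut-down class holding only a number and a color, it checks just the DTD’s white-or-black rule, and it builds the XML by string concatenation without escaping.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical, hand-written stand-in for a generated class. Real JAXB
// output differs; this only illustrates the validate/marshal idea.
class MiniSock {
    private static final List<String> ALLOWED_COLORS = Arrays.asList("white", "black");

    private final String number;
    private String color;

    MiniSock(String number, String color) {
        this.number = number;
        this.color = color;
    }

    void setColor(String color) {
        this.color = color;
    }

    // Mirrors the DTD rule: color value (white|black) #REQUIRED
    void validate() {
        if (!ALLOWED_COLORS.contains(color)) {
            throw new IllegalStateException("Invalid color: " + color);
        }
    }

    // Refuses to produce XML unless the object is valid.
    String marshal() {
        validate();
        return "<sock number=\"" + number + "\"><color value=\"" + color + "\"/></sock>";
    }

    public static void main(String[] args) {
        MiniSock sock = new MiniSock("2", "white");
        sock.setColor("black");
        // prints <sock number="2"><color value="black"/></sock>
        System.out.println(sock.marshal());
    }
}
```

Writing and maintaining checks like this by hand for every element is exactly the chore the generated classes take over.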
I’m finally starting to fit in here in Europe.

Step 5. Compile and run the test program

We make sure our compiled JAXB-generated classes and the JAXB runtime jar are still in our CLASSPATH, and then compile and run. Here is the output:

Listing 7. JAXBSocksTest output

Printing in memory socks...
Sock number 1 | Name: black socks | Color: <<color value=black>> | Price: 9.99 | Smell: 7
Sock number 2 | Name: white socks | Color: <<color value=white>> | Price: 5.34 | Smell: 2
Sock number 3 | Name: old white socks | Color: <<color value=white>> | Price: 2.20 | Smell: 9

Turn all socks black for my new European look!
Printing in memory socks...
Sock number 1 | Name: black socks | Color: <<color value=black>> | Price: 9.99 | Smell: 7
Sock number 2 | Name: white socks | Color: <<color value=black>> | Price: 5.34 | Smell: 2
Sock number 3 | Name: old white socks | Color: <<color value=black>> | Price: 2.20 | Smell: 9
Sock number 4 | Name: new sock | Color: <<color value=black>> | Price: 5.55 | Smell: 0

Marshaling socks to file blackSocks.xml

For a full code listing and detailed instructions on running this example, download my code from Resources.

Castor: A more mature framework

We should examine a more robust framework, since JAXB is still in such early stages. Castor, an open source offering from the Exolab Group, is based on the original JAXB specification. Castor is much more mature and is a good choice if you need XML data binding now.

Castor does everything JAXB does and more. You can use either a DTD or an XML Schema for your constraints. Furthermore, you don’t need an extra binding schema document to help Castor’s schema compiler. It is able to gather enough information from the XML Schema constraints alone. Sun claims that requiring an extra binding schema separates programmer info from general XML Schema info and that this separation provides more flexibility. Castor allows an optional mapping file for those types of extra customizations.
On top of all this, the Castor project also does object relational mapping, and Castor is working on a direct SQL-XML bridge that does not involve any Java objects.

Let’s use Castor to produce a similar program that manipulates my socks. We’ll follow these steps:

1. Start with our XML Schema, socks.xsd; no extra binding schema is necessary
2. Invoke the Castor schema compiler to generate Java classes
3. Compile our newly generated classes
4. Examine a test program similar to the last one using these classes; the test program follows these steps:
   a. Unmarshal
   b. Change the content tree
   c. Validate
   d. Marshal
5. Compile and run the test program

Step 1. Start with our XML Schema, socks.xsd

We’ve already created our XML Schema, socks.xsd, so that’s all for Step 1!

Step 2. Invoke the Castor schema compiler to generate Java classes

First, we need to set the CLASSPATH to include:

- castor-0.9.3-xml.jar: Castor classes
- xerces.jar: Apache Xerces2 XML parser
- jakarta-regexp-1.2.jar: Jakarta regexp package

Next, we run the Castor schema compiler:

java org.exolab.castor.builder.SourceGenerator -i socks.xsd -dest destination_directory

So, the XML Schema goes in, and Java classes come out. The Java files produced are Socks.java, SocksDescriptor.java, Sock.java, SockDescriptor.java, ColorType.java, and ColorTypeDescriptor.java.

The marshaling framework uses the class descriptors to hold binding and validation information. We won’t work with these directly. Also notice that the ColorType classes are put into a package called types. Those classes are in the types package because we used an enumeration to restrict the possible color values to white and black. Castor makes custom types to handle validation with that sort of constraint.

Look at the generated Java files. In particular, look for the marshal(), unmarshal(), and validate() methods in the main classes. In SockDescriptor.java, look for the regular expressions from socks.xsd, like "(.)+.(gif|jpg|jpeg|bmp)".
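To get a feel for what that pattern accepts, here is a small sketch using the JDK’s java.util.regex rather than Castor’s own validation code (an assumption made purely for illustration; the class name ImagePatternDemo is hypothetical). As printed here, the dot before the extension list is unescaped, so it matches any character; the second pattern shows an escaped variant for comparison.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; uses java.util.regex, not Castor's validator.
class ImagePatternDemo {
    // Pattern exactly as it appears in the text.
    static final Pattern AS_PRINTED = Pattern.compile("(.)+.(gif|jpg|jpeg|bmp)");
    // Variant with the dot escaped so it matches only a literal '.'.
    static final Pattern ESCAPED = Pattern.compile("(.)+\\.(gif|jpg|jpeg|bmp)");

    // matches() requires the whole string to match, in the same way an
    // XML Schema pattern facet applies to the whole value.
    static boolean accepts(Pattern pattern, String imageName) {
        return pattern.matcher(imageName).matches();
    }

    public static void main(String[] args) {
        System.out.println(accepts(AS_PRINTED, "blacksocks.jpg")); // true
        System.out.println(accepts(AS_PRINTED, "blacksocksXjpg")); // true
        System.out.println(accepts(ESCAPED, "blacksocks.jpg"));    // true
        System.out.println(accepts(ESCAPED, "blacksocksXjpg"));    // false
        System.out.println(accepts(ESCAPED, "blacksocks.png"));    // false
    }
}
```

If the intent is to require a real file extension, the escaped form is the stricter choice.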
Again, validation occurs in the Descriptor classes.

The Castor source generator features many more options we haven’t discussed. See the Resources for further reading.

Step 3. Compile our newly generated classes

Compile the generated classes.

Step 4. Examine a test program similar to the last one using these classes

Now we’ll write a program to use our newly generated classes; it’s going to clean my socks. The program is a bit complex, but guaranteed to be more fun than real laundry. The strategy is to unmarshal a socks.xml document, extract the socks, set their smell to 0 (clean), and add a new sock:

Listing 8. CastorSocksTest.java

import java.io.*;
import java.util.*;
import java.math.BigDecimal;
import org.exolab.castor.xml.*;
import org.exolab.castor.xml.util.*;

public class CastorSocksTest {

    public static void main(String[] args) {
        try {
            System.out.println("Unmarshaling Socks");
            // Pass in socks.xml as a command-line argument
            Socks socksDocument = Socks.unmarshal(new FileReader(args[0]));

            // Get all socks and print out their attributes
            Sock[] sockArray = socksDocument.getSock();
            printSocks(sockArray);

            // Wash all socks (set smell to 0)
            System.out.println("Cleaning socks by setting smell to 0");
            for (int x = 0; x < sockArray.length; x++) {
                sockArray[x].setSmell(0);
            }
            printSocks(sockArray);

            // Marshal
            System.out.println("Marshaling socks to file cleanSocks.xml");
            socksDocument.validate(); // make sure the socks are valid against the schema
            socksDocument.marshal(new FileWriter("cleanSocks.xml"));

            // Make a new sock
            System.out.println("Make a new sock");
            Sock newSock = new Sock();
            newSock.setNumber("4"); // Actually setting an attribute!
            newSock.setName("New Sock");
            // Castor created a package called types for the ColorType class
            newSock.setColor(types.ColorType.BLACK);
            newSock.setPrice(new BigDecimal("3.33"));
            newSock.setSmell(2); // I think new sock smell is a 2
            newSock.setImage("newsock.jpg");

            // Add new sock to in-memory Socks object (unmarshaled XML doc)
            socksDocument.addSock(newSock);

            // Ask it for a new array with all contained socks
            Sock[] sockArray2 = socksDocument.getSock();
            printSocks(sockArray2);

            System.out.println("Marshaling socks to file socksPlusNewSock.xml");
            socksDocument.validate(); // make sure the socks are valid against the schema
            socksDocument.marshal(new FileWriter("socksPlusNewSock.xml"));
        } catch (ValidationException ve) {
            System.out.println(ve.getMessage());
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    // Helper method to print out the socks
    static void printSocks(Sock[] sockArray) {
        System.out.println("Printing in memory socks...");
        Sock currentSock;
        for (int x = 0; x < sockArray.length; x++) {
            currentSock = sockArray[x];
            System.out.println("Sock number " + currentSock.getNumber()
                + " | Name: " + currentSock.getName()
                + " | Color: " + currentSock.getColor()
                + " | Price: " + currentSock.getPrice()
                + " | Smell: " + currentSock.getSmell());
        }
        System.out.println("");
    }
}

Steps 4a and 4b. Unmarshal and change the content tree

This goes much like our JAXBSocksTest. Notice the differences, particularly the use of a static unmarshal method. Also notice we’re working with arrays here instead of Lists, and that smell is now set as an int rather than a String because the schema gives it a numeric type.

Steps 4c and 4d. Validate and marshal

This code also resembles JAXBSocksTest.

Step 5. Compile and run the test program

Compile and run the test program.

Here is the output:

Listing 9. CastorSocksTest output

Unmarshaling Socks
Printing in memory socks...
Sock number 1 | Name: black socks | Color: black | Price: 9.99 | Smell: 7
Sock number 2 | Name: white socks | Color: white | Price: 5.34 | Smell: 2
Sock number 3 | Name: old white socks | Color: white | Price: 2.20 | Smell: 9

Cleaning socks by setting smell to 0
Printing in memory socks...
Sock number 1 | Name: black socks | Color: black | Price: 9.99 | Smell: 0
Sock number 2 | Name: white socks | Color: white | Price: 5.34 | Smell: 0
Sock number 3 | Name: old white socks | Color: white | Price: 2.20 | Smell: 0

Marshaling socks to file cleanSocks.xml
Make a new sock
Printing in memory socks...
Sock number 1 | Name: black socks | Color: black | Price: 9.99 | Smell: 0
Sock number 2 | Name: white socks | Color: white | Price: 5.34 | Smell: 0
Sock number 3 | Name: old white socks | Color: white | Price: 2.20 | Smell: 0
Sock number 4 | Name: New Sock | Color: black | Price: 3.33 | Smell: 2

Marshaling socks to file socksPlusNewSock.xml

Clean socks saved to disk. What a relief!

For a full code listing and detailed instructions on running this example, download my code from Resources.

Achieve faster, more maintainable code

JAXB and Castor are both frameworks that automatically construct a bridge between Java programs and XML documents. Programmers can use JAXB or Castor to represent constrained XML documents as Java objects. With this automatic access to XML data, you need to write only the application code that actually uses the XML data, and no code to retrieve, validate, or save it. Since you specify data validation constraints in the DTD or XML Schema definition, and not in your Java code, your program is more maintainable. And because your application works with plain in-memory objects after a single unmarshaling pass, instead of running hand-written parsing code every time it needs a value, data access should be faster too.

That’s the scoop on JAXB and its older brother, Castor. Automating the Java class creation based on XML data constraints should save you some precious programming time and buy you some maintainability.
Besides, you won’t have to hire a summer intern or do your own laundry this year! To make your Java programming life even easier, keep an eye on both JAXB and Castor for the next wave of exciting advances.

Sam Brodkin is a Java technologist living in Rotterdam, the Netherlands. He operates Javacommand, a Java training, consultancy, and courseware company. Sam’s professional career began at Sun Microsystems, where he promoted server-side Java. Now he is focused on business-to-business Web applications with XML as the glue. Since Sam moved to Europe, he only wears black socks, and he still has to do his own laundry.