by Mark Johnson

Serialization and the JavaBeans Specification

how-to
Feb 1, 1998 | 19 mins

The trick to controlling and -- when necessary -- preventing serialization

In last month’s column, “Do it the ‘Nescafé’ way -- with freeze-dried JavaBeans,” we discussed some of the reasons for, and applications of, freeze-drying JavaBeans into a persistent state. You will recall that serialization of an object is simply the encoding of its state (the values of its fields) in a structured way so that the object can be stored or transmitted as data and recreated at another place and time. (If you need an introduction to serialization in Java, see last month’s column. This month we’ll be diving right into coding examples, so you’ll want to be prepared.)

First, we’ll look at serialization of aggregate objects (not much of a feat, as you’ll see). We’ve got a quick example of how to implement the Externalizable interface (for you control freaks out there). Then, we’ll discuss how to keep sensitive information from being serialized at all. Finally, we’ll finish up with some enlightening reader feedback on last month’s column.

Serializing object structures

Last month, we saw that you can make a class serializable simply by adding implements java.io.Serializable to the class definition, because class java.io.ObjectOutputStream knows how to serialize any class descended from java.lang.Object (which means any class at all).

But what if your object contains references to other objects or is composed of other objects? No problem! The serialization mechanism automatically detects references to other objects. As long as the “sub-objects” are also serializable, ObjectOutputStream serializes them and includes them in the stream.
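The "as long as" in that last sentence matters. Here is a minimal sketch of what happens when one sub-object is not serializable. The classes Car, Wheel, and Engine are hypothetical names made up for this illustration; they do not appear elsewhere in this column.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class SubObjectDemo {
    // Hypothetical sub-object that does NOT implement Serializable.
    static class Engine {
        int horsepower = 100;
    }

    // Hypothetical sub-object that does implement Serializable.
    static class Wheel implements Serializable {
        int diameter = 15;
    }

    // A serializable container holding one sub-object of each kind.
    static class Car implements Serializable {
        Wheel wheel = new Wheel();
        Object engine;              // may hold a non-serializable Engine
    }

    // Returns true if the whole object graph can be written, false if the
    // serializer runs into a non-serializable object along the way.
    static boolean canSerialize(Object obj) {
        try (ObjectOutputStream out =
                 new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(obj);
            return true;
        } catch (NotSerializableException ex) {
            return false;
        } catch (IOException ex) {
            // Not expected for an in-memory stream.
            throw new IllegalStateException(ex);
        }
    }

    public static void main(String[] args) {
        Car car = new Car();
        System.out.println(canSerialize(car));   // true: engine is null
        car.engine = new Engine();
        System.out.println(canSerialize(car));   // false: Engine is not serializable
    }
}
```

Note that the failure shows up only at run time, when writeObject() reaches the offending reference; the compiler has no objection to the Car class.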

Let’s look at a concrete example of this. In the following code example, we implement a TreeNode object. This object has internal fields of sToken_ (a string) and iType_ and iValue_ (integers). It also contains references to two other objects, tnLeft_ and tnRight_, which are references to the node’s left and right subtrees. (This node class could be extended easily for use in an expression evaluator.)

import java.io.*;
import java.lang.*;
// This is boring, but it gets the point across.
public class TreeNode
    extends java.lang.Object
    implements java.io.Serializable {
    protected int iType_;
    protected int iValue_;
    protected String sToken_ = new String("");
    protected TreeNode tnLeft_ = null;
    protected TreeNode tnRight_ = null;
    // Necessary to be a well-behaved bean.
    public TreeNode()
    {
        iType_ = iValue_ = -1;
    }
    // Explicit constructor
    public TreeNode(int iType, int iValue, String sToken,
                    TreeNode tnLeft, TreeNode tnRight)
    {
        iType_ = iType;
        iValue_ = iValue;
        sToken_ = sToken;
        tnLeft_ = tnLeft;
        tnRight_ = tnRight;
    }
    // Print me (indented) and all of my children
    public void print(String sIndent)
    {
        System.out.println(sIndent + "type:  " + iType_);
        System.out.println(sIndent + "value: " + iValue_);
        System.out.println(sIndent + "token: " + sToken_);
        System.out.println(sIndent + "left:");
        if (tnLeft_ != null) {
            tnLeft_.print(sIndent + "    ");
        } else {
            System.out.println(sIndent + "    (null)");
        }
        System.out.println(sIndent + "right:");
        if (tnRight_ != null) {
            tnRight_.print(sIndent + "    ");
        } else {
            System.out.println(sIndent + "    (null)");
        }
    }
    // Property accessors
    public void setToken(String sToken) { sToken_ = sToken; }
    public String getToken() { return sToken_; }
    public void setType(int iType) { iType_ = iType; }
    public int getType() { return iType_; }
    public void setValue(int iValue) { iValue_ = iValue; }
    public int getValue() { return iValue_; }
    public void setLeft(TreeNode tnLeft) { tnLeft_ = tnLeft; }
    public TreeNode getLeft() { return tnLeft_; }
    public void setRight(TreeNode tnRight) { tnRight_ = tnRight; }
    public TreeNode getRight() { return tnRight_; }
};

A TreeNode is created with token, type, and value, and is connected to left and right branches at construction time. The property accessors allow us to set and interrogate the properties, including the left and right branches. (The BeanBox won’t show the branches as properties, since there’s no PropertyEditor for them. For more on the BeanBox, see “The BeanBox: Sun’s JavaBeans test container.”)

Our test class creates a recursive tree structure of TreeNodes and writes it to a file. Here’s the source for the test class, followed by a diagram of the structure it creates and serializes:

001 import java.io.*;
002 import java.beans.*;
003 // TreeNode lives in the same (default) package, so no import is needed.
004 
005 public class StreamDemo {
006 
007     private static void Usage() throws java.io.IOException
008     {
009         System.out.println("Usage:\n\tStreamDemo w file\n\tStreamDemo r file");
010 
011         IOException ex = new IOException("ERROR");
012         throw ex;
013     }
014 
015     public static void main(String[] args)
016     {
017         System.out.println(args.length);
018 
019         try {
020             if (args.length <= 0)
021             {
022                 Usage();
023             }
024 
025             String cmd = args[0];
026 
027             if (cmd.compareTo("w") == 0)
028             {
029                 if (args.length != 2)
030                 {
031                     Usage();        // Unix anyone?
032                 }
033 
034                 TreeNode    tnLL = new TreeNode(4, 12, "Left Left",
035                                                     null, null);
036                 TreeNode    tnL = new TreeNode(2, 4, "Left", tnLL, null);
037                 TreeNode    tnR = new TreeNode(7, 9, "Right", null, null);
038                 TreeNode    tnRoot = new TreeNode(1, 2, "Root", tnL, tnR);
039 
040                 tnRoot.print("");
041 
042                 FileOutputStream f = new FileOutputStream(args[1]);
043                 ObjectOutputStream s = new ObjectOutputStream(f);
044 
045                 s.writeObject(tnRoot); 
046
047                 s.flush();
048             }
049  
050             else if (cmd.compareTo("r") == 0)
051             {
052                 if (args.length != 2)
053                 {
054                     Usage();
055                 }
056 
057                 FileInputStream f = new FileInputStream(args[1]);
058                 ObjectInputStream s = new ObjectInputStream(f);
059 
060                 System.out.println("Reading TreeNode:");
061 
062                 TreeNode tnRoot = (TreeNode) s.readObject(); 
063
064                 tnRoot.print("");
065             }
066 
067             else if (cmd.compareTo("i") == 0)
068             {
069                 if (args.length != 2)
070                 {
071                     Usage();
072                 }
073 
074                 // Given a name, look for "name.ser"
075                 Object theBean = Beans.instantiate(null, args[1]);
076                 String sName = theBean.getClass().getName();
077 
078                 if ( sName.compareTo("TreeNode") == 0 )
079                 {
080                     TreeNode tn = (TreeNode)theBean;
081                     tn.print("");
082                 }
083                 else
084                 {
085                     System.err.println("There was a bean in that file, " +
086                     "but it was a " + sName);
087                 }
088             }
089 
090             else {
091                 System.err.println("Unknown command " + cmd);
092                 Usage();
093             }
094 
095         }
096 
097         catch (IOException ex) {
098             System.out.println("IO Exception:");
099             System.out.println(ex.getMessage());
100             ex.printStackTrace();
101         }
102         catch (ClassNotFoundException ex) {
103             System.out.println("ClassNotFound Exception:");
104             System.out.println(ex.getMessage());
105             ex.printStackTrace();
106         }
107     }
108 };

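The tree created by this code looks like this:

Root (type 1, value 2)
    left:  Left (type 2, value 4)
        left:  Left Left (type 4, value 12)
        right: (null)
    right: Right (type 7, value 9)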

The test program lets you exercise the TreeNode class in one of three ways. The code in red (lines 42-45) creates FileOutputStream f and then uses f to create an ObjectOutputStream, upon which we then invoke writeObject(). The serialization “machinery” inside the ObjectOutputStream analyzes the object that’s passed to it and serializes to the stream any fields it finds. If the ObjectOutputStream finds any non-null object references inside the TreeNode, it then calls writeObject recursively to serialize those objects, as well. In our sample case, it finds tnLeft_ and tnRight_ in each TreeNode, and serializes them if they’re non-null.

Now, the object serializer outputs only the fields, not the bytecodes, of an object. So how can the object run elsewhere if the bytecodes aren’t in the .ser file? When an object is created from its serialized representation, the Java virtual machine (JVM) creating the instance of the object must either “know” about the class (that is, the class must already be loaded into the JVM), or the JVM must know where to get the class definition (using a class loader). The methods java.beans.Beans.instantiate() and java.io.ObjectInputStream.readObject() take care of all of the class file loading for you, under the hood. (You can control the loading of classes, but just how to do so is beyond our scope here.)

The next piece of code, in blue (lines 57-62), shows how to recreate the TreeNode tree: Just call java.io.ObjectInputStream.readObject() and typecast the result to the class you’re expecting. Java’s typecasting is type-safe, so if you get something other than a TreeNode from readObject(), the cast throws a ClassCastException instead of quietly handing you the wrong kind of object.
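If you would rather not rely on the exception, you can test the type before casting. The following is a small sketch of that idea, not part of the StreamDemo program; the class SafeCastDemo and its helper methods are hypothetical, and it works on an in-memory byte array rather than a file to stay self-contained.

```java
import java.io.*;

class SafeCastDemo {
    // Serializes any serializable object to a byte array in memory.
    static byte[] toBytes(Serializable obj) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(buf)) {
                out.writeObject(obj);
            }
            return buf.toByteArray();
        } catch (IOException ex) {
            // Not expected for an in-memory stream.
            throw new IllegalStateException(ex);
        }
    }

    // Reads one object and returns it only if it really is a String;
    // returns null for any other type instead of failing on the cast.
    static String readString(byte[] data) {
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(data))) {
            Object obj = in.readObject();
            return (obj instanceof String) ? (String) obj : null;
        } catch (IOException | ClassNotFoundException ex) {
            throw new IllegalStateException(ex);
        }
    }

    public static void main(String[] args) {
        System.out.println(readString(toBytes("hello")));             // hello
        System.out.println(readString(toBytes(Integer.valueOf(5))));  // null
    }
}
```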

The final important code snippet above appears in green (lines 74-82), and uses the method java.beans.Beans.instantiate() to create the bean from the .ser file. This method is simply a higher-level interface to an ObjectInputStream. It lets you specify a class loader, so you have control over where your class files come from. Also, if the object that is loaded turns out to be an applet, this function initializes the applet by setting the applet’s initial size, creating a context for the applet to run in, and calling the applet’s init() method. See the documentation for java.beans.Beans.instantiate() for more on how this function works.

After all this explaining, the answer to the question “How do I make a complex structure of objects serializable?” is simple: Make sure every sub-object is serializable, and let Java handle the connections between the objects.

One final detail on serializing a complex structure: What if you had, say, a hundred references to the same object all throughout the structure? You might expect that the object would be serialized a hundred times, and when it was deserialized, you’d have a hundred instances of the same object in your structure, instead of just one. ObjectOutputStream is smarter than that, though. As it’s serializing, it keeps track of the identity of each object, and if it’s seen that object before, it inserts a special token into the output stream indicating which previously-seen object to use in that place. When ObjectInputStream receives one of these tokens, it hooks up the instance that’s already created instead of creating a new one. This process ensures that you always get exactly the same structure you had when the object was serialized.
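You can check this behavior for yourself. The sketch below uses two hypothetical classes, Pair and Leaf, and serializes to memory instead of to a file. A Pair whose two fields point at the same Leaf comes back with both fields pointing at a single Leaf; a Pair built from two separate Leaf objects comes back with two.

```java
import java.io.*;

class SharedReferenceDemo {
    // Hypothetical leaf object that may be referenced more than once.
    static class Leaf implements Serializable {
        String name = "shared";
    }

    // Hypothetical container holding two references.
    static class Pair implements Serializable {
        Leaf first;
        Leaf second;

        Pair(Leaf first, Leaf second) {
            this.first = first;
            this.second = second;
        }
    }

    // Writes the pair to a byte array and reads it straight back.
    static Pair roundTrip(Pair original) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(buf)) {
                out.writeObject(original);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                     new ByteArrayInputStream(buf.toByteArray()))) {
                return (Pair) in.readObject();
            }
        } catch (IOException | ClassNotFoundException ex) {
            // Not expected for an in-memory stream.
            throw new IllegalStateException(ex);
        }
    }

    public static void main(String[] args) {
        Leaf leaf = new Leaf();
        Pair shared = roundTrip(new Pair(leaf, leaf));
        System.out.println(shared.first == shared.second);      // true

        Pair distinct = roundTrip(new Pair(new Leaf(), new Leaf()));
        System.out.println(distinct.first == distinct.second);  // false
    }
}
```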

Creating an Externalizable class

Often in Java documentation, you’ll see a requirement that a class “implement either the Serializable or the Externalizable interface.” There’s seldom a description of the Externalizable interface. (In fact, it’s not even very easy to find examples on the Internet of the Externalizable interface being used in Java code.)

The method ObjectOutputStream.defaultWriteObject() serializes the object in a distinct series of steps, defined in the section on ObjectOutputStream in the Serialization Specification (http://java.sun.com/products/jdk/1.1/docs/guide/serialization/spec/output.doc.html). ObjectOutputStream.defaultWriteObject() first writes a description of the object’s class to the output stream so that the ObjectInputStream that will recreate the object knows what kind of object to create. Then, defaultWriteObject() introspects the object to find out what its fields are. Next, defaultWriteObject() finds the “highest” (in the inheritance tree) serializable class of the object, and writes all of its fields to the stream. (I’m leaving out a couple of features here for simplicity.) Finally, defaultWriteObject() goes down the inheritance tree, writing all of the fields for each derived subclass of that highest serializable class. This ensures that all fields of the object are written.

So, for example, if the object were an Ocelot, and its superclasses Animal and Mammal were serializable, defaultWriteObject would write all serializable fields of Animal first, then of Mammal, and finally of Ocelot. (See the section “Serial killers” below for a description of serializable data fields.) defaultWriteObject writes any data fields that are of primitive types (int, double, and so on) using the methods of interface java.io.DataOutput (which ObjectOutputStream implements), and writes any data fields that are objects (Strings included) by serializing those objects in turn.
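The phrase “highest serializable class” has a visible consequence. The sketch below changes the example slightly: it assumes Animal is not serializable while Mammal is, so Mammal is the highest serializable class. In that case the fields declared in Animal are not written to the stream; when the object is read back, Animal’s no-argument constructor runs and sets them to their initial values. The field names are made up for this illustration.

```java
import java.io.*;

class InheritanceDemo {
    // NOT serializable: its fields are skipped on write, and its
    // no-argument constructor runs again when the object is read back.
    static class Animal {
        String habitat = "unknown";

        Animal() { }
    }

    // The "highest" serializable class in this hierarchy.
    static class Mammal extends Animal implements Serializable {
        int legs = 4;
    }

    // Serializable because it inherits from Mammal.
    static class Ocelot extends Mammal {
        String name = "unnamed";
    }

    // Writes the ocelot to a byte array and reads it straight back.
    static Ocelot roundTrip(Ocelot original) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(buf)) {
                out.writeObject(original);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                     new ByteArrayInputStream(buf.toByteArray()))) {
                return (Ocelot) in.readObject();
            }
        } catch (IOException | ClassNotFoundException ex) {
            // Not expected for an in-memory stream.
            throw new IllegalStateException(ex);
        }
    }

    public static void main(String[] args) {
        Ocelot ocelot = new Ocelot();
        ocelot.habitat = "forest";
        ocelot.legs = 3;
        ocelot.name = "Babou";

        Ocelot copy = roundTrip(ocelot);
        System.out.println(copy.habitat);  // unknown (Animal field, not written)
        System.out.println(copy.legs);     // 3       (Mammal field, written)
        System.out.println(copy.name);     // Babou   (Ocelot field, written)
    }
}
```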

ObjectOutputStream, therefore, does all the work for you. But what if you want more control of the output format? What if, for security reasons, there are fields you don’t want written to the output stream? Or if the format of the file you’re writing is determined by some specification other than the Java Serialization Specification? Maybe it’s a document file for a word processor, or an OpenDoc object. In all of these situations, you may want complete control over how the objects are serialized. This is the purpose of the interface java.io.Externalizable.

java.io.Externalizable is actually a very simple interface, containing just two methods:

public abstract void writeExternal(ObjectOutput out)
   throws IOException
public abstract void readExternal(ObjectInput in)
   throws IOException, ClassNotFoundException

One function writes an object, the other reads it. You write these functions to implement this interface. All of the methods of the ObjectOutput interface are available to you for writing primitive types. You also become responsible for saving all information about the class and its superclasses (or, you gain control of the format of all of the information for the class and all of its superclasses, depending on how you want to look at it). The Externalizable interface specification also requires that the class provide a public no-argument constructor, which is used to create the empty instance before readExternal() is called on it. The “container” (in this case, the ObjectOutputStream) writes class information to the stream, identifying the object type. Reading/writing the object is deferred entirely to the two functions defined in the Externalizable interface.

Now, you’ll notice that the writeExternal() and readExternal() interfaces accept ObjectOutput and ObjectInput objects as arguments. Since ObjectOutputStream is an ObjectOutput (that is, it implements the ObjectOutput interface), you can pass an ObjectOutputStream to the writeExternal() method of an externalizable object, and it will dutifully write itself to that stream. In fact, in the example below, we create an ObjectOutputStream that is built from a FileOutputStream. Basically, we can serialize to a file. Check out the code below:

001 import java.io.*;
002 import java.lang.*;
003 
004 public class SimpleExternal implements java.io.Externalizable {
005   int iInt_ = 0;
006   String sString_ = new String("");
007 
008   // Note that default no-argument constructor is mandatory.
009   public SimpleExternal()
010   {
011   }
012 
013   public SimpleExternal(int iInt, String sString)
014   {
015     iInt_ = iInt;
016     sString_ = sString;
017   }
018 
019   // Write the custom external representation of the object
020   public void writeExternal(ObjectOutput out) throws java.io.IOException
021   {
022     int  i;
023     Integer theInt = new Integer(iInt_);
024   
025     // Write integer alone on a line
026     out.write("EXTERNAL\r\n".getBytes());
027     out.write(theInt.toString().getBytes());
028     out.write("\r\n".getBytes());
029 
030     // Write string as bytes
031     out.write(sString_.getBytes());
032 
033     // Write "end-of-string" marker
034     out.write("\r\nEND_EXTERNAL\r\n".getBytes());
035 
036   }
037 
038   // Read the object in its external format
039   public void readExternal(ObjectInput in) throws java.io.IOException
040   {
041 
042     // Skip "EXTERNAL"
043     String  sLine = in.readLine();
044 
045     iInt_ = Integer.parseInt(in.readLine());
046 
047     sString_ = in.readLine();
048 
049     // Skip "END_EXTERNAL"
050     sLine = in.readLine();
051 
052   }
053 
054   // Print out object in semi-English
055   public void print()
056   {
057       System.out.println("Integer: " + iInt_ + "\nString: " + sString_);
058   }
059 
060   // Accessors
061   public void setString(String sString) { sString_ = sString;}
062   public String getString() { return sString_;}
063   public void setInt(int i) { iInt_ = i;}
064   public int getInt() { return iInt_;}
065 };

You’ll see that in the writeExternal() method (above in red, lines 25-34), we save information about the fields (Int and String) that we want to save. In the readExternal() method (in blue, lines 41-50), we read those fields back in, in the same order as they were written. In the input method readExternal(), we can make assumptions about what the input looks like, because we assume what we’re reading was written by the corresponding writeExternal() method. (Of course, that’s no excuse for not doing proper error checking. I’ve left that out here to simplify the example.) We’ve taken complete control of how the objects were written. Now, let’s look at a dull-yet-enlightening example program that exercises our useless externalizable object.
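The line-oriented format above is easy to read in a text editor, but it breaks if the string itself contains a line break. As an alternative sketch (not the format used by SimpleExternal), the hypothetical class below writes its fields with writeInt() and writeUTF(), which ObjectOutput inherits from java.io.DataOutput. Because writeUTF() records the length of the string, embedded line breaks survive the trip.

```java
import java.io.*;

// Hypothetical class for illustration; it is not SimpleExternal.
class ExternalRecord implements Externalizable {
    private int id;
    private String label = "";

    // A public no-argument constructor is required for Externalizable classes.
    public ExternalRecord() { }

    public ExternalRecord(int id, String label) {
        this.id = id;
        this.label = label;
    }

    // Write the fields in a fixed order using DataOutput methods.
    @Override
    public void writeExternal(ObjectOutput out) throws IOException {
        out.writeInt(id);
        out.writeUTF(label);
    }

    // Read the fields back in exactly the order they were written.
    @Override
    public void readExternal(ObjectInput in) throws IOException {
        id = in.readInt();
        label = in.readUTF();
    }

    public int getId() { return id; }
    public String getLabel() { return label; }

    // Writes the record to a byte array and reads it straight back.
    static ExternalRecord roundTrip(ExternalRecord original) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(buf)) {
                out.writeObject(original);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                     new ByteArrayInputStream(buf.toByteArray()))) {
                return (ExternalRecord) in.readObject();
            }
        } catch (IOException | ClassNotFoundException ex) {
            // Not expected for an in-memory stream.
            throw new IllegalStateException(ex);
        }
    }

    public static void main(String[] args) {
        ExternalRecord copy = roundTrip(new ExternalRecord(215, "Testing\r\nUnoDosTres"));
        System.out.println(copy.getId());
        System.out.println(copy.getLabel().equals("Testing\r\nUnoDosTres"));
    }
}
```

The trade-off is that the resulting file is no longer readable as plain text, which may or may not matter depending on why you wanted an external format in the first place.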

001 import java.io.*;
002 // SimpleExternal lives in the same (default) package, so no import is needed.
003 
004 public class Demo7a {
005 
006     private static void Usage() throws java.io.IOException
007     {
008         System.out.println("Usage:\n\tDemo7a w file int string\n\tDemo7a r file");
009         IOException ex = new IOException("ERROR");
010         throw ex;
011     }
012 
013     public static void main(String[] args)
014     {
015         String cmd = args[0];
016 
017         try {
018             if (cmd.compareTo("w") == 0)
019             {
020                 if (args.length != 4)
021                 {
022                     Usage();        // UNIX anyone?
023                 }
024 
025                 int aa = Integer.parseInt(args[2]);
026                 String ss = args[3];
027 
028                 SimpleExternal  bar = new SimpleExternal(aa, ss);
029                 FileOutputStream f = new FileOutputStream(args[1]);
030                 ObjectOutputStream s = new ObjectOutputStream(f);
031             
032                 s.writeObject(bar);
033                 s.flush();
034             }
035 
036             else if (cmd.compareTo("r") == 0)
037             {
038                 if (args.length != 2)
039                 {
040                     Usage();
041                 }
042 
043                 FileInputStream f = new FileInputStream(args[1]);
044                 ObjectInputStream s = new ObjectInputStream(f);
045 
046                 System.out.println("Read SimpleExternal:");
047 
048                 SimpleExternal bar = (SimpleExternal) s.readObject();
049                 bar.print();
050             }
051 
052             else {
053                 System.err.println("Unknown command " + cmd);
054                 Usage();
055             }
056         }
057 
058         catch (IOException ex) {
059             System.out.println("IO Exception:");
060             System.out.println(ex.getMessage());
061             ex.printStackTrace();
062         }
063         catch (ClassNotFoundException ex) {
064             System.out.println("ClassNotFound Exception:");
065             System.out.println(ex.getMessage());
066             ex.printStackTrace();
067         }
068     }
069 };

Above, in red (lines 28-33), you see the code that first creates a SimpleExternal object, then opens a FileOutputStream, and associates that stream with an ObjectOutputStream. Finally, it tells the ObjectOutputStream to write the object by calling ObjectOutputStream.writeObject(). (ObjectOutputStream.writeObject() checks to see if an object it receives implements Externalizable and, if so, hands over all externalizing to the class’s writeExternal() function.)

The input code appears in blue (lines 43-49). This time, we create a FileInputStream and associate it with an ObjectInputStream. ObjectInputStream.readObject() reads the first (in this case, the only) object from the file, calling SimpleExternal.readExternal() to fill in its fields. The result is the original object, reconstituted from an “external” format.

Let’s run the program and see how it works:

C:\>java Demo7a w sample.ser 12 ThisIsASampleString
C:\>java Demo7a r sample.ser
Read SimpleExternal:
Integer: 12
String: ThisIsASampleString

C:\>java Demo7a w sample.ser 215 TestingUnoDosTres
C:\>type sample.ser
_ sr SimpleExternal7.  xpEXTERNAL
215
TestingUnoDosTres
END_EXTERNAL

C:\>java Demo7a r sample.ser
Read SimpleExternal:
Integer: 215
String: TestingUnoDosTres

Lurking in among the binary numbers is the number 215 and the string TestingUnoDosTres, surrounded by EXTERNAL and END_EXTERNAL. This string is the object, serialized in the external format that we defined for the class. You can see that the main program we wrote works just fine. Cool! But what’s all that other trash? You’ll remember that ObjectOutputStream writes a description of the class to the stream; well, that’s what you’re seeing. That extra class information tells the ObjectInputStream what kind of object is coming next in the stream, so the class loader can find the bytecodes for the class. The ObjectInputStream creates the SimpleExternal object, then tells the SimpleExternal to “fill” itself from the stream by calling the new object’s readExternal() method.

If you don’t like this “interference” from ObjectOutputStream, you can write your own class of streamed objects by implementing your own types of ObjectOutput and ObjectInput, and then passing instances of these objects to the writeExternal or readExternal methods of your initial class. Of course, you’ll have to find some way of identifying the object type you want when you stream the object in, since ObjectInputStream will no longer be doing that for you.

Serial killers: How to avoid unwanted serialization

There’s a danger in providing writeExternal and readExternal methods: You can change an object’s state from the outside, or read an object without calling the accessor methods. It’s the component designer’s job to ensure that sensitive information is protected from serialization or externalization. This section offers a couple of relatively easy ways to do just that.

The first and easiest way to protect a class from being serialized is simply not to make it serializable. If you leave the keywords implements Serializable off of the class definition, the class won’t be serializable, and so serialization (obviously) will be prevented.

Sometimes, though, you may want to subclass a Serializable class but forbid the serialization of instances of that subclass. Since an object that inherits from a Serializable superclass is also serializable, you’ve got to find another way to prevent serialization. One cute trick is to implement a method that simply throws NotSerializableException, like this:

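import java.io.*;
public class SecretObject007 extends SpyObject {
  // The serialization machinery only finds this method if it is private
  // and takes an ObjectOutputStream.
  private void writeObject(ObjectOutputStream out) throws IOException
  {   throw new NotSerializableException("Not Serializable");
  }
  ...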

Any class that tries to serialize this object will simply get an exception.
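Here is a self-contained sketch of the same trick that you can run. SpyRecord and SecretRecord are hypothetical stand-ins for the classes above. The method has to be declared private void writeObject(java.io.ObjectOutputStream), since that is the signature the serialization machinery looks for; the sketch also adds a matching readObject() so that a hand-built stream can’t be used to create an instance, which is an extra precaution of my own rather than something the trick requires.

```java
import java.io.*;

class NoSerializeDemo {
    // Hypothetical serializable superclass.
    static class SpyRecord implements Serializable {
        String codeName = "Q";
    }

    // Subclass that inherits Serializable but refuses to be serialized.
    static class SecretRecord extends SpyRecord {
        private void writeObject(ObjectOutputStream out) throws IOException {
            throw new NotSerializableException("SecretRecord");
        }

        private void readObject(ObjectInputStream in)
                throws IOException, ClassNotFoundException {
            throw new NotSerializableException("SecretRecord");
        }
    }

    // Returns true if the object can be written, false if serialization
    // is refused with a NotSerializableException.
    static boolean canSerialize(Object obj) {
        try (ObjectOutputStream out =
                 new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(obj);
            return true;
        } catch (NotSerializableException ex) {
            return false;
        } catch (IOException ex) {
            // Not expected for an in-memory stream.
            throw new IllegalStateException(ex);
        }
    }

    public static void main(String[] args) {
        System.out.println(canSerialize(new SpyRecord()));     // true
        System.out.println(canSerialize(new SecretRecord()));  // false
    }
}
```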

To protect a specific field from being serialized, mark it private transient (or make it static, if appropriate) since transient and static fields are never serialized. So, in a bean used to save account information for a user, you might see:

public class AccountBean implements Serializable {
    private String sName;
    private String sUserID;
    private transient String sPassword;
...

The standard object serialization software would simply ignore “sPassword” and write the other fields to the output stream as expected. You can still serialize the object, but the password won’t show up in the output. There is a way to store an encrypted password. You could make sPassword transient as above, and then implement writeObject() so that it calls defaultWriteObject first, and then writes the encrypted password to the stream. Of course, readObject() would do the reverse.
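A rough sketch of that last idea follows. The Account class and its helper methods are hypothetical, and the protect() and unprotect() methods use Base64 purely as a placeholder: Base64 is an encoding, not encryption, so a real bean would substitute an actual cipher at those two points.

```java
import java.io.*;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

class AccountDemo {
    // Hypothetical account bean with a password that is never written as-is.
    static class Account implements Serializable {
        private String name;
        private transient String password;   // skipped by default serialization

        Account(String name, String password) {
            this.name = name;
            this.password = password;
        }

        String getName() { return name; }
        String getPassword() { return password; }

        // PLACEHOLDER ONLY: Base64 is not encryption. Swap in a real cipher.
        private static String protect(String plain) {
            return Base64.getEncoder()
                         .encodeToString(plain.getBytes(StandardCharsets.UTF_8));
        }

        private static String unprotect(String stored) {
            return new String(Base64.getDecoder().decode(stored),
                              StandardCharsets.UTF_8);
        }

        // Write the non-transient fields, then the protected password.
        private void writeObject(ObjectOutputStream out) throws IOException {
            out.defaultWriteObject();
            out.writeUTF(protect(password));
        }

        // Read everything back in the same order.
        private void readObject(ObjectInputStream in)
                throws IOException, ClassNotFoundException {
            in.defaultReadObject();
            password = unprotect(in.readUTF());
        }
    }

    static byte[] toBytes(Account account) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(buf)) {
                out.writeObject(account);
            }
            return buf.toByteArray();
        } catch (IOException ex) {
            // Not expected for an in-memory stream.
            throw new IllegalStateException(ex);
        }
    }

    static Account fromBytes(byte[] data) {
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (Account) in.readObject();
        } catch (IOException | ClassNotFoundException ex) {
            throw new IllegalStateException(ex);
        }
    }

    public static void main(String[] args) {
        byte[] data = toBytes(new Account("alice", "hunter2"));
        String raw = new String(data, StandardCharsets.ISO_8859_1);
        System.out.println(raw.contains("hunter2"));          // false
        System.out.println(fromBytes(data).getPassword());    // hunter2
    }
}
```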

Conclusion

We’ve gone a bit further into serialization this month, covering recursive serialization, the Externalizable interface, and preventing the serialization of sensitive information. Next month, we’ll finish up the topic of serialization and persistence with a discussion of serialized object versioning, and go over some problems with the current serialization mechanism.

Reader comments and clarifications

Last month, a few sophisticated readers wrote in and pointed out some issues that need clarification (and, in one case, a correction).

First, thanks to the reader who requested that the articles not refer to pieces of code by color, since many readers print the articles on black-and-white printers. You’ll notice that this month, I’ve added line numbers to the code listings and references in the text.

In last month’s column, I said:

The interface java.io.Serializable specifies that a class that implements it contain two methods with the following signatures:

This was simply incorrect. In fact, the class need not implement these methods because, as is noted later in the article, the ObjectOutputStream class knows how to serialize any Java object. So, if ObjectOutputStream knows how to serialize any object, why does the Serializable interface exist at all?

Looking at the JDK source code for interface java.io.Serializable, we find something startling:

// ...lots of comments...
package java.io;
public interface Serializable {
}

The Serializable interface has no methods at all! What is going on here? The documentation for interface java.io.Serializable states:

The serialization interface has no methods or fields and serves only to identify the semantics of being serializable.

In other words, a class implements Serializable solely for the purpose of indicating to other classes (in particular, java.io.ObjectOutputStream and java.io.ObjectInputStream) that it is indeed “willing” to be serialized. The designers of Java decided that it was safer to require programmers to explicitly declare a class to be serializable, rather than making classes serializable by default. This ensures that an object that manipulates sensitive information (security related objects and so forth) isn’t inadvertently serialized and transmitted to places it shouldn’t be.

So, where do the methods writeObject() and readObject() come from, if not from the Serializable interface? Classes such as ObjectOutputStream and ObjectInputStream, which serialize and deserialize other classes, by convention search for methods with precisely these signatures:

private void writeObject(java.io.ObjectOutputStream out)
     throws IOException;
 private void readObject(java.io.ObjectInputStream in)
     throws IOException, ClassNotFoundException;

The mere existence of these methods in a class is that class’s declaration that it knows how to serialize itself. ObjectOutputStream uses introspection to check for the existence of private void writeObject(java.io.ObjectOutputStream out). ObjectOutputStream uses the class’s writeObject method to serialize the object if the method exists, and uses defaultWriteObject() otherwise.

Many thanks to the alert readers who wrote in and corrected me on this important point.

Mark Johnson has a B.S. in Computer and Electrical Engineering from Purdue University (1986). He is a fanatical devotee of the Design Pattern approach in object-oriented architecture, of software components in theory, and of JavaBeans in practice. Over the past several years, he worked for Kodak, Booz-Allen and Hamilton, and EDS in Mexico City, developing Oracle and Informix database applications for the Mexican Federal Electoral Institute and for Mexican Customs. He currently works as a designer and developer for Object Products in Fort Collins, CO.