The trick to controlling and -- when necessary -- preventing serialization

In last month’s column, “Do it the ‘Nescafé’ way -- with freeze-dried JavaBeans,” we discussed some of the reasons for, and applications of, freeze-drying JavaBeans into a persistent state. You will recall that serialization of an object is simply the encoding of its state (the values of its fields) in a structured way so that the object can be stored or transmitted as data and recreated at another place and time. (If you need an introduction to serialization in Java, see last month’s column. This month we’ll be diving right into coding examples, so you’ll want to be prepared.)

First, we’ll look at serialization of aggregate objects (not much of a feat, as you’ll see). We’ve got a quick example of how to implement the Externalizable interface (for you control freaks out there). Then, we’ll discuss how to keep sensitive information from being serialized at all. Finally, we’ll finish up with some enlightening reader feedback on last month’s column.

Serializing object structures

Last month, we saw that you can make a class serializable simply by adding implements java.io.Serializable to the class definition, because class java.io.ObjectOutputStream knows how to serialize any class descended from java.lang.Object (which means any class at all). But what if your object contains references to other objects or is composed of other objects? No problem! The serialization mechanism automatically detects references to other objects. As long as the “sub-objects” are also serializable, ObjectOutputStream serializes them and includes them in the stream.

Let’s look at a concrete example of this. In the following code example, we implement a TreeNode object. This object has internal fields of sToken_ (a string) and iType_ and iValue_ (integers).
It also contains references to two other objects, tnLeft_ and tnRight_, which are references to the node’s left and right subtrees. (This node class could be extended easily for use in an expression evaluator.)

    import java.io.*;
    import java.lang.*;

    // This is boring, but it gets the point across.
    public class TreeNode extends java.lang.Object implements java.io.Serializable {
        protected int iType_;
        protected int iValue_;
        protected String sToken_ = "";
        protected TreeNode tnLeft_ = null;
        protected TreeNode tnRight_ = null;

        // Necessary to be a well-behaved bean.
        public TreeNode() {
            iType_ = iValue_ = -1;
        }

        // Explicit constructor
        public TreeNode(int iType, int iValue, String sToken,
                        TreeNode tnLeft, TreeNode tnRight) {
            iType_ = iType;
            iValue_ = iValue;
            sToken_ = sToken;
            tnLeft_ = tnLeft;
            tnRight_ = tnRight;
        }

        // Print me (indented) and all of my children
        public void print(String sIndent) {
            System.out.println(sIndent + "type: " + iType_);
            System.out.println(sIndent + "value: " + iValue_);
            System.out.println(sIndent + "token: " + sToken_);
            System.out.println(sIndent + "left:");
            if (tnLeft_ != null) {
                tnLeft_.print(sIndent + " ");
            } else {
                System.out.println(sIndent + " (null)");
            }
            System.out.println(sIndent + "right:");
            if (tnRight_ != null) {
                tnRight_.print(sIndent + " ");
            } else {
                System.out.println(sIndent + " (null)");
            }
        }

        // Property accessors
        public void setToken(String sToken) { sToken_ = sToken; }
        public String getToken() { return sToken_; }
        public void setType(int iType) { iType_ = iType; }
        public int getType() { return iType_; }
        public void setValue(int iValue) { iValue_ = iValue; }
        public int getValue() { return iValue_; }
        public void setLeft(TreeNode tnLeft) { tnLeft_ = tnLeft; }
        public TreeNode getLeft() { return tnLeft_; }
        public void setRight(TreeNode tnRight) { tnRight_ = tnRight; }
        public TreeNode getRight() { return tnRight_; }
    }

A TreeNode is created with token, type, and value, and is connected to left and right branches at construction time. The property accessors allow us to set and interrogate the properties, including the left and right branches. (The BeanBox won’t show the branches as properties, since there’s no PropertyEditor for them. For more on the BeanBox, see “The BeanBox: Sun’s JavaBeans test container.”)

Our test class creates a recursive tree structure of TreeNodes and writes it to a file. Here’s the source for the test class, followed by a diagram of the structure it creates and serializes:

    001 import java.io.*;
    002 import java.beans.*;
    003 // TreeNode is in the same (default) package, so it needs no import.
    004
    005 public class StreamDemo {
    006
    007     private static void Usage() throws java.io.IOException
    008     {
    009         System.out.println("Usage:\n\tStreamDemo w file\n\tStreamDemo r file");
    010
    011         IOException ex = new IOException("ERROR");
    012         throw ex;
    013     }
    014
    015     public static void main(String[] args)
    016     {
    017         System.out.println(args.length);
    018
    019         try {
    020             if (args.length <= 0)
    021             {
    022                 Usage();
    023             }
    024
    025             String cmd = args[0];
    026
    027             if (cmd.compareTo("w") == 0)
    028             {
    029                 if (args.length != 2)
    030                 {
    031                     Usage(); // Unix anyone?
    032                 }
    033
    034                 TreeNode tnLL = new TreeNode(4, 12, "Left Left",
    035                                              null, null);
    036                 TreeNode tnL = new TreeNode(2, 4, "Left", tnLL, null);
    037                 TreeNode tnR = new TreeNode(7, 9, "Right", null, null);
    038                 TreeNode tnRoot = new TreeNode(1, 2, "Root", tnL, tnR);
    039
    040                 tnRoot.print("");
    041
    042                 FileOutputStream f = new FileOutputStream(args[1]);
    043                 ObjectOutputStream s = new ObjectOutputStream(f);
    044
    045                 s.writeObject(tnRoot);
    046
    047                 s.flush();
    048             }
    049
    050             else if (cmd.compareTo("r") == 0)
    051             {
    052                 if (args.length != 2)
    053                 {
    054                     Usage();
    055                 }
    056
    057                 FileInputStream f = new FileInputStream(args[1]);
    058                 ObjectInputStream s = new ObjectInputStream(f);
    059
    060                 System.out.println("Reading TreeNode:");
    061
    062                 TreeNode tnRoot = (TreeNode) s.readObject();
    063
    064                 tnRoot.print("");
    065             }
    066
    067             else if (cmd.compareTo("i") == 0)
    068             {
    069                 if (args.length != 2)
    070                 {
    071                     Usage();
    072                 }
    073
    074                 // Given a name, look for "name.ser"
    075                 Object theBean = Beans.instantiate(null, args[1]);
    076                 String sName = theBean.getClass().getName();
    077
    078                 if ( sName.compareTo("TreeNode") == 0 )
    079                 {
    080                     TreeNode tn = (TreeNode)theBean;
    081                     tn.print("");
    082                 }
    083                 else
    084                 {
    085                     System.err.println("There was a bean in that file, " +
    086                                        "but it was a " + sName);
    087                 }
    088             }
    089
    090             else {
    091                 System.err.println("Unknown command " + cmd);
    092                 Usage();
    093             }
    094
    095         }
    096
    097         catch (IOException ex) {
    098             System.out.println("IO Exception:");
    099             System.out.println(ex.getMessage());
    100             ex.printStackTrace();
    101         }
    102         catch (ClassNotFoundException ex) {
    103             System.out.println("ClassNotFound Exception:");
    104             System.out.println(ex.getMessage());
    105             ex.printStackTrace();
    106         }
    107     }
    108 }

The tree created by this code looks like this:

[Figure: the node “Root” has a left child “Left” and a right child “Right”; “Left” has a single left child, “Left Left”.]

The test program lets you exercise the TreeNode class in one of three ways. Lines 42-45 create FileOutputStream f and then use f to create an ObjectOutputStream, upon which we then invoke writeObject().
The serialization “machinery” inside the ObjectOutputStream analyzes the object that’s passed to it and serializes to the stream any fields it finds. If the ObjectOutputStream finds any non-null object references inside the TreeNode, it then calls writeObject recursively to serialize those objects, as well. In our sample case, it finds tnLeft_ and tnRight_ in each TreeNode, and serializes them if they’re non-null.

Now, the object serializer outputs only the fields, not the bytecodes, of an object. So how can the object run elsewhere if the bytecodes aren’t in the .ser file? When an object is created from its serialized representation, the Java virtual machine (JVM) creating the instance must either already “know” about the class (that is, the class must already be loaded into the JVM), or know where to get the class definition (using a class loader). The methods java.beans.Beans.instantiate() and java.io.ObjectInputStream.readObject() take care of all of the class file loading for you, under the hood. (You can control the loading of classes, but just how to do so is beyond our scope here.)

The next piece of code (lines 57-62) shows how to recreate the TreeNode tree: Just call java.io.ObjectInputStream.readObject() and typecast the result to the class you’re expecting. Java’s typecasting is type-safe, so if you get something other than a TreeNode from readObject(), the cast throws a ClassCastException instead of handing you an object of the wrong type.

The final important code snippet above (lines 74-82) uses the method java.beans.Beans.instantiate() to create the bean from the .ser file. This method is simply a higher-level interface to an ObjectInputStream. It lets you specify a class loader, so you have control over where your class files come from.
Also, if the object that is loaded turns out to be an applet, this function initializes the applet by setting the applet’s initial size, creating a context for the applet to run in, and calling the applet’s init() method. See the documentation for java.beans.Beans.instantiate() for more on how this function works.

After all this explaining, the answer to the question “How do I make a complex structure of objects serializable?” is simple: Make sure every sub-object is serializable, and let Java handle the connections between the objects.

One final detail on serializing a complex structure: What if you had, say, a hundred references to the same object all throughout the structure? You might expect that the object would be serialized a hundred times, and when it was deserialized, you’d have a hundred instances of the same object in your structure, instead of just one. ObjectOutputStream is smarter than that, though. As it’s serializing, it keeps track of the identity of each object, and if it’s seen that object before, it inserts a special token into the output stream indicating which previously seen object to use in that place. When ObjectInputStream receives one of these tokens, it hooks up the instance that’s already created instead of creating a new one. This process ensures that you always get exactly the same structure you had when the object was serialized.

Creating an Externalizable class

Often in Java documentation, you’ll see a requirement that a class “implement either the Serializable or the Externalizable interface.” There’s seldom a description of the Externalizable interface. (In fact, it’s not even very easy to find examples on the Internet of the Externalizable interface being used in Java code.)
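Before we dig into Externalizable, the shared-reference bookkeeping described in the previous section is easy to check for yourself. The following is only a minimal sketch: SharedRefDemo, Pair, and roundTrip are hypothetical names invented for this illustration, and the round trip goes through an in-memory byte array rather than a .ser file.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical demo class, not part of the article's listings.
class SharedRefDemo {

    // A holder with two references that may point at the same object.
    static class Pair implements Serializable {
        private static final long serialVersionUID = 1L;
        final Object first;
        final Object second;

        Pair(Object first, Object second) {
            this.first = first;
            this.second = second;
        }
    }

    // Serialize to an in-memory buffer and read the copy straight back.
    static Pair roundTrip(Pair original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(original);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (Pair) in.readObject();
            }
        } catch (IOException | ClassNotFoundException ex) {
            throw new IllegalStateException("round trip failed", ex);
        }
    }

    public static void main(String[] args) {
        int[] shared = {1, 2, 3};
        Pair copy = roundTrip(new Pair(shared, shared));

        // Do both references in the copy point at one array?
        System.out.println("same instance in copy: " + (copy.first == copy.second));
        // Is that array a different object from the original?
        System.out.println("copy is a new object:  " + (copy.first != shared));
    }
}
```

The two fields of the original Pair refer to one array, so the stream should hold that array once plus a back-reference, and the deserialized Pair should again hold two references to a single (new) array.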
ObjectOutputStream’s default serialization works in a distinct series of steps, defined in the section on ObjectOutputStream in the Serialization Specification (http://java.sun.com/products/jdk/1.1/docs/guide/serialization/spec/output.doc.html). ObjectOutputStream.writeObject() first writes a description of the object’s class to the output stream so that the ObjectInputStream that will recreate the object knows what kind of object to create. Next, it finds the “highest” (in the inheritance tree) serializable class of the object, and writes all of that class’s fields to the stream. (I’m leaving out a couple of features here for simplicity.) Finally, it goes down the inheritance tree, writing the fields declared by each subclass of that highest serializable class. The per-class field writing is the same work that defaultWriteObject() does when you call it from your own code: it introspects the class to find out what its serializable fields are and writes them. This ensures that all fields of the object are written.

So, for example, if the object were an Ocelot, and its superclasses Animal and Mammal were serializable, the stream would contain all serializable fields of Animal first, then those of Mammal, and finally those of Ocelot. (See the section Serial killers below for a description of serializable data fields.) Fields of primitive types (int, double, and so on) are written using the methods of interface java.io.DataOutput (which ObjectOutputStream implements), and fields that refer to other objects are written by calling writeObject() recursively on those objects.

ObjectOutputStream, therefore, does all the work for you. But what if you want more control of the output format? What if, for security reasons, there are fields you don’t want written to the output stream? Or if the format of the file you’re writing is determined by some specification other than the Java Serialization Specification? Maybe it’s a document file for a word processor, or an OpenDoc object.
In all of these situations, you may want complete control over how the objects are serialized. This is the purpose of the interface java.io.Externalizable. java.io.Externalizable is actually a very simple interface, containing just two methods:

    public abstract void writeExternal(ObjectOutput out) throws IOException;
    public abstract void readExternal(ObjectInput in) throws IOException, ClassNotFoundException;

One function writes an object, the other reads it. You write these functions to implement this interface. All of the methods of the ObjectOutput interface are available to you for writing native types. You also become responsible for saving the state of the class and all of its superclasses; or, depending on how you want to look at it, you gain control of the format of all of that information. The Externalizable interface specification also requires that the class provide a public no-argument constructor. The “container” (in this case, the ObjectOutputStream) writes class information to the stream, identifying the object type. Reading and writing the object’s contents is deferred entirely to the two functions defined in the Externalizable interface.

Now, you’ll notice that the writeExternal() and readExternal() methods accept ObjectOutput and ObjectInput objects as arguments. Since ObjectOutputStream is an ObjectOutput (that is, it implements the ObjectOutput interface), you can pass an ObjectOutputStream to the writeExternal() method of an externalizable object, and it will dutifully write itself to that stream. In fact, in the example below, we create an ObjectOutputStream that is built from a FileOutputStream. Basically, we can serialize to a file. Check out the code below:

    001 import java.io.*;
    002 import java.lang.*;
    003
    004 public class SimpleExternal implements java.io.Externalizable {
    005     int iInt_ = 0;
    006     String sString_ = "";
    007
    008     // Note that a public no-argument constructor is mandatory.
    009     public SimpleExternal()
    010     {
    011     }
    012
    013     public SimpleExternal(int iInt, String sString)
    014     {
    015         iInt_ = iInt;
    016         sString_ = sString;
    017     }
    018
    019     // Write the custom external representation of the object
    020     public void writeExternal(ObjectOutput out) throws java.io.IOException
    021     {
    022         // Convert the integer to its decimal text form
    023         String sInt = Integer.toString(iInt_);
    024
    025         // Write integer alone on a line
    026         out.write("EXTERNAL\r\n".getBytes());
    027         out.write(sInt.getBytes());
    028         out.write("\r\n".getBytes());
    029
    030         // Write string as bytes
    031         out.write(sString_.getBytes());
    032
    033         // Write "end-of-string" marker
    034         out.write("\r\nEND_EXTERNAL\r\n".getBytes());
    035
    036     }
    037
    038     // Read the object in its external format
    039     public void readExternal(ObjectInput in) throws java.io.IOException
    040     {
    041
    042         // Skip "EXTERNAL"
    043         String sLine = in.readLine();
    044
    045         iInt_ = Integer.parseInt(in.readLine());
    046
    047         sString_ = in.readLine();
    048
    049         // Skip "END_EXTERNAL"
    050         sLine = in.readLine();
    051
    052     }
    053
    054     // Print out object in semi-English
    055     public void print()
    056     {
    057         System.out.println("Integer: " + iInt_ + "\nString: " + sString_);
    058     }
    059
    060     // Accessors
    061     public void setString(String sString) { sString_ = sString;}
    062     public String getString() { return sString_;}
    063     public void setInt(int i) { iInt_ = i;}
    064     public int getInt() { return iInt_;}
    065 }

You’ll see that in the writeExternal() method (lines 25-34), we save information about the fields (Int and String) that we want to save. In the readExternal() method (lines 41-50), we read those fields back in, in the same order as they were written. In the input method readExternal(), we can make assumptions about what the input looks like, because we assume what we’re reading was written by the corresponding writeExternal() method. (Of course, that’s no excuse for not doing proper error checking. I’ve left that out here to simplify the example.)
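The line-oriented text format is only one choice, and it breaks down if the string itself contains a line break. As a contrasting sketch, assuming you don’t need the file to be human-readable, the same two fields can be written with the typed methods that ObjectOutput inherits from DataOutput (writeInt() and writeUTF()). The class name TypedExternal and its roundTrip() helper are hypothetical and exist only for this illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectInputStream;
import java.io.ObjectOutput;
import java.io.ObjectOutputStream;

// Hypothetical variant of an externalizable class that uses typed writes.
class TypedExternal implements Externalizable {
    private int number;
    private String text = "";

    // Externalizable requires a public no-argument constructor.
    public TypedExternal() {
    }

    public TypedExternal(int number, String text) {
        this.number = number;
        this.text = text;
    }

    // Write the fields in a fixed order using typed methods.
    @Override
    public void writeExternal(ObjectOutput out) throws IOException {
        out.writeInt(number);
        out.writeUTF(text);
    }

    // Read the fields back in exactly the order they were written.
    @Override
    public void readExternal(ObjectInput in) throws IOException {
        number = in.readInt();
        text = in.readUTF();
    }

    public int getNumber() { return number; }
    public String getText() { return text; }

    // Serialize to memory and read the copy back (checked exceptions wrapped).
    static TypedExternal roundTrip(TypedExternal original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(original);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (TypedExternal) in.readObject();
            }
        } catch (IOException | ClassNotFoundException ex) {
            throw new IllegalStateException("round trip failed", ex);
        }
    }

    public static void main(String[] args) {
        TypedExternal copy = roundTrip(new TypedExternal(215, "line one\nline two"));
        System.out.println("Integer: " + copy.getNumber());
        System.out.println("String: " + copy.getText());
    }
}
```

The trade-off is the usual one: writeUTF() stores a length prefix, so embedded line breaks survive, but the result is no longer something you would want to read in a text editor.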
We’ve taken complete control of how the objects were written. Now, let’s look at a dull-yet-enlightening example program that exercises our useless externalizable object.

    001 import java.io.*;
    002 // SimpleExternal is in the same (default) package, so it needs no import.
    003
    004 public class Demo7a {
    005
    006     private static void Usage() throws java.io.IOException
    007     {
    008         System.out.println("Usage:\n\tDemo7a w file int string\n\tDemo7a r file");
    009         IOException ex = new IOException("ERROR");
    010         throw ex;
    011     }
    012
    013     public static void main(String[] args)
    014     {
    015         String cmd = args[0];
    016
    017         try {
    018             if (cmd.compareTo("w") == 0)
    019             {
    020                 if (args.length != 4)
    021                 {
    022                     Usage(); // UNIX anyone?
    023                 }
    024
    025                 int aa = Integer.parseInt(args[2]);
    026                 String ss = args[3];
    027
    028                 SimpleExternal bar = new SimpleExternal(aa, ss);
    029                 FileOutputStream f = new FileOutputStream(args[1]);
    030                 ObjectOutputStream s = new ObjectOutputStream(f);
    031
    032                 s.writeObject(bar);
    033                 s.flush();
    034             }
    035
    036             else if (cmd.compareTo("r") == 0)
    037             {
    038                 if (args.length != 2)
    039                 {
    040                     Usage();
    041                 }
    042
    043                 FileInputStream f = new FileInputStream(args[1]);
    044                 ObjectInputStream s = new ObjectInputStream(f);
    045
    046                 System.out.println("Read SimpleExternal:");
    047
    048                 SimpleExternal bar = (SimpleExternal) s.readObject();
    049                 bar.print();
    050             }
    051
    052             else {
    053                 System.err.println("Unknown command " + cmd);
    054                 Usage();
    055             }
    056         }
    057
    058         catch (IOException ex) {
    059             System.out.println("IO Exception:");
    060             System.out.println(ex.getMessage());
    061             ex.printStackTrace();
    062         }
    063         catch (ClassNotFoundException ex) {
    064             System.out.println("ClassNotFound Exception:");
    065             System.out.println(ex.getMessage());
    066             ex.printStackTrace();
    067         }
    068     }
    069 }

Above, on lines 28-33, you see the code that first creates a SimpleExternal object, then opens a FileOutputStream, and associates that stream with an ObjectOutputStream. Finally, it tells the ObjectOutputStream to write the object by calling ObjectOutputStream.writeObject().
(ObjectOutputStream.writeObject() checks to see if an object it receives implements Externalizable and, if so, hands over all externalizing to the class’s writeExternal() function.)

The input code appears on lines 43-49. This time, we create a FileInputStream and associate it with an ObjectInputStream. The ObjectInputStream’s readObject() method reads the first (in this case, the only) object from the file, using SimpleExternal.readExternal() to fill it in. The result is the original object, reconstituted from an “external” format. Let’s run the program and see how it works:

    C:\>java Demo7a w sample.ser 12 ThisIsASampleString
    C:\>java Demo7a r sample.ser
    Read SimpleExternal:
    Integer: 12
    String: ThisIsASampleString
    C:\>java Demo7a w sample.ser 215 TestingUnoDosTres
    C:\>type sample.ser
    _ sr SimpleExternal7. xpEXTERNAL
    215
    TestingUnoDosTres
    END_EXTERNAL
    C:\>java Demo7a r sample.ser
    Read SimpleExternal:
    Integer: 215
    String: TestingUnoDosTres

Lurking in among the binary data is the number 215 and the string TestingUnoDosTres, surrounded by EXTERNAL and END_EXTERNAL. This text is the object, serialized in the external format that we defined for the class. You can see that the main program we wrote works just fine. Cool! But what’s all that other trash? You’ll remember that ObjectOutputStream writes a description of the class to the stream; well, that’s what you’re seeing. That extra class information tells the ObjectInputStream what kind of object is coming next in the stream, so the class loader can find the bytecodes for the class. The ObjectInputStream creates the SimpleExternal object, then tells the SimpleExternal to “fill” itself from the stream by calling the new object’s readExternal() method.

If you don’t like this “interference” from ObjectOutputStream, you can write your own class of streamed objects by implementing your own types of ObjectOutput and ObjectInput, and then passing instances of these objects to the writeExternal or readExternal methods of your initial class.
Of course, you’ll have to find some way of identifying the object type you want when you stream the object in, since ObjectInputStream will no longer be doing that for you.

Serial killers: How to avoid unwanted serialization

There’s a danger in providing writeExternal and readExternal methods: because they are public, anyone can use them to change an object’s state from the outside, or to read an object’s state without calling the accessor methods. It’s the component designer’s job to ensure that sensitive information is protected from serialization or externalization. This section offers a couple of relatively easy ways to do just that.

The first and easiest way to protect an object’s fields from being serialized is simply not to make the class serializable. If you leave the keywords implements Serializable off of the class definition, the class won’t be serializable, and so serialization (obviously) will be prevented.

Sometimes, though, you may want to subclass a Serializable class but forbid the serialization of instances of that subclass. Since an object that inherits from a Serializable superclass is also serializable, you’ve got to find another way to prevent serialization. One cute trick is to implement a private writeObject() method that simply throws NotSerializableException, like this:

    import java.io.*;

    public class SecretObject007 extends SpyObject {
        // Must be private and take an ObjectOutputStream, or
        // ObjectOutputStream will not find and call it.
        private void writeObject(ObjectOutputStream out) throws IOException {
            throw new NotSerializableException("Not Serializable");
        }
        ...

Any class that tries to serialize this object will simply get an exception.

To protect a specific field from being serialized, mark it private transient (or make it static, if appropriate), since transient and static fields are never written by the default serialization mechanism. So, in a bean used to save account information for a user, you might see:

    public class AccountBean implements Serializable {
        private String sName;
        private String sUserID;
        private transient String sPassword;
        ...
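To try both techniques without a bean container, here is a minimal, self-contained sketch. Everything in it is hypothetical: Account, Spy, and SecretSpy merely stand in for the AccountBean, SpyObject, and SecretObject007 fragments above, and the helpers serialize to an in-memory byte array. It assumes the private writeObject(ObjectOutputStream) and readObject(ObjectInputStream) hook signatures, which are the ones ObjectOutputStream and ObjectInputStream look for.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical demo class, not part of the article's listings.
class SerialKillersDemo {

    // Stand-in for AccountBean: the password field is transient.
    static class Account implements Serializable {
        private static final long serialVersionUID = 1L;
        private String name;
        private transient String password;   // skipped by default serialization

        Account(String name, String password) {
            this.name = name;
            this.password = password;
        }

        String getName() { return name; }
        String getPassword() { return password; }
    }

    // Stand-in for a serializable superclass such as SpyObject.
    static class Spy implements Serializable {
        private static final long serialVersionUID = 1L;
        String codeName = "unknown";
    }

    // Subclass that refuses to be written or read.
    static class SecretSpy extends Spy {
        private static final long serialVersionUID = 1L;

        private void writeObject(ObjectOutputStream out) throws IOException {
            throw new NotSerializableException(getClass().getName());
        }

        private void readObject(ObjectInputStream in) throws IOException {
            throw new NotSerializableException(getClass().getName());
        }
    }

    // Serialize any object into a byte array.
    static byte[] toBytes(Object obj) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        return bytes.toByteArray();
    }

    // Write an Account and read the copy back (checked exceptions wrapped).
    static Account roundTrip(Account original) {
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(toBytes(original)))) {
            return (Account) in.readObject();
        } catch (IOException | ClassNotFoundException ex) {
            throw new IllegalStateException("round trip failed", ex);
        }
    }

    // True if writing the object fails with NotSerializableException.
    static boolean refusesSerialization(Object obj) {
        try {
            toBytes(obj);
            return false;
        } catch (NotSerializableException ex) {
            return true;
        } catch (IOException ex) {
            throw new IllegalStateException("unexpected I/O failure", ex);
        }
    }

    public static void main(String[] args) {
        Account copy = roundTrip(new Account("mark", "s3cret"));
        System.out.println("name: " + copy.getName());
        System.out.println("password: " + copy.getPassword());
        System.out.println("SecretSpy refuses: " + refusesSerialization(new SecretSpy()));
    }
}
```

Note that a transient field comes back with its default value (null for a reference), so code that reads the deserialized object has to be prepared for a missing password.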
The standard object serialization software would simply ignore sPassword and write the other fields to the output stream as expected. You can still serialize the object, but the password won’t show up in the output.

There is a way to store an encrypted password. You could make sPassword transient as above, and then implement writeObject() so that it calls defaultWriteObject() first, and then writes the encrypted password to the stream. Of course, readObject() would do the reverse: call defaultReadObject(), then read the encrypted password and decrypt it.

Conclusion

We’ve gone a bit further into serialization this month, covering recursive serialization, the Externalizable interface, and preventing the serialization of sensitive information. Next month, we’ll finish up the topic of serialization and persistence with a discussion of serialized object versioning, and go over some problems with the current serialization mechanism.

Reader comments and clarifications

Last month, a few sophisticated readers wrote in and pointed out some issues that need clarification (and, in one case, a correction).

First, thanks to the reader who requested that the articles not refer to pieces of code by color, since many readers print the articles on black-and-white printers. You’ll notice that this month, I’ve added line numbers to the code listings and references in the text.

In last month’s column, I said:

    The interface java.io.Serializable specifies that a class that implements it contain two methods with the following signatures:

This was simply incorrect. In fact, the class need not implement these methods because, as is noted later in the article, the ObjectOutputStream class knows how to serialize any Java object. So, if ObjectOutputStream knows how to serialize any object, why does the Serializable interface exist at all?

Looking at the JDK source code for interface java.io.Serializable, we find something startling:

    // ...lots of comments...
    package java.io;

    public interface Serializable {
    }

The Serializable interface has no methods at all! What is going on here?
The documentation for interface java.io.Serializable states:

    The serialization interface has no methods or fields and serves only to identify the semantics of being serializable.

In other words, a class implements Serializable solely for the purpose of indicating to other classes (in particular, java.io.ObjectOutputStream and java.io.ObjectInputStream) that it is indeed “willing” to be serialized. The designers of Java decided that it was safer to require programmers to explicitly declare a class to be serializable, rather than making classes serializable by default. This ensures that an object that manipulates sensitive information (security-related objects and so forth) isn’t inadvertently serialized and transmitted to places it shouldn’t be.

So, where do the methods writeObject() and readObject() come from, if not from the Serializable interface? Classes such as ObjectOutputStream and ObjectInputStream, which serialize and deserialize other classes, by convention search for methods with precisely these signatures:

    private void writeObject(java.io.ObjectOutputStream out) throws IOException;
    private void readObject(java.io.ObjectInputStream in) throws IOException, ClassNotFoundException;

The mere existence of these methods in a class is that class’s declaration that it knows how to serialize itself. ObjectOutputStream uses introspection to check for the existence of private void writeObject(java.io.ObjectOutputStream out). ObjectOutputStream uses the class’s writeObject method to serialize the object if the method exists, and falls back on the default field-by-field serialization otherwise.

Many thanks to the alert readers who wrote in and corrected me on this important point.

Mark Johnson has a B.S. in Computer and Electrical Engineering from Purdue University (1986). He is a fanatical devotee of the Design Pattern approach in object-oriented architecture, of software components in theory, and of JavaBeans in practice.
Over the past several years, he worked for Kodak, Booz-Allen and Hamilton, and EDS in Mexico City, developing Oracle and Informix database applications for the Mexican Federal Electoral Institute and for Mexican Customs. He currently works as a designer and developer for Object Products in Fort Collins, CO.