Wednesday, September 14, 2011

Customizing Java Serialization [Part 2]

In our last post we looked at how to customize the process of serialization. Today we go a bit further and see how to manage versioning of our classes during serialization and deserialization. We will cover the following reasons for customizing the serialization mechanism:

  • Persist only meaningful data. 
  • Manage serialization between different versions of your class. 
  • Avoid exposing the serialization mechanism to client API.

Persist only meaningful data: This one can be subtle, so I will take an example from one of the standard classes. Consider the following skeletal declaration of the LinkedList<E> class in java.util:
class LinkedList < E >
{
	private Entry < E > header;
	int size = 0;

	private static class Entry < E >
	{
		E element;

		Entry < E > next;
		Entry < E > previous;

		// other implementations
	}

	// other implementations
}
I have removed and edited out the parts of the class that do not pertain to our example. The class represents a basic linked list where each node is internally represented by an instance of the Entry class. Note that each instance of Entry holds two references, next and previous, to form a doubly linked list.

Now suppose we have to make our LinkedList class serializable. If we accept the default serialization policy [after making the Entry class implement Serializable] we get a working solution, provided the element type E is also Serializable. Hence the class declaration could become:
class LinkedList < E > implements Serializable
{
	private Entry < E > header;
	int size = 0;

	private static class Entry < E > implements Serializable
	{
		E element;
		Entry < E > next;
		Entry < E > previous;

		// other implementations
	}

	// other implementations
}
However, consider how Java performs default serialization: it walks the whole object graph reachable from the object being written. In this case, for each node in the list the serialization runtime has to follow both the next and the previous reference. But the previous node has always been visited already, so roughly half of the reference resolution is redundant work. We can do much better with custom serialization. The idea is to make the header instance transient and not implement Serializable for the Entry class. We then define writeObject and readObject hooks that write the elements to, and read them back from, the stream directly. Hence our class definition becomes:
class LinkedList < E > implements Serializable
{
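	// Rebuilt by readObject, so neither field is part of the default serialized form.
	private transient Entry < E > header;
	private transient int size = 0;

	// No longer Serializable: entries are never written to the stream.
	private static class Entry < E >
	{
		E element;
		Entry < E > next;
		Entry < E > previous;

		// other implementations
	}

	@SuppressWarnings("unchecked")
	private void readObject(java.io.ObjectInputStream o)
		throws java.io.IOException, ClassNotFoundException
	{
		// Read in any non-transient fields first.
		o.defaultReadObject();

		// Read in size
		int count = o.readInt();

		// Initialize header
		header = new Entry < E >(null, null, null);
		header.next = header.previous = header;

		// Read in all elements in the proper order.
		// addBefore (elided here) is assumed to link the new entry and increment size.
		for (int i = 0; i < count; i++)
		{
			addBefore((E) o.readObject(), header);
		}
	}

	private void writeObject(java.io.ObjectOutputStream s)
		throws java.io.IOException
	{
		// Write out any non-transient fields first.
		s.defaultWriteObject();

		// Write out size
		s.writeInt(size);

		// Write out all elements in the proper order.
		for (Entry < E > e = header.next; e != header; e = e.next)
		{
			s.writeObject(e.element);
		}
	}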
}
In the above case the JVM is relieved of the expensive and partly redundant object graph traversal/creation because we chose to do away with the default serialization process.
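To see the whole round trip in one place, here is a minimal, self-contained sketch of the same idea. TinyList, Node and copyViaSerialization are names I made up for this illustration, and the class is a heavily simplified stand-in, not the real java.util.LinkedList. The nodes are not Serializable and both fields are transient, so only the element count and the elements themselves reach the stream.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified list for illustration; not the real java.util.LinkedList.
class TinyList<E> implements Serializable
{
	private static final long serialVersionUID = 1L;

	// Both fields are rebuilt in readObject, so they stay out of the default form.
	private transient Node<E> header = newHeader();
	private transient int size = 0;

	// Deliberately not Serializable: nodes never reach the stream.
	private static class Node<E>
	{
		E element;
		Node<E> next;
		Node<E> previous;
	}

	private static <T> Node<T> newHeader()
	{
		Node<T> h = new Node<T>();
		h.next = h.previous = h;
		return h;
	}

	public void add(E element)
	{
		Node<E> n = new Node<E>();
		n.element = element;
		n.next = header;
		n.previous = header.previous;
		header.previous.next = n;
		header.previous = n;
		size++;
	}

	public List<E> toList()
	{
		List<E> out = new ArrayList<E>();
		for (Node<E> n = header.next; n != header; n = n.next)
		{
			out.add(n.element);
		}
		return out;
	}

	private void writeObject(ObjectOutputStream s) throws IOException
	{
		s.defaultWriteObject();
		s.writeInt(size);
		for (Node<E> n = header.next; n != header; n = n.next)
		{
			s.writeObject(n.element);
		}
	}

	@SuppressWarnings("unchecked")
	private void readObject(ObjectInputStream s) throws IOException, ClassNotFoundException
	{
		s.defaultReadObject();
		int count = s.readInt();
		header = newHeader(); // field initializers do not run during deserialization
		for (int i = 0; i < count; i++)
		{
			add((E) s.readObject()); // add() relinks the node and increments size
		}
	}

	// Serializes the list to a byte array and reads it back.
	@SuppressWarnings("unchecked")
	static <T> TinyList<T> copyViaSerialization(TinyList<T> list)
	{
		try
		{
			ByteArrayOutputStream bytes = new ByteArrayOutputStream();
			ObjectOutputStream out = new ObjectOutputStream(bytes);
			out.writeObject(list);
			out.close();
			ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()));
			TinyList<T> copy = (TinyList<T>) in.readObject();
			in.close();
			return copy;
		}
		catch (IOException | ClassNotFoundException e)
		{
			throw new IllegalStateException(e);
		}
	}
}
```

Adding a few strings to a TinyList and passing it through copyViaSerialization should give back a list with the same elements in the same order, even though no Node object was ever written.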
Manage Class Versions: Whenever a class implements Serializable, the serialization runtime looks for a field in the class called serialVersionUID. This is a long value and is used to associate version numbers with Serializable classes. The field is used during deserialization to verify that the sender and receiver of a serialized object have loaded classes for that object that are compatible with respect to serialization. If the receiver has loaded a class for the object that has a different serialVersionUID than that of the corresponding sender's class, then deserialization will result in an InvalidClassException. A serializable class can declare its own serialVersionUID explicitly by declaring a field named "serialVersionUID" that must be static, final, and of type long:
ANY-ACCESS-MODIFIER static final long serialVersionUID = 42L;
If a serializable class does not explicitly declare a serialVersionUID, then the serialization runtime will calculate a default serialVersionUID value for that class based on various aspects of the class, as described in the Java(TM) Object Serialization Specification. However, it is strongly recommended that all serializable classes explicitly declare serialVersionUID values, since the default serialVersionUID computation is highly sensitive to class details that may vary depending on compiler implementations, and can thus result in unexpected InvalidClassExceptions during deserialization. Therefore, to guarantee a consistent serialVersionUID value across different java compiler implementations, a serializable class must declare an explicit serialVersionUID value. It is also strongly advised that explicit serialVersionUID declarations use the private modifier where possible, since such declarations apply only to the immediately declaring class--serialVersionUID fields are not useful as inherited members.
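If you want to check which serialVersionUID the runtime will actually use for a class, java.io.ObjectStreamClass can tell you. The sketch below is only an illustration: WithExplicitUid, WithDefaultUid and uidOf are hypothetical names of mine. For the first class the declared value is returned; for the second the runtime computes a default, whose exact value I am deliberately not quoting because it depends on the class details.

```java
import java.io.ObjectStreamClass;
import java.io.Serializable;

// Hypothetical classes used only for this illustration.
class WithExplicitUid implements Serializable
{
	private static final long serialVersionUID = 42L;

	String name;
}

class WithDefaultUid implements Serializable
{
	String name;
}

class SerialVersionUidDemo
{
	// Returns the serialVersionUID the serialization runtime associates with the class.
	static long uidOf(Class<?> type)
	{
		// lookup returns null for classes that are not serializable.
		ObjectStreamClass descriptor = ObjectStreamClass.lookup(type);
		if (descriptor == null)
		{
			throw new IllegalArgumentException(type.getName() + " is not serializable");
		}
		return descriptor.getSerialVersionUID();
	}

	public static void main(String[] args)
	{
		// The declared value.
		System.out.println(uidOf(WithExplicitUid.class));

		// A value computed by the runtime from the class details.
		System.out.println(uidOf(WithDefaultUid.class));
	}
}
```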

To illustrate the above, let us go through an example:

Suppose we have a class that represents a version of our product and want to serialize the class:
class Version implements Serializable
{
	private String majorVersion;
	private String minorversion;
	private String subMinorVersion;

	public Version(String majorVersion, 
		           String minorversion, 
		           String subMinorVersion)
	{
		this.majorVersion = majorVersion;
		this.minorversion = minorversion;
		this.subMinorVersion = subMinorVersion;
	}

	public String getVersion()
	{
		return  majorVersion + "." + 
			    minorversion + "." + 
			    subMinorVersion;
	}
}
Note that we have not declared a serialVersionUID in our class, so the serialization runtime will compute a default one for it.
Now we use the following snippet to serialize an instance of the above class and read it back:

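import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;

public final class VersionTest
{
	public static void main(String[] args) throws Exception
	{
		Version v1 = new Version("5", "01", "001");

		ObjectOutputStream out = new ObjectOutputStream(
						new FileOutputStream(
							new File("VersionTest.dat")));

		out.writeObject(v1);

		// Close the output stream so everything is on disk before we read it back.
		out.close();

		ObjectInputStream in = new ObjectInputStream(
						new FileInputStream(
							new File("VersionTest.dat")));

		Version u = (Version) in.readObject();

		in.close();

		System.out.println(u.getVersion());
	}
}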
The above code prints “5.01.001”. So far so good.
Now suppose we decide to add “service pack” details to our Version class. The class declaration becomes:

class Version implements Serializable
{
	private String majorVersion;
	private String minorversion;
	private String subMinorVersion;
	private String servicePackName;
	private String servicePackNumber;

	public Version(String majorVersion, 
			       String minorversion, 
			       String subMinorVersion, 
			       String servicePackName, 
			       String servicePackNumber)
	{
		this.majorVersion = majorVersion;
		this.minorversion = minorversion;
		this.subMinorVersion = subMinorVersion;
		this.servicePackName = servicePackName;
		this.servicePackNumber = servicePackNumber;
	}

	public String getVersion()
	{
		return majorVersion + "." + 
			   minorversion + "." + 
			   subMinorVersion + ", " + 
			   servicePackName + " " + 
			   servicePackNumber;
	}
}
Now we try to read the previously serialized object back from the file using the following code:
public final class VersionTest
{
	public static void main(String[] args) throws Exception
	{
		ObjectInputStream in = new ObjectInputStream(
						new FileInputStream(
							new File("VersionTest.dat")));

		Version u = (Version) in.readObject();

		in.close();

		System.out.println(u.getVersion());
	}
}
On running the above code, the program throws a java.io.InvalidClassException. The reason is that, since we did not declare a serialVersionUID in our class, the serialization runtime computed one at serialization time, which happened to be 7356172634211253087L. This value is derived from the details of the class, such as its name, fields and methods. When we modify our class to add two new fields, the computed serialVersionUID is different [8734138131922723649L to be exact]. So when deserializing, the runtime sees that the serialVersionUID recorded in the stream differs from that of the local class, treats this as an attempt to deserialize an incompatible version of the class, and throws the exception.

However we know that the two versions are not completely incompatible. In fact the newer version of the class is just an extension of its previous version. So is there a way we can avoid the exception and still manage the deserialization? You bet we can!

The solution is to introduce our own serialVersionUID field in our class definition. This value can be any arbitrary long value. Now our previous class definition gets a serialVersionUID of 1L.

class Version implements Serializable
{
	private static final long serialVersionUID = 1L;

	private String majorVersion;
	private String minorversion;
	private String subMinorVersion;

	public Version(String majorVersion, 
		           String minorversion, 
		           String subMinorVersion)
	{
		this.majorVersion = majorVersion;
		this.minorversion = minorversion;
		this.subMinorVersion = subMinorVersion;
	}

	public String getVersion()
	{
		return  majorVersion + "." + 
			    minorversion + "." + 
			    subMinorVersion;
	}
}
Now we serialize an instance of the above class using the same code as shown above. Later we change our class definition to this:

class Version implements Serializable
{
	private static final long serialVersionUID = 1L;
	
	private String majorVersion;
	private String minorversion;
	private String subMinorVersion;
	private String servicePackName;
	private String servicePackNumber;

	public Version(String majorVersion, 
			       String minorversion, 
			       String subMinorVersion, 
			       String servicePackName, 
			       String servicePackNumber)
	{
		this.majorVersion = majorVersion;
		this.minorversion = minorversion;
		this.subMinorVersion = subMinorVersion;
		this.servicePackName = servicePackName;
		this.servicePackNumber = servicePackNumber;
	}

	public String getVersion()
	{
		return majorVersion + "." + 
			   minorversion + "." + 
			   subMinorVersion + ", " + 
			   servicePackName + " " + 
			   servicePackNumber;
	}
}
Note that we have not changed our serialVersionUID value. If we deserialize our stream into this version of the class, the code prints “5.01.001, null null”. This means the classes were found to be compatible and the fields missing from the stream were given their default values. We can refine our class so that “null null” is not printed when servicePackName and servicePackNumber are null.

class Version implements Serializable
{
	private static final long serialVersionUID = 1L;
        
	private String majorVersion;
	private String minorversion;
	private String subMinorVersion;
	private String servicePackName;
	private String servicePackNumber;
	
	
	public Version(String majorVersion, 
			       String minorversion, 
			       String subMinorVersion, 
			       String servicePackName, 
			       String servicePackNumber)
	{
		this.majorVersion = majorVersion;
		this.minorversion = minorversion;
		this.subMinorVersion = subMinorVersion;
		this.servicePackName = servicePackName;
		this.servicePackNumber = servicePackNumber;
	}
	
	public String getVersion()
	{
		String ret = majorVersion +"." + 
			     minorversion + "." + 
			     subMinorVersion;
		
		if(servicePackName != null)
		{
			ret = ret + ", " + servicePackName;
		}
		
		if(servicePackNumber != null)
		{
			ret = ret + " " + servicePackNumber;
		}
		
		return ret;
	}
}
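Instead of checking for null every time we print, another option is to repair the object once, at deserialization time, with a readObject hook. The following is only a sketch under a few assumptions: PatchedVersion is a trimmed-down, hypothetical variant of our Version class, "none" is a placeholder I picked arbitrarily, and since a single file cannot hold two versions of the same class, a null passed to the constructor stands in for a stream written before the field existed.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical, trimmed-down variant of the Version class, for illustration only.
class PatchedVersion implements Serializable
{
	private static final long serialVersionUID = 1L;

	private String majorVersion;
	private String servicePackName; // absent from streams written by the older class

	PatchedVersion(String majorVersion, String servicePackName)
	{
		this.majorVersion = majorVersion;
		this.servicePackName = servicePackName;
	}

	private void readObject(ObjectInputStream in)
		throws IOException, ClassNotFoundException
	{
		in.defaultReadObject();

		// A field that was missing from the stream is left at its default, null.
		// "none" is an arbitrary placeholder chosen for this sketch.
		if (servicePackName == null)
		{
			servicePackName = "none";
		}
	}

	String getVersion()
	{
		return majorVersion + ", service pack " + servicePackName;
	}

	// Writes the object to a byte array and reads it back.
	static PatchedVersion roundTrip(PatchedVersion v)
	{
		try
		{
			ByteArrayOutputStream bytes = new ByteArrayOutputStream();
			ObjectOutputStream out = new ObjectOutputStream(bytes);
			out.writeObject(v);
			out.close();

			ObjectInputStream in = new ObjectInputStream(
				new ByteArrayInputStream(bytes.toByteArray()));
			PatchedVersion copy = (PatchedVersion) in.readObject();
			in.close();
			return copy;
		}
		catch (IOException | ClassNotFoundException e)
		{
			throw new IllegalStateException(e);
		}
	}

	public static void main(String[] args)
	{
		// Passing null stands in for a stream written before the field existed.
		System.out.println(roundTrip(new PatchedVersion("5", null)).getVersion());
	}
}
```

The advantage of this approach is that every method of the class can rely on the field being set, rather than each one repeating the null check.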
Avoid exposing the serialization mechanism to client API: The last reason to customize serialization is perhaps the most important when it comes to designing your classes. The problem with accepting the default serialization is that the serialized form then mirrors the internals of our class, so those internals are exposed to the client. This is never a good idea. Clients might code their logic to depend on that default serialized form. Later, if we decide to serialize our class in a different way, we cannot do so without potentially breaking our clients' code.

However if we customize our serialization, then we prevent the serialization mechanism from being a part of our public API. We can, whenever we want, change the way we do our serialization without breaking our client’s code. 
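One way to get this separation, which goes a step beyond the writeObject and readObject hooks used in this post, is the serialization proxy pattern: the class hands the stream a small stand-in object through writeReplace, and the stand-in rebuilds the real object through readResolve. The sketch below is my own illustration of that pattern, not something taken from the JDK; Range, Proxy and roundTrip are hypothetical names. Range stores a start and a length internally, but only the logical start and end are ever written, so the internal fields can change later without touching the serialized form.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InvalidObjectException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical class for illustration only.
class Range implements Serializable
{
	private static final long serialVersionUID = 1L;

	// Internal representation: free to change, because it is never written out.
	private final int start;
	private final int length;

	Range(int start, int end)
	{
		if (end < start)
		{
			throw new IllegalArgumentException("end before start");
		}
		this.start = start;
		this.length = end - start;
	}

	int start() { return start; }
	int end() { return start + length; }

	// Called by the serialization runtime: the proxy is written instead of this object.
	private Object writeReplace()
	{
		return new Proxy(start, end());
	}

	// A stream that claims to hold a bare Range did not come from writeReplace.
	private void readObject(ObjectInputStream in) throws InvalidObjectException
	{
		throw new InvalidObjectException("Proxy required");
	}

	// The serialized form: just the logical start and end.
	private static class Proxy implements Serializable
	{
		private static final long serialVersionUID = 1L;

		private final int start;
		private final int end;

		Proxy(int start, int end)
		{
			this.start = start;
			this.end = end;
		}

		// Called after the proxy is read: rebuilds the Range through its constructor.
		private Object readResolve()
		{
			return new Range(start, end);
		}
	}

	// Writes the object to a byte array and reads it back.
	static Range roundTrip(Range r)
	{
		try
		{
			ByteArrayOutputStream bytes = new ByteArrayOutputStream();
			ObjectOutputStream out = new ObjectOutputStream(bytes);
			out.writeObject(r);
			out.close();
			ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()));
			Range copy = (Range) in.readObject();
			in.close();
			return copy;
		}
		catch (IOException | ClassNotFoundException e)
		{
			throw new IllegalStateException(e);
		}
	}
}
```

Because deserialization goes through the ordinary constructor, the end-before-start check also applies to objects read from a stream.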

That's all for now.

In this and the previous article, we covered a bit of the Serializable interface and customizing serialization in Java. Readers who are interested in the full details should consult the official Java Object Serialization Specification.



4 comments:

  1. second part is even more interesting than first part :) keep it up.

    Javin
    10 interview questions on Serialization

  2. all very good, I'm very need

    http://www.dangcapseo.com/seo

  3. i think it should be
    addBefore((E) o.readObject(), header);

    not

    addBefore((E) s.readObject(), header);

    What happens to header, because i see it's not serialized?

  4. I would like to add that modern IDEs like Eclipse and NetBeans can add a serialVersionUID field to classes, but it is recommended to use some tested third-party library like Apache Commons for the same.
