Wednesday, September 14, 2011

Customizing Java Serialization [Part 2]

In our last post we looked at how to customize the process of serialization. Today we go a bit further and see how we can manage versioning of our classes during serialization and deserialization. We will cover the following reasons for customizing the serialization mechanism:

  • Persist only meaningful data. 
  • Manage serialization between different versions of your class. 
  • Avoid exposing the serialization mechanism to client API.

Persist only meaningful data: This one can be subtle, so I will take an example from one of the standard classes. Consider the following skeletal declaration of the LinkedList<E> class in java.util (as it was implemented up to Java 6):
class LinkedList < E >
{
	private Entry < E > header;
	int size = 0;

	private static class Entry < E >
	{
		E element;

		Entry < E > next;
		Entry < E > previous;

		// other implementations
	}

	// other implementations
}
I have removed and edited out the parts of the above class that do not pertain to our example. The class represents a basic linked list where each node is internally represented by an instance of the Entry class. Note that each instance of Entry has two references, next and previous, which make the list doubly linked.

Now consider that we have to make our LinkedList class serializable. If we accept the default serialization policy [after making the Entry class implement Serializable] we get a working solution, provided the element type E is also Serializable. Hence the class declaration could become:
class LinkedList < E > implements Serializable
{
	private Entry < E > header;
	int size = 0;

	private static class Entry < E > implements Serializable
	{
		E element;
		Entry < E > next;
		Entry < E > previous;

		// other implementations
	}

	// other implementations
}
However, consider how Java serializes an object: it walks the object graph reachable from that object. In this case, for each node in the list the serialization runtime has to follow both the next and the previous reference. The previous node has always been visited already, so roughly 50% of the reference resolution does no useful work. We can do much better with custom serialization. The idea is to make the header field transient and to not implement Serializable in the Entry class. We then define writeObject and readObject hooks that write the elements to the stream and read them back. Hence our class definition becomes:
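class LinkedList < E > implements Serializable
{
	private transient Entry < E > header;
	private transient int size = 0;

	// Entry is no longer Serializable: nodes are never written to the stream.
	private static class Entry < E >
	{
		E element;
		Entry < E > next;
		Entry < E > previous;

		// other implementations
	}

	@SuppressWarnings("unchecked")
	private void readObject(java.io.ObjectInputStream s)
		throws java.io.IOException, ClassNotFoundException
	{
		// Read in any non-transient fields (none here, kept by convention).
		s.defaultReadObject();

		// Read in size
		int size = s.readInt();

		// Initialize header
		header = new Entry < E >(null, null, null);
		header.next = header.previous = header;

		// Read in all elements in the proper order.
		// addBefore is among the elided members; it is assumed to link a new
		// entry before the given one and to increment the size field.
		for (int i = 0; i < size; i++)
		{
			addBefore((E) s.readObject(), header);
		}
	}

	private void writeObject(java.io.ObjectOutputStream s)
		throws java.io.IOException
	{
		// Write out any non-transient fields (none here, kept by convention).
		s.defaultWriteObject();

		// Write out size
		s.writeInt(size);

		// Write out all elements in the proper order.
		for (Entry < E > e = header.next; e != header; e = e.next)
		{
			s.writeObject(e.element);
		}
	}

	// other implementations
}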
In the above case the JVM is relieved of the expensive and partly redundant object graph traversal/creation because we chose to do away with the default serialization process.
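
To see the technique end to end, here is a minimal, self-contained sketch. SimpleList, its Node class and SimpleListDemo are hypothetical names made up for this illustration (this is not the real java.util.LinkedList), and the list is singly linked to keep the code short. The nodes never reach the stream; only the element count and the elements do.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified list used only for illustration.
class SimpleList<E> implements Serializable {
    private static final long serialVersionUID = 1L;

    // Nodes are deliberately not Serializable: they are never written out.
    private static final class Node<E> {
        final E element;
        Node<E> next;

        Node(E element) {
            this.element = element;
        }
    }

    // All fields are transient, so default serialization writes nothing for them.
    private transient Node<E> head;
    private transient Node<E> tail;
    private transient int size;

    void add(E element) {
        Node<E> node = new Node<>(element);
        if (head == null) {
            head = node;
        } else {
            tail.next = node;
        }
        tail = node;
        size++;
    }

    List<E> toList() {
        List<E> result = new ArrayList<>();
        for (Node<E> n = head; n != null; n = n.next) {
            result.add(n.element);
        }
        return result;
    }

    // Writes the element count followed by the elements in list order.
    private void writeObject(ObjectOutputStream out) throws IOException {
        out.defaultWriteObject();
        out.writeInt(size);
        for (Node<E> n = head; n != null; n = n.next) {
            out.writeObject(n.element);
        }
    }

    // Rebuilds the nodes by re-adding each element read from the stream.
    @SuppressWarnings("unchecked")
    private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
        in.defaultReadObject();
        int count = in.readInt();
        for (int i = 0; i < count; i++) {
            add((E) in.readObject());
        }
    }
}

class SimpleListDemo {
    // Serializes the list to an in-memory buffer and reads it back.
    @SuppressWarnings("unchecked")
    static <E> SimpleList<E> roundTrip(SimpleList<E> list) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(list);
        }
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(buffer.toByteArray()))) {
            return (SimpleList<E>) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        SimpleList<String> list = new SimpleList<>();
        list.add("a");
        list.add("b");
        list.add("c");
        System.out.println(roundTrip(list).toList()); // prints [a, b, c]
    }
}
```

Because the transient fields start out as null and 0 after deserialization, readObject can simply reuse add() to rebuild the internal structure.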

Manage Class Versions: Whenever we make a class Serializable, the serialization runtime associates a version number with it, called the serialVersionUID. This is a long value. It is used during deserialization to verify that the sender and receiver of a serialized object have loaded classes for that object that are compatible with respect to serialization. If the receiver has loaded a class for the object that has a different serialVersionUID than that of the corresponding sender's class, then deserialization will result in an InvalidClassException. A serializable class can declare its own serialVersionUID explicitly by declaring a field named "serialVersionUID" that must be static, final, and of type long:
ANY-ACCESS-MODIFIER static final long serialVersionUID = 42L;
If a serializable class does not explicitly declare a serialVersionUID, then the serialization runtime will calculate a default serialVersionUID value for that class based on various aspects of the class, as described in the Java(TM) Object Serialization Specification. However, it is strongly recommended that all serializable classes explicitly declare serialVersionUID values, since the default serialVersionUID computation is highly sensitive to class details that may vary depending on compiler implementations, and can thus result in unexpected InvalidClassExceptions during deserialization. Therefore, to guarantee a consistent serialVersionUID value across different java compiler implementations, a serializable class must declare an explicit serialVersionUID value. It is also strongly advised that explicit serialVersionUID declarations use the private modifier where possible, since such declarations apply only to the immediately declaring class--serialVersionUID fields are not useful as inherited members.

To illustrate the above, let us go through an example:

Suppose we have a class that represents a version of our product and want to serialize the class:
class Version implements Serializable
{
	private String majorVersion;
	private String minorversion;
	private String subMinorVersion;

	public Version(String majorVersion, 
		           String minorversion, 
		           String subMinorVersion)
	{
		this.majorVersion = majorVersion;
		this.minorversion = minorversion;
		this.subMinorVersion = subMinorVersion;
	}

	public String getVersion()
	{
		return  majorVersion + "." + 
			    minorversion + "." + 
			    subMinorVersion;
	}
}
Note that we have not declared any serialVersionUID in our class, hence the serialization runtime will compute a default serialVersionUID for it.
Now we use the following snippet to serialize an instance of the above class:

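import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;

public final class VersionTest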
{
	public static void main(String[] args) throws Exception
	{
		Version v1 = new Version("5", "01", "001");

		ObjectOutputStream out = new ObjectOutputStream(
						new FileOutputStream(
							new File("VersionTest.dat")));

		out.writeObject(v1);

		// Close the stream so that all buffered data reaches the file before we read it.
		out.close();

		ObjectInputStream in = new ObjectInputStream(
						new FileInputStream(
							new File("VersionTest.dat")));

		Version u = (Version) in.readObject();

		in.close();

		System.out.println(u.getVersion());
	}
}
The above code prints “5.01.001”. So far so good.
Now suppose we decide to add “service pack” details to our Version class. The class declaration becomes:

class Version implements Serializable
{
	private String majorVersion;
	private String minorversion;
	private String subMinorVersion;
	private String servicePackName;
	private String servicePackNumber;

	public Version(String majorVersion, 
			       String minorversion, 
			       String subMinorVersion, 
			       String servicePackName, 
			       String servicePackNumber)
	{
		this.majorVersion = majorVersion;
		this.minorversion = minorversion;
		this.subMinorVersion = subMinorVersion;
		this.servicePackName = servicePackName;
		this.servicePackNumber = servicePackNumber;
	}

	public String getVersion()
	{
		return majorVersion + "." + 
			   minorversion + "." + 
			   subMinorVersion + ", " + 
			   servicePackName + " " + 
			   servicePackNumber;
	}
}
Now we try to read the previously serialized object from the file using the following code:
import java.io.File;
import java.io.FileInputStream;
import java.io.ObjectInputStream;

public final class VersionTest
{
	public static void main(String[] args) throws Exception
	{
		ObjectInputStream in = new ObjectInputStream(
						new FileInputStream(
							new File("VersionTest.dat")));

		Version u = (Version) in.readObject();

		in.close();

		System.out.println(u.getVersion());
	}
}
On running the above code, the program throws a java.io.InvalidClassException. The reason is that, since we did not declare a serialVersionUID member in our class, the serialization runtime computed one itself during serialization, which happened to be 7356172634211253087L. The default value is computed from the structure of the class: its name, the interfaces it implements and its members, including the field declarations. Once we modify the class to add two new fields, the computed serialVersionUID is different [8734138131922723649L to be exact]. So when deserializing, the runtime sees that the serialVersionUID recorded in the stream differs from that of the local class, treats this as an attempt to deserialize an incompatible version of the class, and throws the exception.
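
One way to see which serialVersionUID the runtime associates with a class is java.io.ObjectStreamClass. The sketch below is only an illustration: the classes Declared and Undeclared and the helper uidOf are hypothetical names, and the value computed for Undeclared is not shown because it depends on the class details and possibly on the compiler.

```java
import java.io.ObjectStreamClass;
import java.io.Serializable;

class SerialVersionUidDemo {
    // Hypothetical class with an explicit serialVersionUID.
    static class Declared implements Serializable {
        private static final long serialVersionUID = 1L;
        String name;
    }

    // Hypothetical class that relies on the computed default.
    static class Undeclared implements Serializable {
        String name;
    }

    // Returns the serialVersionUID the serialization runtime uses for the class.
    static long uidOf(Class<?> type) {
        // lookup returns null when the class is not serializable.
        ObjectStreamClass descriptor = ObjectStreamClass.lookup(type);
        if (descriptor == null) {
            throw new IllegalArgumentException(type.getName() + " is not serializable");
        }
        return descriptor.getSerialVersionUID();
    }

    public static void main(String[] args) {
        System.out.println(uidOf(Declared.class));   // prints 1
        System.out.println(uidOf(Undeclared.class)); // computed default, value varies
    }
}
```

Running this before and after adding a field to Undeclared is a quick way to observe that the computed default changes while the declared value stays the same.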

However we know that the two versions are not completely incompatible. In fact the newer version of the class is just an extension of its previous version. So is there a way we can avoid the exception and still manage the deserialization? You bet we can!

The solution is to introduce our own serialVersionUID field in our class definition. This value can be any arbitrary long value. Now our previous class definition gets a serialVersionUID of 1L.

class Version implements Serializable
{
	private static final long serialVersionUID = 1L;

	private String majorVersion;
	private String minorversion;
	private String subMinorVersion;

	public Version(String majorVersion, 
		           String minorversion, 
		           String subMinorVersion)
	{
		this.majorVersion = majorVersion;
		this.minorversion = minorversion;
		this.subMinorVersion = subMinorVersion;
	}

	public String getVersion()
	{
		return  majorVersion + "." + 
			    minorversion + "." + 
			    subMinorVersion;
	}
}
Now we serialize an instance of the above class using the same code as shown above. Later we change our class definition to this:

class Version implements Serializable
{
	private static final long serialVersionUID = 1L;
	
	private String majorVersion;
	private String minorversion;
	private String subMinorVersion;
	private String servicePackName;
	private String servicePackNumber;

	public Version(String majorVersion, 
			       String minorversion, 
			       String subMinorVersion, 
			       String servicePackName, 
			       String servicePackNumber)
	{
		this.majorVersion = majorVersion;
		this.minorversion = minorversion;
		this.subMinorVersion = subMinorVersion;
		this.servicePackName = servicePackName;
		this.servicePackNumber = servicePackNumber;
	}

	public String getVersion()
	{
		return majorVersion + "." + 
			   minorversion + "." + 
			   subMinorVersion + ", " + 
			   servicePackName + " " + 
			   servicePackNumber;
	}
}
Note that we have not changed our serialVersionUID value. If we deserialize our stream into this version of the class, the code prints “5.01.001, null null”. This means the classes were found to be compatible and the fields missing from the stream were given their default values. We can refine our class so that “null null” is not printed when servicePackName and servicePackNumber are null.

class Version implements Serializable
{
	private static final long serialVersionUID = 1L;
        
	private String majorVersion;
	private String minorversion;
	private String subMinorVersion;
	private String servicePackName;
	private String servicePackNumber;
	
	
	public Version(String majorVersion, 
			       String minorversion, 
			       String subMinorVersion, 
			       String servicePackName, 
			       String servicePackNumber)
	{
		this.majorVersion = majorVersion;
		this.minorversion = minorversion;
		this.subMinorVersion = subMinorVersion;
		this.servicePackName = servicePackName;
		this.servicePackNumber = servicePackNumber;
	}
	
	public String getVersion()
	{
		String ret = majorVersion +"." + 
			     minorversion + "." + 
			     subMinorVersion;
		
		if(servicePackName != null)
		{
			ret = ret + ", " + servicePackName;
		}
		
		if(servicePackNumber != null)
		{
			ret = ret + " " + servicePackNumber;
		}
		
		return ret;
	}
}
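
The compatibility rule can also be tried out in a single file, without recompiling between runs. The sketch below is an illustration under stated assumptions: OldVersion and NewVersion are hypothetical stand-ins for the two releases of the class, both declare serialVersionUID 1L, and the class name inside the serialized bytes is rewritten from one to the other. That byte-level rename is purely a trick to simulate reading an old stream with a newer class; it relies on both names having the same length and is not something to do in real code.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.charset.StandardCharsets;

class VersionEvolutionDemo {
    // Stands in for the first release of the class (hypothetical).
    static class OldVersion implements Serializable {
        private static final long serialVersionUID = 1L;
        String majorVersion = "5";
        String minorVersion = "01";
    }

    // Stands in for the later release: same UID, one extra field.
    static class NewVersion implements Serializable {
        private static final long serialVersionUID = 1L;
        String majorVersion;
        String minorVersion;
        String servicePackName;
    }

    static NewVersion readOldStreamAsNew() throws IOException, ClassNotFoundException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(new OldVersion());
        }

        // ISO-8859-1 maps every byte to exactly one char and back, so the
        // stream survives the round trip through String unchanged apart
        // from the renamed class (the names have equal length).
        String raw = new String(buffer.toByteArray(), StandardCharsets.ISO_8859_1);
        byte[] renamed = raw.replace("OldVersion", "NewVersion")
                            .getBytes(StandardCharsets.ISO_8859_1);

        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(renamed))) {
            return (NewVersion) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        NewVersion v = readOldStreamAsNew();
        System.out.println(v.majorVersion + "." + v.minorVersion + ", " + v.servicePackName);
    }
}
```

The fields present in the stream are restored by name, while servicePackName, which the stream knows nothing about, should be left at its default value of null.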

Avoid exposing the serialization mechanism to client API: The last reason to customize serialization is perhaps the most important when it comes to designing your classes. With default serialization, the serialized form is derived directly from the fields of the class, so we expose the internals of our class to the client. This is never a good idea. Clients might write logic that depends on our default serialized form, and later, when we decide that we should serialize our class in a different way, we cannot do it without potentially breaking our clients’ code.

However if we customize our serialization, then we prevent the serialization mechanism from being a part of our public API. We can, whenever we want, change the way we do our serialization without breaking our client’s code. 

That's all for now.

In this and the previous article, we covered a bit of the Serializable interface and customizing serialization in Java. Readers who are interested in the full details can refer to the official Java Object Serialization Specification.



Wednesday, September 7, 2011

Customizing Java Serialization [Part 1]

In our last post we looked at some of the basics of Java Serialization. We also took a look at how object resolving is done by the JVM and the effects of serialization of the same object on different/same ObjectOutputStreams.

In this post we will look at how we can customize the process of serialization.

When we write a class and implement the Serializable interface, Java gives us a default serialization process. But sometimes we may want to customize the serialization process. There are several reasons why we might go for a customized serialization. The following is a non-exhaustive list:

  • Protect sensitive data.
  • Protect your invariants.
  • Control your instances.
  • Persist only meaningful data.
  • Manage serialization between different versions of your class.
  • Avoid exposing the serialization mechanism to client API.

In this part of the post we will look into the first three items.

Disclaimer: Some of the reasons given for each of the above situations may seem debatable. So it is best to view the examples below as purely illustrative and not as a reference for good programming practice. The examples depict the “how” of customizing serialization rather than the “when” of customizing serialization.

Protect Sensitive Data: Suppose we have a UserAccount class that stores a username and password for an account. The class could be as follows:
public class UserAccount
{
         private String username;
         private String password;

         //other implementations
}
Now suppose we want to store all the details of the UserAccount instance except the password. Simply implementing Serializable in the class will serialize the password as well, so we need to prevent the serialization of the password field. The way to do this is to introduce a method with exactly the following signature into the class and do the serialization ourselves.
private void writeObject(ObjectOutputStream o) 
                       throws IOException
{
    //do serialization of required fields here
}

What is this method? Why is it private?

Well, this method is one of the several methods provided in the Java Serialization Specification that let us customize the process of object serialization. If we put this method into our class, the serialization runtime will call it to serialize the current object. It is private because it is not meant to be called by clients or overridden by subclasses; the runtime finds and invokes it reflectively. Hence, to prevent the serialization of the password field our class will look like this:
class UserAccount implements Serializable
{
	public String username;
	public String password;   

	public UserAccount()
	{
       		username = "defaultUsername";
       		password = "defaultPassword";
	}   

	public UserAccount(String u, String p)
	{
       		username = u;
      		password = p;
	}   

	private void readObject(ObjectInputStream o) 
		throws IOException, ClassNotFoundException
	{
       		username = (String)o.readObject();
	}

	private void writeObject(ObjectOutputStream o) 
		throws IOException
	{
       		o.writeObject(username);
	}   

	public String toString()
	{
       		return username + ", " + password;
	}
}
In the above class we skip the serialization of the password field and persist only the username field. In addition, while de-serializing the object, we use a similar hook, readObject(), to read back only the username field. Note that trying to read the password field as well would have resulted in a java.io.OptionalDataException, because the stream holds no further data for the object.
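
A quick way to convince ourselves that the password really stays out of the stream is to serialize to memory and inspect the result. The following is a minimal sketch: Account and AccountDemo are hypothetical names, the class mirrors the UserAccount hooks shown above, and the sample values are made up.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.charset.StandardCharsets;

// Hypothetical class mirroring the hooks of UserAccount.
class Account implements Serializable {
    private static final long serialVersionUID = 1L;

    private String username;
    private String password;

    Account(String username, String password) {
        this.username = username;
        this.password = password;
    }

    String getUsername() {
        return username;
    }

    String getPassword() {
        return password;
    }

    // Only the username is written; the password value never reaches the stream.
    private void writeObject(ObjectOutputStream out) throws IOException {
        out.writeObject(username);
    }

    // Reads back exactly what writeObject wrote, in the same order.
    private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
        username = (String) in.readObject();
    }
}

class AccountDemo {
    static byte[] serialize(Account account) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(account);
        }
        return buffer.toByteArray();
    }

    static Account deserialize(byte[] bytes) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return (Account) in.readObject();
        }
    }

    // Checks whether the given ASCII text occurs anywhere in the raw bytes.
    static boolean containsText(byte[] bytes, String text) {
        return new String(bytes, StandardCharsets.ISO_8859_1).contains(text);
    }

    public static void main(String[] args) throws Exception {
        byte[] bytes = serialize(new Account("alice", "s3cret"));
        Account copy = deserialize(bytes);
        System.out.println(copy.getUsername());            // prints alice
        System.out.println(copy.getPassword());            // prints null
        System.out.println(containsText(bytes, "s3cret")); // prints false
    }
}
```

The password comes back as null because no constructor of a Serializable class runs during deserialization, so fields that are not restored from the stream keep their default values. For a case this simple, marking the field transient and keeping default serialization has the same effect; the hooks become useful once more control is needed.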

Protect your invariants: Suppose we have a class that represents a coordinate in the first quadrant. The class might look like this:
class Coordinate
{
	private int x;
	private int y;

	public Coordinate(int x, int y)
	{
		// Assign first: validateInvariants() checks the fields, not the parameters.
		this.x = x;
		this.y = y;

		validateInvariants();
	}

	public int getX()
	{
		return x;
	}

	public int getY()
	{
		return y;
	}  

	private void validateInvariants()
	{
		if(x < 0 || y < 0)
		{
			throw new IllegalArgumentException();
		}
	}
}
Here the method validateInvariants throws an IllegalArgumentException if either coordinate is negative. Now if we want our Coordinate class to be serializable, we should also protect our invariants when an instance of the class is de-serialized. Hence, after implementing Serializable, this is how we might do that:
class Coordinate implements Serializable
{
	private int x;
	private int y;
	
	public Coordinate(int x, int y)
	{
		// Assign first: validateInvariants() checks the fields, not the parameters.
		this.x = x;
		this.y = y;

		validateInvariants();
	}

	public int getX()
	{
		return x;
	}

	public int getY()
	{
		return y;
	}
	
	private void readObject(ObjectInputStream o) 
		throws IOException, ClassNotFoundException
	{
		x = o.readInt();
		y = o.readInt();
		
		validateInvariants();
	}
	
	private void writeObject(ObjectOutputStream o) 
		throws IOException
	{
		o.writeInt(x);
		o.writeInt(y);
	}
	
	private void validateInvariants()
	{
		if(x < 0 || y < 0)
		{
			throw new IllegalArgumentException();
		}
	}
}
Here, while de-serializing our instance, we check our invariants again. If someone had modified the serialized bytes to introduce a negative coordinate value, we would not have detected it without the above readObject() method.
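
The tampering scenario can be simulated in memory. In the sketch below, QuadrantPoint, TamperDemo and the helper flipSignBit are hypothetical names introduced for illustration. The helper finds the big-endian bytes of a known coordinate value in the serialized data and sets the sign bit, which is a crude stand-in for an attacker editing the file.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.ByteBuffer;

// Hypothetical class mirroring the Coordinate hooks.
class QuadrantPoint implements Serializable {
    private static final long serialVersionUID = 1L;

    private int x;
    private int y;

    QuadrantPoint(int x, int y) {
        this.x = x;
        this.y = y;
        validate();
    }

    private void validate() {
        if (x < 0 || y < 0) {
            throw new IllegalArgumentException("negative coordinate");
        }
    }

    private void writeObject(ObjectOutputStream out) throws IOException {
        out.writeInt(x);
        out.writeInt(y);
    }

    // Re-checks the invariants on every deserialized instance.
    private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
        x = in.readInt();
        y = in.readInt();
        validate();
    }
}

class TamperDemo {
    static byte[] serialize(Object value) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(value);
        }
        return buffer.toByteArray();
    }

    // Returns a copy in which the last occurrence of the value's
    // big-endian encoding has its sign bit set.
    static byte[] flipSignBit(byte[] bytes, int value) {
        byte[] pattern = ByteBuffer.allocate(4).putInt(value).array();
        byte[] copy = bytes.clone();
        for (int i = copy.length - 4; i >= 0; i--) {
            if (copy[i] == pattern[0] && copy[i + 1] == pattern[1]
                    && copy[i + 2] == pattern[2] && copy[i + 3] == pattern[3]) {
                copy[i] |= (byte) 0x80;
                return copy;
            }
        }
        throw new IllegalStateException("value not found in stream");
    }

    // True when deserialization is refused by the invariant check.
    static boolean isRejected(byte[] bytes) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            in.readObject();
            return false;
        } catch (IllegalArgumentException e) {
            return true;
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] good = serialize(new QuadrantPoint(0x12345678, 7));
        System.out.println(isRejected(good));                          // prints false
        System.out.println(isRejected(flipSignBit(good, 0x12345678))); // prints true
    }
}
```

The distinctive value 0x12345678 is used only so that its bytes are easy to locate in the stream.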

Control your instances: Suppose we have a class of which we want only a single instance to exist. Such classes are called singletons. A singleton does not have any public constructors, and clients get the reference to the instance via a static method, normally named getInstance. Consider the following class that is a singleton:

class Singleton
{
	private static final Singleton INSTANCE = new Singleton();

	private Singleton()
	{

	}

	public static Singleton getInstance()
	{
		return INSTANCE;
	}

	// other methods
}
Now suppose we want the above Singleton class to be serializable. We can implement the Serializable interface for the above class and be done with it. But in that case we no longer protect the singleton nature of the class: every de-serialization creates a new instance, so there will be more than one instance of the class. This can be proved as follows:
class Singleton implements Serializable
{
	private static final Singleton INSTANCE = new Singleton();

	private Singleton()
	{

	}

	public static Singleton getInstance()
	{
		return INSTANCE;
	}

	// other methods
}
Now I use the following snippet to prove that there will be multiple instances:
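import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;

public final class ControlInstances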
{
	public static void main(String[] args) throws Exception
	{
		ObjectOutputStream out = new ObjectOutputStream(
						new FileOutputStream(new File("out3.dat")));

		out.writeObject(Singleton.getInstance());
		out.close();

		ObjectInputStream in = new ObjectInputStream(
						new FileInputStream(new File("out3.dat")));

		Singleton u = (Singleton) in.readObject();
		in.close();

		System.out.println(Singleton.getInstance() == u);
	}
}
The above code prints false. This means that after serialization and a subsequent de-serialization there are multiple instances of our supposedly singleton class. The way to avoid this is to use another hook, the readResolve() method. The readResolve method is called when the ObjectInputStream has read an object from the stream and is preparing to return it to the caller. ObjectInputStream checks whether the class of the object defines the readResolve method. If it does, the method is called and whatever it returns is handed to the caller in place of the object that was just read.

Hence our class definition will be:
class Singleton implements Serializable
{
	private static final Singleton INSTANCE = new Singleton();

	private Singleton()
	{

	}

	public static Singleton getInstance()
	{
		return INSTANCE;
	}

	private Object readResolve() throws ObjectStreamException
	{
		return getInstance();
	}

	// other methods
}
Now the test code will print true, which means that even after de-serialization we still have only one instance of the class in our JVM.
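
The two behaviours can be compared side by side without touching the file system. This is a minimal sketch: PlainSingleton, ResolvedSingleton and SingletonDemo are hypothetical names, and the round trip goes through an in-memory buffer instead of a file.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Serializable singleton without readResolve (hypothetical).
class PlainSingleton implements Serializable {
    private static final long serialVersionUID = 1L;
    static final PlainSingleton INSTANCE = new PlainSingleton();

    private PlainSingleton() {
    }
}

// Serializable singleton that substitutes the canonical instance (hypothetical).
class ResolvedSingleton implements Serializable {
    private static final long serialVersionUID = 1L;
    static final ResolvedSingleton INSTANCE = new ResolvedSingleton();

    private ResolvedSingleton() {
    }

    // Whatever is returned here replaces the freshly deserialized object.
    private Object readResolve() {
        return INSTANCE;
    }
}

class SingletonDemo {
    // Serializes the object to memory and reads it back.
    static Object roundTrip(Object value) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(value);
        }
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(buffer.toByteArray()))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(roundTrip(PlainSingleton.INSTANCE) == PlainSingleton.INSTANCE);       // prints false
        System.out.println(roundTrip(ResolvedSingleton.INSTANCE) == ResolvedSingleton.INSTANCE); // prints true
    }
}
```

Note that the deserialized copy is still created in both cases; with readResolve it is simply discarded in favour of the existing instance.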

That is all for now. In part 2 of this post, we will look into the remaining three items [Persist only meaningful data, Manage serialization between different versions of your class, Avoid exposing the serialization mechanism to client API] and some best practices that should be followed when we use the Serialization API in our applications.

Happy Coding!