Wednesday, September 7, 2011

Customizing Java Serialization [Part 1]

In our last post we looked at some of the basics of Java Serialization. We also looked at how object resolution works during serialization and at the effects of serializing the same object to the same or to different ObjectOutputStreams.

In this post we will look at how we can customize the process of serialization.

When we write a class that implements the Serializable interface, Java gives us a default serialization process. Sometimes, though, we may want to customize that process. There are several reasons why we might choose customized serialization; the following is a non-exhaustive list:

  • Protect sensitive data.
  • Protect your invariants.
  • Control your instances.
  • Persist only meaningful data.
  • Manage serialization between different versions of your class.
  • Avoid exposing the serialization mechanism to client API.

In this part of the post we will look into the first three items.

Disclaimer: Some of the reasons given for each of the situations above may seem debatable, so it is best to view the examples below as illustrations only and not as a reference for good programming practice. The examples depict the “how” of customizing serialization rather than the “when”.

Protect Sensitive Data: Suppose we have a class UserAccount that stores the username and password of an account. The class could be as follows:
public class UserAccount
{
         private String username;
         private String password;

         //other implementations
}
Now suppose we want to store all the details of a UserAccount instance except the password. Simply implementing Serializable would serialize the password as well, so we need to prevent the password field from being serialized. One way to do this is to add a method with exactly the following signature to the class and do the serialization ourselves:
private void writeObject(ObjectOutputStream o)
                       throws IOException
{
    //do serialization of required fields here
}

What is this method? Why is it private?

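This method is one of several hooks defined by the Java Object Serialization Specification that let us customize how an object is serialized. If a serializable class declares a private writeObject() method with this signature, ObjectOutputStream calls it (reflectively) to write that class's state instead of using the default mechanism. The method is private because it is not part of the class's API: clients should never call it, and subclasses cannot inherit or override it, since serialization invokes each class's own writeObject() separately for the fields that class declares. Hence, to prevent the serialization of the password field, our class will look like this: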
import java.io.*;

class UserAccount implements Serializable
{
	private String username;
	private String password;

	public UserAccount()
	{
		username = "defaultUsername";
		password = "defaultPassword";
	}

	public UserAccount(String u, String p)
	{
		username = u;
		password = p;
	}

	// called during de-serialization: restores only the username
	private void readObject(ObjectInputStream o) 
		throws IOException, ClassNotFoundException
	{
		username = (String)o.readObject();
	}

	// called during serialization: writes only the username, never the password
	private void writeObject(ObjectOutputStream o) 
		throws IOException
	{
		o.writeObject(username);
	}

	@Override
	public String toString()
	{
		return username + ", " + password;
	}
}
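In the class above we skip the password field and persist only the username field. When the object is de-serialized, the matching hook, readObject(), reads back only the username field. The password field of the de-serialized instance is therefore null: de-serialization does not run the UserAccount constructors, so the default values they assign never apply. Note that trying to read a second object for the password would have resulted in a java.io.OptionalDataException, because writeObject() put no more data into the stream.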
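To see the effect of the two hooks, here is a minimal sketch that round-trips a UserAccount through an in-memory stream. It assumes a trimmed copy of the class above, nested inside a hypothetical UserAccountDemo class with package-private fields so the demo can inspect them, and it uses ByteArrayOutputStream/ByteArrayInputStream instead of a file. The roundTrip helper is an illustrative name, not part of any API.

```java
import java.io.*;

public class UserAccountDemo
{
    // Trimmed copy of UserAccount: only the username goes into the stream.
    static class UserAccount implements Serializable
    {
        String username;
        String password;

        UserAccount(String u, String p)
        {
            username = u;
            password = p;
        }

        private void writeObject(ObjectOutputStream o) throws IOException
        {
            o.writeObject(username); // password is deliberately skipped
        }

        private void readObject(ObjectInputStream o)
            throws IOException, ClassNotFoundException
        {
            username = (String) o.readObject(); // password stays null
        }
    }

    // Serializes obj into a byte array and immediately de-serializes it.
    static Object roundTrip(Object obj)
    {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes))
        {
            out.writeObject(obj);
        }
        catch (IOException e)
        {
            throw new UncheckedIOException(e);
        }

        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bytes.toByteArray())))
        {
            return in.readObject();
        }
        catch (IOException e)
        {
            throw new UncheckedIOException(e);
        }
        catch (ClassNotFoundException e)
        {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args)
    {
        UserAccount copy = (UserAccount) roundTrip(new UserAccount("alice", "s3cret"));
        System.out.println(copy.username + ", " + copy.password); // prints "alice, null"
    }
}
```

The password never reaches the byte array, so there is nothing to recover on the reading side; the field simply keeps Java's default value.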

Protect your invariants: Suppose we have a class that represents a coordinate in the first quadrant. The class might look like this:
class Coordinate
{
	private int x;
	private int y;

	public Coordinate(int x, int y)
	{
		// assign first, then validate; validating before the assignment
		// would only check the default values (0, 0)
		this.x = x;
		this.y = y;

		validateInvariants();
	}

	public int getX()
	{
		return x;
	}

	public int getY()
	{
		return y;
	}

	private void validateInvariants()
	{
		if(x < 0 || y < 0)
		{
			throw new IllegalArgumentException();
		}
	}
}
Here the validateInvariants method throws an IllegalArgumentException if either coordinate is negative. If we want our Coordinate class to be serializable, we should also protect these invariants when an instance is de-serialized, because de-serialization does not go through the constructor. After implementing Serializable, this is how we might do that:
import java.io.*;

class Coordinate implements Serializable
{
	private int x;
	private int y;
	
	public Coordinate(int x, int y)
	{
		this.x = x;
		this.y = y;

		validateInvariants();
	}

	public int getX()
	{
		return x;
	}

	public int getY()
	{
		return y;
	}
	
	private void readObject(ObjectInputStream o) 
		throws IOException, ClassNotFoundException
	{
		x = o.readInt();
		y = o.readInt();
		
		validateInvariants();
	}
	
	private void writeObject(ObjectOutputStream o) 
		throws IOException
	{
		o.writeInt(x);
		o.writeInt(y);
	}
	
	private void validateInvariants()
	{
		if(x < 0 || y < 0)
		{
			throw new IllegalArgumentException();
		}
	}
}
Here, while de-serializing our instance, we check our invariants again. If someone had modified the bytes of the serialized form to introduce a negative coordinate, nothing would have caught it without the readObject() method above.
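The tampering scenario can be sketched by serializing a valid Coordinate and then overwriting y in the raw bytes. This is an illustrative sketch only: CoordinateTamperDemo, serialize, deserialize and forgeNegativeY are hypothetical names, and the byte offset rests on an assumption about the stream layout for this exact class, namely that the two ints written by writeObject() form a single block-data record that is followed by one end-of-block marker byte at the very end of the stream.

```java
import java.io.*;

public class CoordinateTamperDemo
{
    // Serializable Coordinate from above, with the constructor fix applied.
    static class Coordinate implements Serializable
    {
        private int x;
        private int y;

        Coordinate(int x, int y)
        {
            this.x = x;
            this.y = y;
            validateInvariants();
        }

        int getX() { return x; }
        int getY() { return y; }

        private void readObject(ObjectInputStream o)
            throws IOException, ClassNotFoundException
        {
            x = o.readInt();
            y = o.readInt();
            validateInvariants(); // rejects forged negative values
        }

        private void writeObject(ObjectOutputStream o) throws IOException
        {
            o.writeInt(x);
            o.writeInt(y);
        }

        private void validateInvariants()
        {
            if (x < 0 || y < 0)
            {
                throw new IllegalArgumentException(
                    "(" + x + ", " + y + ") is not in the first quadrant");
            }
        }
    }

    static byte[] serialize(Object obj)
    {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes))
        {
            out.writeObject(obj);
        }
        catch (IOException e)
        {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    static Object deserialize(byte[] data)
    {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data)))
        {
            return in.readObject();
        }
        catch (IOException e)
        {
            throw new UncheckedIOException(e);
        }
        catch (ClassNotFoundException e)
        {
            throw new IllegalStateException(e);
        }
    }

    // ASSUMPTION: y is the 4 big-endian bytes just before the final
    // end-of-block marker byte; setting them all to 0xFF makes y == -1.
    static byte[] forgeNegativeY(byte[] original)
    {
        byte[] forged = original.clone();
        int yStart = forged.length - 5;
        for (int i = 0; i < 4; i++)
        {
            forged[yStart + i] = (byte) 0xFF;
        }
        return forged;
    }

    public static void main(String[] args)
    {
        byte[] good = serialize(new Coordinate(3, 4));
        Coordinate copy = (Coordinate) deserialize(good);
        System.out.println(copy.getX() + ", " + copy.getY()); // prints "3, 4"

        try
        {
            deserialize(forgeNegativeY(good));
            System.out.println("forged coordinate accepted");
        }
        catch (IllegalArgumentException e)
        {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

The IllegalArgumentException thrown inside readObject() propagates out of ObjectInputStream.readObject(), so the forged instance never reaches the caller.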

Control your instances: Suppose we have a class that should only ever have a single instance. Such classes are called singletons. A singleton has no public constructors, and clients get a reference to the instance through a static method, conventionally named getInstance. Consider the following singleton class:

class Singleton
{
	private static final Singleton INSTANCE = new Singleton();

	private Singleton()
	{

	}

	public static Singleton getInstance()
	{
		return INSTANCE;
	}

	// other methods
}
Now suppose we want the Singleton class above to be serializable. We could simply implement the Serializable interface and be done with it, but then we could not protect the singleton property: every de-serialization would create a new instance of the class. This can be demonstrated as follows:
class Singleton implements Serializable
{
	private static final Singleton INSTANCE = new Singleton();

	private Singleton()
	{

	}

	public static Singleton getInstance()
	{
		return INSTANCE;
	}

	// other methods
}
Now I use the following snippet to prove that there will be multiple instances:
import java.io.*;

public final class ControlInstances
{
	public static void main(String[] args) throws Exception
	{
		ObjectOutputStream out = new ObjectOutputStream(
						new FileOutputStream(new File("out3.dat")));

		out.writeObject(Singleton.getInstance());
		out.close();

		ObjectInputStream in = new ObjectInputStream(
						new FileInputStream(new File("out3.dat")));

		Singleton u = (Singleton) in.readObject();
		in.close();

		System.out.println(Singleton.getInstance() == u);
	}
}
The code above prints false. This means that after serialization and a subsequent de-serialization there are two instances of our supposedly singleton class. The way to avoid this is another hook, the readResolve() method. After ObjectInputStream has read an object from the stream, and before it returns that object to the caller, it checks whether the object's class defines a readResolve method. If it does, ObjectInputStream calls it and returns the object that readResolve returns in place of the one it just de-serialized.

Hence our class definition will be:
class Singleton implements Serializable
{
	private static final Singleton INSTANCE = new Singleton();

	private Singleton()
	{

	}

	public static Singleton getInstance()
	{
		return INSTANCE;
	}

	private Object readResolve() throws ObjectStreamException
	{
		return getInstance();
	}

	// other methods
}
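Now the test code prints true, which means that the object returned by readObject() is the same instance returned by getInstance(). The copy created during de-serialization is discarded, so the application still sees only one instance of the class in our JVM.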
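For a quick side-by-side check, here is a self-contained sketch that round-trips a singleton without and with readResolve() through an in-memory stream rather than the out3.dat file. NaiveSingleton, ResolvingSingleton, SingletonDemo and roundTrip are hypothetical names chosen for the illustration.

```java
import java.io.*;

public class SingletonDemo
{
    // Serializable singleton with no readResolve(): default behaviour only.
    static class NaiveSingleton implements Serializable
    {
        private static final NaiveSingleton INSTANCE = new NaiveSingleton();

        private NaiveSingleton() { }

        static NaiveSingleton getInstance() { return INSTANCE; }
    }

    // Same singleton, plus readResolve() returning the canonical instance.
    static class ResolvingSingleton implements Serializable
    {
        private static final ResolvingSingleton INSTANCE = new ResolvingSingleton();

        private ResolvingSingleton() { }

        static ResolvingSingleton getInstance() { return INSTANCE; }

        // The object returned here replaces the freshly de-serialized copy.
        private Object readResolve() throws ObjectStreamException
        {
            return getInstance();
        }
    }

    // Serializes obj into a byte array and immediately de-serializes it.
    static Object roundTrip(Object obj)
    {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes))
        {
            out.writeObject(obj);
        }
        catch (IOException e)
        {
            throw new UncheckedIOException(e);
        }

        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bytes.toByteArray())))
        {
            return in.readObject();
        }
        catch (IOException e)
        {
            throw new UncheckedIOException(e);
        }
        catch (ClassNotFoundException e)
        {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args)
    {
        System.out.println(roundTrip(NaiveSingleton.getInstance()) == NaiveSingleton.getInstance());         // false
        System.out.println(roundTrip(ResolvingSingleton.getInstance()) == ResolvingSingleton.getInstance()); // true
    }
}
```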

That is all for now. In part 2 of this post, we will look into the remaining three items [Persist only meaningful data, Manage serialization between different versions of your class, Avoid exposing the serialization mechanism to client API] and some best practices that should be followed when we use the Serialization API in our applications.

Happy Coding! 

4 comments:

  1. Thanks for your comment on my post "10 interview questions on Serialization in Java", Swaranga. I see you have also covered the topic in great detail. To avoid the password being serialized, isn't it simpler to make it transient?

  2. Well, yes. As I said, the examples above should not be taken as a guide to when we should use these features, but rather as an illustration of how to use them if the need arises. In addition, with the methods above we could encrypt the password during serialization and decrypt it during de-serialization.
