Tuesday, August 23, 2011

Serialization in Java

Java provides its developers with a rich set of APIs for object serialization. In this article we will look into some of the intricacies of object serialization in Java.

We shall start with the most basic example of how to serialize an object in Java; here are the high level steps:

  • Step 1: Get the object to serialize [can be null]. Let the object be o.
  • Step 2: Get a valid ObjectOutputStream instance. Let the ObjectOutputStream instance be out.
  • Step 3: out.writeObject(o); will serialize the object to the stream.

Let us see the above steps in terms of a code snippet:
String s = new String(); //the object that we will serialize
ObjectOutputStream out = new ObjectOutputStream(
        new FileOutputStream("out.dat"));
out.writeObject(s); //serialize the object
out.close(); //flush the data and release the file
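
To try the steps above without touching the file system, here is a minimal, self-contained sketch that serializes an object to an in-memory byte array and reads it back. The class name RoundTripDemo and the helper methods serialize and deserialize are made up for this illustration and are not part of the Java API; a ByteArrayOutputStream stands in for the FileOutputStream used above.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;

// Hypothetical demo class; the names are illustrative only.
class RoundTripDemo {

    // Serializes any object (null included) into an in-memory byte array.
    static byte[] serialize(Object o) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(o);
        } // closing the stream flushes any buffered data
        return bytes.toByteArray();
    }

    // Reads back the single object stored in the byte array.
    static Object deserialize(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(deserialize(serialize("hello"))); // hello
        System.out.println(deserialize(serialize(null)));    // null
    }
}
```

The try-with-resources blocks close the streams even if writeObject or readObject throws, which the short snippet above leaves out for brevity.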
Now let us look at some of the basic guidelines that should be followed for Object serialization:

1. Any class that wants its objects to be serialized must implement the Serializable interface. The Serializable interface does not have any methods. It merely serves to flag an object as serializable to an ObjectOutputStream.

2. If there are any members in a Serializable class, then the following guidelines apply:

     i. If they are primitives, they are automatically serializable.

    ii. If they are non-primitive objects, they must implement Serializable. If we try to serialize an object that contains a reference to an object that does not implement Serializable, then writeObject throws a java.io.NotSerializableException at run time.

    iii. If we have a reference to a non-serializable object in our class, then we have to mark the reference with the keyword transient. The transient keyword on a reference means that when the parent object is serialized, the object whose reference is marked as transient will not be serialized; after deserialization that reference is null.

Let us take a look at a simple code snippet that illustrates the above guidelines:
// a non-serializable class
public class Model {
    private Integer modelID;
    private String modelName;
    //rest of the implementations
}

// a serializable class
public class Engine implements Serializable {
    private int engineID;
    private String engineName;
    // other implementations
}

public class Car implements Serializable {
    //a primitive, hence serializable
    private int carID;

    //a non-serializable object, hence transient
    private transient Model carModel;

    //a serializable object, hence no transient
    private Engine carEngine;
    // other implementations
}
Now when we try to serialize an instance of a Car object, there will be no exceptions thrown because all the members of the car object are either primitives, or implement Serializable or are marked with the keyword transient.

Note that in our Car declaration, if we had not marked our Model object as transient, we would have got a java.io.NotSerializableException [a checked IOException, not a RuntimeException] while trying to serialize the Car object.
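
Both outcomes can be checked with a small, self-contained sketch. The classes below [TransientDemo, SafeCar, BrokenCar] are hypothetical stand-ins for the Car and Model classes above, written as nested classes and serialized in memory so that the example fits in one file.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical demo class; all names here are illustrative only.
class TransientDemo {

    // Does not implement Serializable.
    static class Model {
        String modelName = "Sedan";
    }

    static class SafeCar implements Serializable {
        int carID = 1;
        transient Model carModel = new Model(); // skipped during serialization
    }

    static class BrokenCar implements Serializable {
        int carID = 2;
        Model carModel = new Model(); // not transient, so serialization fails
    }

    static byte[] serialize(Object o) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(o);
        }
        return bytes.toByteArray();
    }

    static Object deserialize(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        SafeCar copy = (SafeCar) deserialize(serialize(new SafeCar()));
        System.out.println(copy.carID);    // 1
        System.out.println(copy.carModel); // null: the transient field was not written

        try {
            serialize(new BrokenCar());
        } catch (NotSerializableException e) {
            // The message names the class that could not be serialized.
            System.out.println("Not serializable: " + e.getMessage());
        }
    }
}
```

Note that the field initializer of carModel does not run during deserialization, which is why the restored SafeCar holds null rather than a fresh Model.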

The above example was a very basic one to show how to serialize an object in Java. Now let us look under the hood at how Java resolves objects during serialization. To state the problem, consider the following class declarations:

public class GearType implements Serializable {
    private int ID;
    private String gearName;
    //other implementations
}

public class Car implements Serializable {
    private int ID;
    private GearType gearType;

    public Car(int i, GearType g) {
        this.ID = i;
        this.gearType = g;
    }

    //accessor used by the identity checks later in the article
    public GearType getGearType() {
        return gearType;
    }
}

Now let us serialize two Car objects:

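GearType g = new GearType();
Car c1 = new Car(1, g);
Car c2 = new Car(2, g);
ObjectOutputStream out = new ObjectOutputStream(
        new FileOutputStream("out.dat"));
out.writeObject(c1); //both cars go to the same stream
out.writeObject(c2);
out.close();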

In the above code snippet, we serialized two Car objects [note the usage of the same GearType object to construct two different Car objects]. The interesting question is: how many GearType objects were serialized?

Both Car objects share a single GearType object, so only one GearType object should be serialized, and the answer is indeed one. We can prove this by reading the two Car objects back and checking whether they refer to the same GearType object. Let us look into that:
ObjectInputStream in = new ObjectInputStream(
        new FileInputStream("out.dat"));
Car first = (Car) in.readObject();
Car second = (Car) in.readObject();
System.out.println(first.getGearType() == second.getGearType());

The above code does print true. Note that here we are testing for object identity of the GearType object instead of logical equality [which is done via equals() method]. The identity test is required because we want to verify whether both the GearType objects are actually the same.

While serializing, Java walks the object graph to determine which objects need to be written. It starts from the main object and recursively traverses every object reachable from it. Each object that it writes is assigned an identifier [a handle] that marks the object as already serialized to the given ObjectOutputStream instance. So when Java encounters an object that has already been written to that ObjectOutputStream, it does not serialize the object again; it writes a handle that refers back to the earlier copy instead. This is how Java avoids having to re-serialize an already serialized object. The seemingly complex problem was solved by the simple method of assigning IDs to objects. Beautiful!
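
One way to observe the handle mechanism is to write the same object twice to a single stream and compare how many bytes each write adds. The sketch below does this in memory; HandleDemo, its nested GearType and the writeTwice method are hypothetical names made up for this illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical demo class; the names are illustrative only.
class HandleDemo {

    static class GearType implements Serializable {
        int id = 7;
        String gearName = "manual";
    }

    // Writes the same object twice to one stream and returns
    // { bytes added by the first write, bytes added by the second write }.
    static int[] writeTwice(Object o) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.flush();
            int header = bytes.size(); // stream header written by the constructor

            out.writeObject(o);        // full class description and field data
            out.flush();
            int afterFirst = bytes.size();

            out.writeObject(o);        // same object again: only a back-reference
            out.flush();
            int afterSecond = bytes.size();

            return new int[] { afterFirst - header, afterSecond - afterFirst };
        }
    }

    public static void main(String[] args) throws Exception {
        int[] sizes = writeTwice(new GearType());
        System.out.println("first write:  " + sizes[0] + " bytes");
        System.out.println("second write: " + sizes[1] + " bytes");
    }
}
```

The second write should add only a few bytes, because the stream records a reference to the object it has already written instead of the object itself.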

One important thing to note is that if we had used different ObjectOutputStream instances to serialize the two Car objects, Java would have serialized the same GearType object twice, once in each stream. This is because the IDs belong to the stream: each ObjectOutputStream keeps track of only the objects written to it. When Java encounters the GearType object while writing to the second stream, it sees that the object has not yet been serialized to that stream, and hence serializes it again.

Let us illustrate this via a code snippet:
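GearType g = new GearType();
Car c1 = new Car(1, g);
Car c2 = new Car(2, g);
ObjectOutputStream out1 = new ObjectOutputStream(new FileOutputStream("out1.dat"));
ObjectOutputStream out2 = new ObjectOutputStream(new FileOutputStream("out2.dat"));
out1.writeObject(c1); //each car goes to its own stream
out2.writeObject(c2);
out1.close();
out2.close();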

Now to prove that the GearType object is indeed serialized twice, we read the streams back:

ObjectInputStream in1 = new ObjectInputStream(
        new FileInputStream("out1.dat"));
ObjectInputStream in2 = new ObjectInputStream(
        new FileInputStream("out2.dat"));
Car first = (Car) in1.readObject();
Car second = (Car) in2.readObject();
System.out.println(first.getGearType() == second.getGearType());

Now the above code prints false, which proves that the GearType object was serialized twice.
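
For readers who want to run both experiments side by side, here is a self-contained sketch that repeats them with in-memory streams instead of files. SharedReferenceDemo and its helper methods are hypothetical names for this illustration, and the nested GearType and Car are simplified versions of the classes above.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical demo class; the names are illustrative only.
class SharedReferenceDemo {

    static class GearType implements Serializable {
        String gearName = "manual";
    }

    static class Car implements Serializable {
        final int id;
        final GearType gearType;

        Car(int id, GearType gearType) {
            this.id = id;
            this.gearType = gearType;
        }
    }

    // Writes all the given objects to ONE ObjectOutputStream and returns its bytes.
    static byte[] write(Object... objects) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            for (Object o : objects) {
                out.writeObject(o);
            }
        }
        return bytes.toByteArray();
    }

    static ObjectInputStream open(byte[] data) throws IOException {
        return new ObjectInputStream(new ByteArrayInputStream(data));
    }

    // Both cars are written to the same stream.
    static boolean sharedViaOneStream() throws IOException, ClassNotFoundException {
        GearType g = new GearType();
        try (ObjectInputStream in = open(write(new Car(1, g), new Car(2, g)))) {
            Car first = (Car) in.readObject();
            Car second = (Car) in.readObject();
            return first.gearType == second.gearType;
        }
    }

    // Each car is written to its own stream.
    static boolean sharedViaTwoStreams() throws IOException, ClassNotFoundException {
        GearType g = new GearType();
        try (ObjectInputStream in1 = open(write(new Car(1, g)));
             ObjectInputStream in2 = open(write(new Car(2, g)))) {
            Car first = (Car) in1.readObject();
            Car second = (Car) in2.readObject();
            return first.gearType == second.gearType;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(sharedViaOneStream());  // true
        System.out.println(sharedViaTwoStreams()); // false
    }
}
```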

That is all for now. We have barely scratched the surface of serialization in Java. In the next article [hopefully soon] we will take a look at other aspects of serialization, such as customizing serialization, protecting classes from serialization [specifically protecting your singletons] and class versioning problems.

Happy coding!