Serialization
1. Basic concepts
1.1 Definition
Serialization is the process of encoding an object, including
the objects it refers to, as a stream of byte data such that an equal
object can be reconstructed by reading from the stream (which is
referred to as "deserialization"). Serialization allows saving
objects in files and transmitting objects over a network. In
particular, technologies that support invoking the methods of an object
on another host such as
Java RMI and CORBA use a form of serialization to implement parameter
passing across the network. Serialization is also used in
technologies such as Enterprise JavaBeans that automatically passivate
and activate server objects.
Serialization does not write class variables because they are not part of the state of the object. It also does not transmit the object's class object (e.g., its method dictionary) because the program deserializing the stream must load that class itself. We will see that Java serialization can serialize an instance of any class that declares itself serializable, without the programmer writing methods that do so (we will see what is required as we proceed).
1.2 The interface Serializable
The interface java.io.Serializable declares no methods (such interfaces are called “marker” or “tag” interfaces). Implementing Serializable, or extending a class that implements Serializable, identifies the class as one that participates in serialization. Its instances can be used as the argument of ObjectOutputStream.writeObject and as the result of ObjectInputStream.readObject. If writeObject encounters an object that is not serializable (e.g., a collection element), it throws NotSerializableException.
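The following sketch shows that the empty marker interface is all that separates a serializable class from a non-serializable one, and that the failure surfaces only when writeObject reaches the offending collection element. The class names MarkerDemo, Tagged, Untagged, and tryWrite are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

public class MarkerDemo {
    // Hypothetical classes: identical except for the marker interface.
    static class Tagged implements Serializable { int value = 1; }
    static class Untagged { int value = 2; }

    // Returns null if obj was written successfully; otherwise returns the
    // message of the NotSerializableException (the offending class's name).
    static String tryWrite(Object obj) {
        try (ObjectOutputStream out =
                 new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(obj);
            return null;
        } catch (NotSerializableException ex) {
            return ex.getMessage();
        } catch (IOException ex) {
            throw new RuntimeException(ex);
        }
    }

    public static void main(String[] args) {
        List<Object> list = new ArrayList<>();
        list.add(new Tagged());
        System.out.println(tryWrite(list));   // null: every element is serializable
        list.add(new Untagged());             // a non-serializable element
        System.out.println(tryWrite(list));   // MarkerDemo$Untagged
    }
}
```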
Most library classes are serializable, including String, the collection classes, the wrapper classes, GUI component classes, Date, Color, Point, and URL.
Library classes that are not serializable include Thread, the reflection classes (Method, etc.), stream classes, Socket, Graphics, and Image. Generally, these are the classes whose implementations or "peers" are system-dependent.
The object stream classes apply the "default serialization" mechanism described in the next section to implementors of Serializable. (We will see how to customize serialization below.) It stores the values of all non-static instance variables that hold primitive values or refer to serializable objects, including all such variables inherited from serializable ancestors. The default implementation handles shared and circular object references and class identity. However, if an object has a variable of class type that refers to an object whose class is not serializable, writeObject signals NotSerializableException when attempting to write the object. Similarly, if a collection is serializable but contains objects that are not serializable, an exception will be signaled. Note that this error is detected at run time rather than by the compiler. For example, a field (or a collection element) may be declared with type Object, which is not serializable; if it refers to an instance of a serializable class, no exception occurs upon serialization. We will see below that variables marked as transient are not serialized. If the default mechanism is adequate (i.e., all fields are serializable and no special processing is needed), a class need only declare that it implements Serializable to be serializable.
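Here is a small sketch of the default mechanism. Circle inherits a field from its serializable superclass Shape, has a field declared as Object, and has a transient field; Shape, Circle, and roundTrip are hypothetical names. The round trip restores the inherited and Object-typed fields, leaves the transient field at its default, and fails only once the Object-typed field refers to a non-serializable object.

```java
import java.io.*;

public class DefaultDemo {
    // Hypothetical classes. Shape is serializable, so its subclass Circle is
    // serializable too, and Shape's fields are written along with Circle's.
    static class Shape implements Serializable {
        String label;
    }

    static class Circle extends Shape {
        double radius;
        Object tag;               // declared type Object; the run-time class decides
        transient double area;    // not written; 0.0 after deserialization
    }

    // Writes obj to a byte array and reads it back.
    static Object roundTrip(Object obj) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        try (ObjectInputStream in = new ObjectInputStream(
                 new ByteArrayInputStream(bytes.toByteArray()))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Circle c = new Circle();
        c.label = "unit";         // inherited field
        c.radius = 1.0;
        c.tag = "a String";       // String is serializable, so this works
        c.area = Math.PI;
        Circle copy = (Circle) roundTrip(c);
        System.out.println(copy.label + " " + copy.radius + " " + copy.tag + " " + copy.area);
        // prints: unit 1.0 a String 0.0

        c.tag = new Object();     // java.lang.Object is not serializable
        try {
            roundTrip(c);
        } catch (NotSerializableException ex) {
            System.out.println("not serializable: " + ex.getMessage()); // java.lang.Object
        }
    }
}
```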
1.3 Implementing serialization
Serializing an object must deal with three issues: 1) representing built-in types, 2) encoding references to other objects, and 3) maintaining type identity. Even for an object containing only fields of built-in types, this process is complex in C++ because it must deal with non-standardized sizes for built-in types, big/little-endian issues, data alignment, etc. (For example, Sun's eXternal Data Representation (XDR) for RPC and CORBA's Common Data Representation (CDR) handle these issues.) In Java, both the serializer and the deserializer are Java Virtual Machines, the language fixes the size of each primitive type, and the stream classes use a single byte order, so these complications do not arise. In an object-oriented language, it is also necessary to serialize inherited fields.
Clearly, storing pointer values to implement references would be meaningless. To understand serialization with references, view an object as the root of a directed graph of references to other objects. In particular, when some object is referred to along multiple paths in that graph, deserialization must not result in multiple copies of that object. On output, the first reference to an object writes the object's fields to the stream and assigns the object an identifier that is used for subsequent references to it. (Java's object streams number objects sequentially as they are written; these serial numbers are called "handles".) This procedure avoids duplicating a “subobject” that has multiple references to it and maintains the objects’ identities upon deserialization. The inverse process occurs when deserializing an object from a stream: the first occurrence of an object defines its fields and causes a copy to be created, and each subsequent reference gives the object's serial number, which is used to locate that copy. The same mechanism detects cycles in the object graph: an object that has already been written is encoded as a reference rather than written again, so serialization does not recurse infinitely. Note also that Java supports non-static inner classes, whose instances have an implicit reference to their "enclosing object", which must also be serialized.
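The effect of this bookkeeping can be checked directly. In the sketch below (GraphDemo, Node, and roundTrip are hypothetical), a list holds the same node twice and the nodes form a cycle; after a round trip through an in-memory stream, the two list entries are one object and the cycle is intact.

```java
import java.io.*;
import java.util.ArrayList;
import java.util.List;

public class GraphDemo {
    // Hypothetical node type that can form cycles.
    static class Node implements Serializable {
        String name;
        Node next;
        Node(String name) { this.name = name; }
    }

    // Serializes obj into memory and reads it back.
    @SuppressWarnings("unchecked")
    static <T> T roundTrip(T obj) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(obj);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                     new ByteArrayInputStream(bytes.toByteArray()))) {
                return (T) in.readObject();
            }
        } catch (IOException | ClassNotFoundException ex) {
            throw new RuntimeException(ex);
        }
    }

    public static void main(String[] args) {
        Node a = new Node("a");
        Node b = new Node("b");
        a.next = b;
        b.next = a;                  // a cycle: a -> b -> a

        List<Node> list = new ArrayList<>();
        list.add(a);
        list.add(a);                 // the same object reached along two paths

        List<Node> copy = roundTrip(list);
        System.out.println(copy.get(0) == copy.get(1));           // true: one copy, not two
        System.out.println(copy.get(0).next.next == copy.get(0)); // true: cycle rebuilt
        System.out.println(copy.get(0) == a);                     // false: a new object
    }
}
```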
An object is more than just the field values it contains: it has a type
identity. The serialization process uses an instance of
ObjectStreamClass
(discussed further below) to identify an
object’s class, rather than just its class name.
Java uses the facilities of the "reflection" API to perform serialization. An object's class object, an instance of the class Class, is available via getClass. This class object includes methods for accessing the class's ancestors, its members, and their types. In particular, the method Class.getDeclaredFields returns an array of Field instances for all the fields a class declares (Class.getFields returns only the public ones). Field defines the accessors getName, getType, and getModifiers, as well as methods such as get that obtain the value of that field in a particular object.
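As a rough illustration of how reflection exposes the information serialization needs, this sketch walks up a class's serializable ancestors and lists the declared fields that are neither static nor transient. It is a simplified model of field selection, not the JDK's actual algorithm; FieldScan, Account, and candidateFields are hypothetical names.

```java
import java.io.Serializable;
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;

public class FieldScan {
    // Hypothetical example class.
    static class Account implements Serializable {
        static int count;            // class variable: skipped
        String owner;
        transient String cache;      // transient: skipped
        long balance;
    }

    // Returns "Class.field : type" for each candidate field, walking up the
    // hierarchy while the classes are serializable. Order is not guaranteed.
    static List<String> candidateFields(Class<?> cls) {
        List<String> result = new ArrayList<>();
        for (Class<?> c = cls; c != null && Serializable.class.isAssignableFrom(c);
             c = c.getSuperclass()) {
            for (Field f : c.getDeclaredFields()) {
                int mods = f.getModifiers();
                if (Modifier.isStatic(mods) || Modifier.isTransient(mods)) {
                    continue;
                }
                result.add(c.getSimpleName() + "." + f.getName()
                           + " : " + f.getType().getSimpleName());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        candidateFields(Account.class).forEach(System.out::println);
    }
}
```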
2. Object input/output
2.1 Object stream classes
The classes ObjectInputStream and ObjectOutputStream support reading and writing serializable objects and primitive types, and are defined in the package java.io. Like filter streams, their constructors take the source or destination for the bytes that encode the object. Although ObjectInputStream and ObjectOutputStream are used like filter streams, they are not subclasses of FilterInputStream and FilterOutputStream, respectively.
ObjectInputStream is a subclass of InputStream that implements the interface ObjectInput, a subinterface of DataInput (which defines methods such as readBoolean and readDouble) that adds readObject. That is, we can also use readInt and so on with object input streams. Like any stream method, readObject can signal IOException if the stream fails. It can also signal several subclasses of ObjectStreamException (a subclass of IOException) such as InvalidClassException, StreamCorruptedException, and OptionalDataException (signaled when an attempt is made to read an object but the next item in the stream is primitive data). Deserializing an object may require loading its class, so readObject can also signal ClassNotFoundException. Similarly, ObjectOutputStream is a subclass of OutputStream that implements the interface ObjectOutput, a subinterface of DataOutput that adds writeObject. writeObject can signal IOException or several of its descendants such as InvalidClassException and NotSerializableException. Instances of any serializable class can be used as the argument of ObjectOutputStream.writeObject and as the result of ObjectInputStream.readObject.
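Because the object streams also implement DataInput and DataOutput, primitive values and objects can be interleaved, provided they are read back in the order they were written. This sketch (MixedDemo and its methods are hypothetical) writes an int, a string via writeUTF, and a Date, reads them back in order, and shows that calling readObject while primitive data comes next signals OptionalDataException.

```java
import java.io.*;
import java.util.Date;

public class MixedDemo {
    // Writes an int, a string, and an object, and returns the encoded bytes.
    static byte[] write() throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeInt(42);                 // DataOutput method
            out.writeUTF("header");           // DataOutput method
            out.writeObject(new Date(0L));    // ObjectOutput method
        }
        return bytes.toByteArray();
    }

    // Reads the items back in the same order they were written.
    static String readInOrder(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            int n = in.readInt();
            String h = in.readUTF();
            Date d = (Date) in.readObject();
            return n + " " + h + " " + d.getTime();
        }
    }

    // Asking for an object while primitive data is next in the stream fails.
    static boolean readObjectFirstFails(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            in.readObject();
            return false;
        } catch (OptionalDataException ex) {
            return true;
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] data = write();
        System.out.println(readInOrder(data));          // 42 header 0
        System.out.println(readObjectFirstFails(data)); // true
    }
}
```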
ObjectOutputStream also defines the method reset, which clears the stream's object cache, i.e., the serial numbers assigned to objects already written. The reset is recorded in the stream, so an ObjectInputStream reading it clears its own table at the same point. In particular, if an object output stream is reset and the client then writes an object (possibly indirectly) that has already been written, another copy is written, and that copy is used for subsequent references to the object.
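A consequence of the object cache is that modifying an object and writing it again has no visible effect unless the stream is reset. The sketch below (ResetDemo, Counter, and writeTwice are hypothetical) writes the same mutable object twice, optionally calling reset in between, and reads both back.

```java
import java.io.*;
import java.util.ArrayList;
import java.util.List;

public class ResetDemo {
    // Hypothetical mutable class.
    static class Counter implements Serializable {
        int value;
    }

    // Writes one Counter twice, changing it in between, optionally calling
    // reset() between the writes; returns the two objects read back.
    static List<Counter> writeTwice(boolean reset) throws IOException, ClassNotFoundException {
        Counter c = new Counter();
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            c.value = 1;
            out.writeObject(c);
            c.value = 2;
            if (reset) {
                out.reset();        // forget the serial number assigned to c
            }
            out.writeObject(c);     // without reset, only a back-reference is written
        }
        List<Counter> result = new ArrayList<>();
        try (ObjectInputStream in = new ObjectInputStream(
                 new ByteArrayInputStream(bytes.toByteArray()))) {
            result.add((Counter) in.readObject());
            result.add((Counter) in.readObject());
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        List<Counter> plain = writeTwice(false);
        System.out.println(plain.get(0) == plain.get(1));         // true
        System.out.println(plain.get(1).value);                   // 1: the change was not written
        List<Counter> withReset = writeTwice(true);
        System.out.println(withReset.get(0) == withReset.get(1)); // false
        System.out.println(withReset.get(1).value);               // 2
    }
}
```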
2.2 Object output and input
Suppose that the variable appts
refers to a hash map in
which the keys are dates and the values are strings. We can write
the map to a file as follows:
    // writing an object to a file (needs: import java.io.*;)
    try (ObjectOutputStream outStr =
             new ObjectOutputStream(new FileOutputStream("appointments.ser"))) {
        outStr.writeObject(appts);
        // close(), called automatically here, also flushes the stream
    }
    catch (IOException ex) {
        System.out.println(ex.getMessage());
    }
By convention, the filename extension ser is used for serialized object files. This simple technique works because the classes HashMap, Date, and String are serializable. Like a filter output stream, the ObjectOutputStream constructor takes the destination for the bytes. We can “wrap” an ObjectOutputStream around a stream attached to a socket or any other destination.
To read the hash map from the file back into memory is just as simple:
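    // reading an object from a file (needs: import java.io.*; import java.util.*;)
    Map<Date, String> appts = null;
    try (ObjectInputStream inStr =
             new ObjectInputStream(new FileInputStream("appointments.ser"))) {
        // unchecked cast: the stream cannot verify the key and value types
        appts = (HashMap<Date, String>) inStr.readObject();
    }
    catch (IOException | ClassNotFoundException ex) {
        System.out.println(ex.getMessage());
    }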
The cast to HashMap is necessary because the return type of readObject must be Object to accommodate all classes. Note that if the same string object had been associated with more than one key in the original hash map written to the file, that relationship would be preserved in the map read from the file. Like a filter input stream, the ObjectInputStream constructor takes the source of the bytes. We can “wrap” an ObjectInputStream around a stream attached to a socket or any other source. Note also that the constructors of serializable classes are not run during deserialization: if a class's constructor performs validations or calculations that must be done whenever an instance is created, the class can define its own private readObject method that does them, as described in section 3.2.
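Since the object streams can wrap any byte stream, an in-memory ByteArrayOutputStream works as well as a file or a socket; a common use is making a deep copy of an object graph. The sketch below assumes everything reachable from the argument is serializable. DeepCopy, Tracked, and deepCopy are hypothetical names, and Tracked counts its constructor calls to show that no constructor of the serializable class runs when the copy is created.

```java
import java.io.*;
import java.util.Date;
import java.util.HashMap;

public class DeepCopy {
    // Hypothetical class that counts how often its constructor runs.
    static class Tracked implements Serializable {
        static int constructed = 0;   // static, so it is not serialized
        Tracked() { constructed++; }
    }

    // Serializes obj into memory and deserializes it again, giving a deep copy.
    @SuppressWarnings("unchecked")
    static <T extends Serializable> T deepCopy(T obj) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(obj);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                     new ByteArrayInputStream(bytes.toByteArray()))) {
                return (T) in.readObject();
            }
        } catch (IOException | ClassNotFoundException ex) {
            throw new IllegalArgumentException("cannot copy " + obj, ex);
        }
    }

    public static void main(String[] args) {
        HashMap<Date, String> appts = new HashMap<>();
        appts.put(new Date(0L), "dentist");
        HashMap<Date, String> copy = deepCopy(appts);
        System.out.println(copy.equals(appts) && copy != appts); // true

        Tracked t = new Tracked();               // constructed becomes 1
        deepCopy(t);
        System.out.println(Tracked.constructed); // still 1: Tracked() did not run for the copy
    }
}
```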
3. Writing Serializable classes
3.1 Using default serialization
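To use default serialization, a class implements Serializable or extends a serializable class. If a class's superclass is not serializable, the class can still implement Serializable, provided the superclass has a no-argument constructor accessible to the subclass. That constructor is run during deserialization to initialize the fields inherited from the non-serializable superclass, whose values are not saved in the stream. We will see that a class must be serializable for it to be used as the parameter or return type of a remote method.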
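The following sketch shows what happens to the state of a non-serializable superclass. Base, Derived, and roundTrip are hypothetical; Derived's own field survives the round trip, while the field inherited from Base is set by Base's no-argument constructor rather than restored.

```java
import java.io.*;

public class SuperDemo {
    // Hypothetical non-serializable superclass with a no-argument constructor.
    static class Base {
        int baseValue;
        Base() { baseValue = -1; }      // runs again during deserialization
        Base(int v) { baseValue = v; }
    }

    // Serializable subclass: only the fields it declares are written.
    static class Derived extends Base implements Serializable {
        int ownValue;
        Derived(int base, int own) { super(base); ownValue = own; }
    }

    static Derived roundTrip(Derived d) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(d);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                     new ByteArrayInputStream(bytes.toByteArray()))) {
                return (Derived) in.readObject();
            }
        } catch (IOException | ClassNotFoundException ex) {
            throw new RuntimeException(ex);
        }
    }

    public static void main(String[] args) {
        Derived copy = roundTrip(new Derived(10, 20));
        System.out.println(copy.ownValue + " " + copy.baseValue); // 20 -1
    }
}
```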
If an instance variable should not be serialized, mark it as transient. For example, we would declare an instance variable transient if its type is not serializable, or its value depends on run-time conditions or can be computed from other information in the object. After deserialization, a transient variable holds the default value for its type (e.g., null or 0). Later revisions to the class library provide more control over which fields are serialized via the "Serializable Fields API" (the serialPersistentFields field and the classes ObjectOutputStream.PutField and ObjectInputStream.GetField). We will not discuss this facility here.
3.2 Customizing serialization
If a class has non-serializable superclasses or instance variables, or
requires more efficient serialization methods or other special
processing, the class can implement its own serialization by defining
the following methods:
private void readObject(ObjectInputStream) throws IOException, ClassNotFoundException
private void writeObject(ObjectOutputStream) throws IOException
(In fact, HashMap defines these methods so that empty buckets are not serialized.) The methods readObject and writeObject are invoked by ObjectInputStream and ObjectOutputStream methods, respectively. Their implementations call defaultReadObject and defaultWriteObject, which are invoked on the stream argument and take no arguments, to apply default serialization to the class's non-transient fields. The methods can transfer additional data using DataInput and DataOutput methods (as well as read, write, readObject, and writeObject). A class's readObject and writeObject methods must read and write the additional data in the same order. Since readObject and writeObject are private, a class cannot refine its superclass's methods. Instead, the serialization mechanism invokes the readObject or writeObject method (or the default mechanism) of each serializable class in the object's hierarchy in turn, starting with the topmost serializable ancestor, and defaultReadObject and defaultWriteObject handle only the fields declared by the class whose method is running. (The fact that these methods are private also prevents them from being declared in the interface Serializable.) With serialization, you should use readObject and writeObject rather than DataInput.readUTF and DataOutput.writeUTF for strings: writeObject handles null and shares repeated references to the same string, whereas writeUTF cannot write null and is limited to strings whose encoding fits in 65,535 bytes.
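The order in which the private hooks run can be observed with a logging sketch. Parent, Child, log, and roundTrip are hypothetical; each class's writeObject and readObject is called separately, superclass first, and each default call handles only the field declared by its own class.

```java
import java.io.*;
import java.util.ArrayList;
import java.util.List;

public class HierarchyDemo {
    static final List<String> log = new ArrayList<>();

    // Hypothetical two-level hierarchy, each level with its own private hooks.
    static class Parent implements Serializable {
        int a = 1;
        private void writeObject(ObjectOutputStream out) throws IOException {
            log.add("Parent.writeObject");
            out.defaultWriteObject();          // writes only Parent's field a
        }
        private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
            log.add("Parent.readObject");
            in.defaultReadObject();            // reads only Parent's field a
        }
    }

    static class Child extends Parent {
        int b = 2;
        private void writeObject(ObjectOutputStream out) throws IOException {
            log.add("Child.writeObject");
            out.defaultWriteObject();          // writes only Child's field b
        }
        private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
            log.add("Child.readObject");
            in.defaultReadObject();            // reads only Child's field b
        }
    }

    static Child roundTrip(Child c) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(c);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                     new ByteArrayInputStream(bytes.toByteArray()))) {
                return (Child) in.readObject();
            }
        } catch (IOException | ClassNotFoundException ex) {
            throw new RuntimeException(ex);
        }
    }

    public static void main(String[] args) {
        roundTrip(new Child());
        System.out.println(log);
        // [Parent.writeObject, Child.writeObject, Parent.readObject, Child.readObject]
    }
}
```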
Suppose we have a class User
that maintains the user's
password in an instance variable (as well as other information about a
user), and we do not want to store the password or send it over a
network without encoding it. The following example demonstrates how to
customize the serialization mechanism to achieve this:
    public class User implements Serializable {
        protected String name;
        protected transient String password;
        // ... other serializable instance variables ...

        private void readObject(ObjectInputStream inStr) throws IOException, ClassNotFoundException {
            inStr.defaultReadObject();
            password = decode((String) inStr.readObject());
        }

        private void writeObject(ObjectOutputStream outStr) throws IOException {
            outStr.defaultWriteObject();
            outStr.writeObject(encode(password));
        }

        // ... other methods (including encode and decode) ...
    }
The variable password is marked transient so that the default mechanism does not serialize its value. The class defines readObject to call defaultReadObject, which deserializes the values of all other instance variables, and then to use its private decode method when deserializing the value of the password variable. The writeObject method performs the corresponding operations in the same order. Note that readObject and writeObject do not handle the exceptions that can occur, but propagate them to the caller.
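To make the User example runnable, the sketch below fills in encode and decode with Base64 as a hypothetical stand-in. Base64 only obscures the text and is not encryption; a real application would use a proper cipher. It also tolerates a null password. UserDemo, toBytes, and fromBytes are hypothetical helpers; the check that matters is that the plaintext password never appears in the serialized bytes.

```java
import java.io.*;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class UserDemo {
    public static class User implements Serializable {
        protected String name;
        protected transient String password;   // written only in encoded form

        public User(String name, String password) {
            this.name = name;
            this.password = password;
        }

        // Hypothetical stand-ins: Base64 is an encoding, NOT encryption.
        private static String encode(String s) {
            return Base64.getEncoder().encodeToString(s.getBytes(StandardCharsets.UTF_8));
        }
        private static String decode(String s) {
            return new String(Base64.getDecoder().decode(s), StandardCharsets.UTF_8);
        }

        private void writeObject(ObjectOutputStream outStr) throws IOException {
            outStr.defaultWriteObject();                                    // name
            outStr.writeObject(password == null ? null : encode(password)); // then password
        }

        private void readObject(ObjectInputStream inStr) throws IOException, ClassNotFoundException {
            inStr.defaultReadObject();                                      // name
            String encoded = (String) inStr.readObject();                   // same order
            password = (encoded == null) ? null : decode(encoded);
        }
    }

    static byte[] toBytes(Object o) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(o);
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return bytes.toByteArray();
    }

    static Object fromBytes(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        } catch (IOException | ClassNotFoundException ex) {
            throw new RuntimeException(ex);
        }
    }
}
```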
As another example, suppose a class has an instance variable of type Image, which is not serializable. The variable is marked transient, and the class's writeObject method can use an instance of PixelGrabber to convert the image to an int[], which can then be written to the stream with a writeInt loop (the width and height are also written using writeInt). The readObject method reads the width, height, and pixel values, passes them to a MemoryImageSource constructor, and then passes that object to Component.createImage to create the image.
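The PixelGrabber and MemoryImageSource route needs an AWT Component to create the image. The sketch below swaps in java.awt.image.BufferedImage, whose getRGB and setRGB methods follow the same plan (width, height, then a writeInt loop over the pixels) without needing a component. ImageHolder and roundTrip are hypothetical names, and the image is assumed to be non-null.

```java
import java.awt.image.BufferedImage;
import java.io.*;

public class ImageHolder implements Serializable {
    // BufferedImage is not serializable, so the field is transient and its
    // pixels are written by hand. Assumes image is non-null when written.
    private transient BufferedImage image;

    public ImageHolder(BufferedImage image) { this.image = image; }
    public BufferedImage getImage() { return image; }

    private void writeObject(ObjectOutputStream out) throws IOException {
        out.defaultWriteObject();
        int w = image.getWidth(), h = image.getHeight();
        int[] pixels = image.getRGB(0, 0, w, h, null, 0, w); // ARGB values, row by row
        out.writeInt(w);
        out.writeInt(h);
        for (int p : pixels) {
            out.writeInt(p);
        }
    }

    private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
        in.defaultReadObject();
        int w = in.readInt(), h = in.readInt();              // same order as written
        int[] pixels = new int[w * h];
        for (int i = 0; i < pixels.length; i++) {
            pixels[i] = in.readInt();
        }
        image = new BufferedImage(w, h, BufferedImage.TYPE_INT_ARGB);
        image.setRGB(0, 0, w, h, pixels, 0, w);
    }

    static ImageHolder roundTrip(ImageHolder holder) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(holder);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                     new ByteArrayInputStream(bytes.toByteArray()))) {
                return (ImageHolder) in.readObject();
            }
        } catch (IOException | ClassNotFoundException ex) {
            throw new RuntimeException(ex);
        }
    }
}
```

The copy is always a TYPE_INT_ARGB image, so for source images of other types only the ARGB colors, not the original pixel layout, are preserved.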
3.3 The interface Externalizable
The designer of a class can take complete control of serializing its instances by implementing Externalizable, a sub-interface of Serializable. An externalizable class defines the following methods:
public void readExternal(ObjectInput) throws IOException, ClassNotFoundException
public void writeExternal(ObjectOutput) throws IOException
When an object whose class implements Externalizable is serialized, these methods are called instead of default serialization or readObject and writeObject. On input, the object is first created with the class's public no-argument constructor, which the class must therefore provide, and then readExternal is called on it. The readExternal method can use readObject and the DataInput methods, and similarly for writeExternal. The methods must handle all the details of encoding instances in bytes and decoding them from bytes, including state inherited from ancestors. If class versioning is necessary (see the next section), these methods must implement it.
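A minimal Externalizable sketch follows. Point3 and its FORMAT number are hypothetical; the hand-written format number is one simple way for these methods to implement their own versioning, as the paragraph above requires. Note the public no-argument constructor.

```java
import java.io.*;

public class ExternalDemo {
    // Hypothetical class that controls its entire encoding.
    public static class Point3 implements Externalizable {
        private static final int FORMAT = 1;   // hand-rolled format version
        int x, y, z;

        public Point3() {}                     // required: used to create the instance on input
        public Point3(int x, int y, int z) { this.x = x; this.y = y; this.z = z; }

        @Override
        public void writeExternal(ObjectOutput out) throws IOException {
            out.writeInt(FORMAT);
            out.writeInt(x);
            out.writeInt(y);
            out.writeInt(z);
        }

        @Override
        public void readExternal(ObjectInput in) throws IOException, ClassNotFoundException {
            int format = in.readInt();
            if (format != FORMAT) {
                throw new InvalidClassException("unsupported Point3 format " + format);
            }
            x = in.readInt();
            y = in.readInt();
            z = in.readInt();
        }
    }

    static Point3 roundTrip(Point3 p) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(p);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                     new ByteArrayInputStream(bytes.toByteArray()))) {
                return (Point3) in.readObject();
            }
        } catch (IOException | ClassNotFoundException ex) {
            throw new RuntimeException(ex);
        }
    }
}
```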
3.4 Class versioning
To establish an object’s class identity, the default serialization mechanism writes a “class descriptor” that identifies the class and its version. This descriptor is an instance of ObjectStreamClass that includes the qualified class name; a 64-bit value derived from an SHA-1 hash of the class’s name, modifiers, implemented interfaces, and members (such as most of its fields and its non-private constructors and methods), referred to as the “serial version unique identifier” or “serial version UID”; the names and types of the instance variables serialized by the default mechanism; and whether the class defines a writeObject method. (That is, an ObjectStreamClass is used rather than the class name or the serialized class object.) If, during deserialization, the serial version UID in the stream differs from that of the version of the class loaded in the recipient Virtual Machine, an InvalidClassException is signaled. This ensures that the class used in the deserializing Virtual Machine is the same version as that used in the serializing Virtual Machine.
Class versioning can result in problems when a class is under development. If an object is written to a file and the class is then modified, the serial version UID of the class in the deserializing Virtual Machine can differ from that of the written object, preventing the object from being read. Even if the developer has only added non-private methods or changed method names since writing an instance, that object's serial version UID will not match that of the class in the reading Virtual Machine, though the object's fields are the same. That is, the default mechanism errs on the safe side in that essentially any change in the class definition results in a different serial version UID. To avoid a serialization exception while a class is under development, the class declares a static final long variable called serialVersionUID (any access modifier is allowed, but private is recommended). In this case, the Virtual Machine uses that serial version UID value rather than generating one for the class as described above. To obtain a value for the variable, use the JDK program serialver or call the method ObjectStreamClass.lookup(YourClass.class).getSerialVersionUID().
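A short sketch of declaring and inspecting the version. Appointment and Unversioned are hypothetical classes; ObjectStreamClass.lookup reports the declared value for the first and a hash-derived value for the second, getFields lists the fields the default mechanism will write, and lookup returns null for a class that is not serializable.

```java
import java.io.ObjectStreamClass;
import java.io.ObjectStreamField;
import java.io.Serializable;

public class VersionDemo {
    // Hypothetical class with an explicit version number.
    static class Appointment implements Serializable {
        private static final long serialVersionUID = 1L;   // static: not a serialized field
        String what;
    }

    // Same shape, but no declared serialVersionUID: the value is computed.
    static class Unversioned implements Serializable {
        String what;
    }

    public static void main(String[] args) {
        ObjectStreamClass desc = ObjectStreamClass.lookup(Appointment.class);
        System.out.println(desc.getSerialVersionUID());     // 1
        for (ObjectStreamField f : desc.getFields()) {
            System.out.println(f.getName());                // what
        }
        // A hash-derived value; it changes when the class definition changes.
        System.out.println(ObjectStreamClass.lookup(Unversioned.class).getSerialVersionUID());
        System.out.println(ObjectStreamClass.lookup(Object.class)); // null: not serializable
    }
}
```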
If the serial version UID of the class in the deserializing Virtual Machine is the same as that in the stream, no version mismatch is signaled, and compatible differences between the two versions of the class are tolerated. If the object in the stream has values for instance variables that do not exist in the recipient's class, they are ignored. If the recipient has variables (including inherited fields) that do not exist in the object written to the stream, they are initialized to the default values for their types (null for class-type variables, zero for numeric types, and false for boolean).
In fact, it is considered good practice for classes to define serialVersionUID so that the developer can control versioning, because the computed value can vary with compiler-generated details of the class, and because the SHA computation is time-consuming. When a new version of the class is developed that should not be compatible with earlier versions, it is given a new serial version UID.