Mega Code Archive

Binary Serialization with the Microsoft .NET framework and Delphi for .NET

This article explains binary serialization, complete with Delphi for .NET and C# source code.

Binary Serialization

Serialization

Serialization is the process of transforming object state into a form that can readily be transported or persisted. Object state here refers to the fields, or member variables, of an object. These fields can be of any type, ranging from simple types such as integers to complex types such as objects. Because contained objects partially define the state of the object, serialization might mean transforming a whole chain of objects, referred to as an object graph.

The form into which the object graph is transformed is usually of no concern, though it must be precisely defined. Of more concern is where the result of this transformation is stored. With .NET serialization the destination of the transformed bytes is always a Stream. Stream is an abstract class that defines only a behavioral pattern for storing and retrieving data, not the underlying storage method or location. Concrete descendants of the Stream class define the method of storage that is used. The .NET Framework Class Library (FCL) provides stream classes that store data in memory (System.IO.MemoryStream), store it in a file (System.IO.FileStream) and even send it across the network (System.Net.Sockets.NetworkStream).

By using streams, .NET serialization decouples the process of serialization from the destination of the serialized data. This decoupling is part of the reason why .NET serialization is so flexible. It doesn't matter to the serialization framework what you do with the serialized data.
For all it cares, you persist it to disk, transport it across application domain boundaries, or even send it across the network.

Deserialization

Deserialization is the process of reconstructing the object graph from the stream; it is simply the inverse of serialization. It means reading the stream, creating the objects found therein and populating the fields of these objects with the values found in the stream.

That's all well and good, but why would you want to do such a thing in the first place? There are many uses for serialization. Here are just a few examples of things that are, or can be, implemented using serialization:

* Copying objects to and from the clipboard in Windows Forms applications
* Saving session state in ASP.NET (when using the state server or SQL server model)
* Remoting objects across application domain boundaries
* Persisting application state and configuration to disk
* Distributed or disconnected applications that need to share objects

The serialization services offered by the .NET Framework are powerful, easy to use and flexible. With only a few lines of code you can serialize a complete object graph to multiple locations. The .NET Framework automatically serializes all object fields, both private and public, and correctly handles complete object graphs, including those that contain circular references. Because serialization is based on reflection, whenever you add fields to your objects, those fields are serialized automatically as well, without any additional coding. Thus, .NET serialization takes care of one of the most common problems experienced with manually crafted solutions: forgetting to update the serialization code when a class changes.

Of course there are situations in which the default serialization mechanisms just won't do. In those cases you can take over serialization completely. We'll address how to do that later. First, let's start with the basics.

NOTE: The .NET Framework provides two types of serialization, binary serialization and XML serialization.
While similar in nature, the two are different technologies. This article only addresses binary serialization, not XML serialization. As you'll see later, although binary serialization allows you to serialize objects in XML format, it's still binary serialization.

Basic serialization

Without further ado, Program 1 demonstrates the code required to serialize a simple object to a file and deserialize the object back from the same file, effectively cloning the original object.

    program Program1;
    {$APPTYPE CONSOLE}

    uses
      System.IO,
      System.Runtime.Serialization,
      System.Runtime.Serialization.Formatters,
      System.Runtime.Serialization.Formatters.Binary;

    type
      [Serializable]
      SerializableObject = class
      private
        Field: Integer;
      end;

    // Writes the state of Obj to the file Program1.txt.
    procedure SerializeObject(Obj: SerializableObject);
    var
      Formatter: IFormatter;
      Stream: System.IO.FileStream;
    begin
      Stream := FileStream.Create('Program1.txt', FileMode.Create);
      Formatter := BinaryFormatter.Create as IFormatter;
      Formatter.Serialize(Stream, Obj);
      Stream.Close;
    end;

    // Reads Program1.txt and returns the reconstructed object.
    function DeserializeObject: SerializableObject;
    var
      Formatter: IFormatter;
      Stream: System.IO.FileStream;
    begin
      Stream := FileStream.Create('Program1.txt', FileMode.Open);
      Formatter := BinaryFormatter.Create as IFormatter;
      Result := SerializableObject(Formatter.Deserialize(Stream));
      Stream.Close;
    end;

    var
      SO1, SO2: SerializableObject;
    begin
      SO1 := SerializableObject.Create;
      SO1.Field := 1;
      SerializeObject(SO1);
      SO2 := DeserializeObject;
      Console.WriteLine(System.Int32(SO2.Field));
    end.

In this code we have a class named SerializableObject that has one Integer member named Field. This class has the Serializable custom attribute attached to it. This is the first and only requirement for .NET serialization to work: any type that needs to be serializable must be marked with the Serializable attribute. If you forget to include this attribute, the framework will throw an exception (System.Runtime.Serialization.SerializationException) when you attempt to serialize the object.
The Serializable attribute can be attached to classes, value types (records), enumerations and delegates. The latter two are serializable by default, so you don't have to explicitly mark them as such.

The Serializable attribute is not inheritable. However, the serialization framework does require that all classes in a hierarchy be marked explicitly with the Serializable attribute. This means that if your class inherits from a class that is not marked as Serializable, your class won't be serializable either, regardless of whether you mark the derived class with the attribute. Any attempt to serialize an instance of the derived class will result in a SerializationException thrown by the framework. Later in this article we'll describe a way to overcome this.

As a final note, the Serializable attribute is actually a pseudo attribute. That is, unlike regular attributes that are stored in full in the assembly metadata, Serializable causes only a specific bit to be set (see System.Reflection.TypeAttributes for more detail and the other possible flags). This is not something you'll have to worry about, as it is completely transparent.

To continue with the example code, we create an instance of our class and initialize its sole member. We then call the SerializeObject procedure to serialize the object to a file. SerializeObject creates a FileStream and a BinaryFormatter object and serializes the object to the file by calling the Serialize method on the formatter. This results in a file named Program1.txt that contains the serialized representation of the object. Next, the DeserializeObject function is called to deserialize the object from the stream. The process is very similar: we create a BinaryFormatter and a FileStream object and call the Deserialize method of the formatter. This results in the formatter creating a new SerializableObject instance and populating its field with the value read from the stream.
Finally, we write the value of the field to the console to show that the deserialization succeeded.

In this example the code to serialize and deserialize the object was intentionally separated into two distinct routines to underscore the fact that these are completely separate and independent processes. You could run the serialization code in one instance of your application and the deserialization in another, for example.

NOTE: The Serializable attribute also affects how objects are marshalled across appdomain boundaries using the remoting features of the .NET Framework. An object marked Serializable is automatically marshalled by value. An object that inherits from MarshalByRefObject is marshalled by reference. Finally, an object that inherits from MarshalByRefObject and is also marked Serializable will cause a proxy to be created in the target appdomain.

Formatters

Serialization and deserialization are performed by an object that implements the IFormatter interface. The .NET Framework Class Library (FCL) provides two implementations of this interface, BinaryFormatter and SoapFormatter. These classes differ only in the format into which the serialized data is transformed (hence the name formatter). BinaryFormatter uses an efficient, compact but proprietary format, while SoapFormatter uses XML/SOAP.

Which formatter you use in your application depends on your requirements. For example, if supporting multiple platforms is important, you would probably want to stick with SoapFormatter. On the other hand, if you have large objects that must be transported across the network, where size does matter, BinaryFormatter would be more appropriate. Either way, you must use the same formatter implementation for both serialization and deserialization. A BinaryFormatter can't deserialize an object that was serialized with a SoapFormatter, or vice versa.
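
Because both sides must agree on the formatter, one option is to create it in a single helper routine that all serialization code goes through. The following is a minimal sketch of that idea, not part of the original programs: the names FormatterChoice, Sample and CreateFormatter are hypothetical, and the DEBUG symbol is assumed to be defined by your own build settings.

```pascal
program FormatterChoice;
{$APPTYPE CONSOLE}

uses
  System.IO,
  System.Runtime.Serialization,
  System.Runtime.Serialization.Formatters,
  System.Runtime.Serialization.Formatters.Binary,
  System.Runtime.Serialization.Formatters.Soap;

type
  [Serializable]
  Sample = class
  private
    Value: Integer;
  end;

// Hypothetical helper: the only place where the format is chosen, so
// serialization and deserialization cannot end up with different formatters.
function CreateFormatter: IFormatter;
begin
{$IFDEF DEBUG}
  // Assumed DEBUG symbol: readable XML/SOAP output while developing.
  Result := SoapFormatter.Create as IFormatter;
{$ELSE}
  // Compact binary output otherwise.
  Result := BinaryFormatter.Create as IFormatter;
{$ENDIF}
end;

var
  S: Sample;
  Formatter: IFormatter;
  Stream: System.IO.MemoryStream;
begin
  S := Sample.Create;
  S.Value := 42;
  Stream := MemoryStream.Create;

  // Serialize with the formatter returned by the helper.
  Formatter := CreateFormatter;
  Formatter.Serialize(Stream, S);

  // Rewind and deserialize with a formatter from the same helper.
  Stream.Seek(0, SeekOrigin.Begin);
  Formatter := CreateFormatter;
  S := Sample(Formatter.Deserialize(Stream));
  Console.WriteLine(System.Int32(S.Value));
end.
```

Routing every call through one helper keeps the choice of format in a single line of code, at the cost of referencing both formatter assemblies.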

TIP: One additional benefit of using SoapFormatter is that the result is readable, as far as XML can be considered readable. As such, it can be very useful to use SoapFormatter during development and as a debugging aid. To switch between BinaryFormatter and SoapFormatter you only need to change a single line of code, which can even be done using conditional compilation based on an imaginary DEBUG symbol.

The IFormatter interface defines three properties (Binder, Context and SurrogateSelector), most of which we'll ignore for now, and two methods, Serialize and Deserialize. The Serialize method takes the object graph and writes its state, including type identity, to the stream. The Deserialize method reads the stream and reconstructs the object graph from it. The return value of Deserialize is a reference to the root of the object graph.

Even though formatter objects are at the center of the serialization process, most of the process is actually delegated to different classes and/or inherited from the Formatter base class. In fact, the formatter classes need to do little more than convert name/value pairs to a specific format and vice versa. Helper classes implement everything else, such as determining the fields to serialize and traversing the object graph. The most important helper classes are FormatterServices and ObjectManager. As a user of the serialization framework you'll almost never deal with these classes, and describing them in detail would require another article. In the few specific cases where you do need to use FormatterServices we'll show the required code without any details. If you want to know more about these classes you can look them up in the .NET Framework documentation or see [2], which contains a relatively good description.

In addition to name/value pairs for each of the object's fields, formatters also write the name of the type and the name of the assembly in which the type is defined to the stream.
This information is necessary during deserialization to construct the correct object from the right assembly. By default, both BinaryFormatter and SoapFormatter write out the full name of the assembly, including its name, version, culture and public key token. You can change this behavior by using the AssemblyFormat property of the formatter object. Both BinaryFormatter and SoapFormatter support this property, but the IFormatter interface doesn't. This is the only time you actually need a reference to the formatter object itself as opposed to an IFormatter interface. AssemblyFormat can be set to either FormatterAssemblyStyle.Full or FormatterAssemblyStyle.Simple. Full is the default and results in the behavior described above. If AssemblyFormat is set to Simple, the formatter doesn't write out the full assembly name but only the simple assembly name, without version, culture or public key token.

A closer look at deserialization

During deserialization the formatter reads the incoming stream and examines each serialized object in turn. For each object found, the formatter first ensures that the assembly in which the object's type is defined is loaded and then constructs an instance of the object. The object is not constructed the normal way, by calling its constructor, but by a call to FormatterServices.GetUninitializedObject. This method allocates memory for the object and initializes internal metadata but does not call the object's constructor. After the object is constructed, the formatter populates the object's fields with the values found in the stream (in a simple scenario this is done through FormatterServices.PopulateObjectMembers).

Deserialization occurs in several stages depending on the complexity of the serialized objects (single object vs. object graph, circular references, serialization callbacks, etc.).
The objects are first read from the stream and put in an internal list; simple objects are constructed and have their fields populated immediately, but complex object graphs may not be fully initialized at this point. After all objects are read from the stream, the framework walks the list as many times as necessary to fully construct and initialize the deserialized objects. Why this is necessary will become clear in the remainder of this article, when we describe the various means by which serialization and deserialization can be customized.

In short, here are the steps that occur for each object found in the stream (you'll learn about ISerializable later):

    Read assembly identity and type name (and load the assembly if needed)
    T := FormatterServices.GetTypeFromAssembly
    Obj := FormatterServices.GetUninitializedObject(T)
    if T implements ISerializable then
      call special constructor
    else
      Fields := FormatterServices.GetSerializableMembers
      Read fieldname/value pairs and create an array holding the values (Data)
      FormatterServices.PopulateObjectMembers(Obj, Fields, Data)

NOTE: Even though instance constructors aren't called during deserialization, class constructors are called as normal. That is, the class constructor is called the first time the class is referenced, regardless of whether this is a result of deserialization or of the application itself referencing the class (e.g. constructing an instance of it).

Selective serialization

By default, if a class is marked Serializable, all of its fields will be serialized. This may not be desirable for a number of reasons. For example, some fields may contain values that can't be guaranteed to remain valid after a serialization-deserialization cycle. Examples of such values are Win32 handles and thread IDs. Other reasons for not wanting to serialize a field are that it contains sensitive information (e.g. a password) or that it contains a large object that can easily be reconstructed.
In the latter case you may not want to serialize this particular field because the required disk space, or worse, network bandwidth, would be prohibitive. You can prevent any field from being serialized by marking it with the NonSerialized attribute. Program 2 demonstrates this.

    program Program2;
    {$APPTYPE CONSOLE}

    uses
      System.IO,
      System.Runtime.Serialization,
      System.Runtime.Serialization.Formatters,
      System.Runtime.Serialization.Formatters.Binary;

    type
      [Serializable]
      SerializableObject = class
      private
        Field1: Integer;
        [NonSerialized]
        Field2: Integer;   // skipped by the formatter
      end;

    var
      O: SerializableObject;
      Formatter: IFormatter;
      Stream: System.IO.MemoryStream;
    begin
      O := SerializableObject.Create;
      O.Field1 := 1;
      O.Field2 := 2;
      Stream := MemoryStream.Create;
      Formatter := BinaryFormatter.Create as IFormatter;
      Formatter.Serialize(Stream, O);
      // Rewind the stream before reading the object back.
      Stream.Seek(0, SeekOrigin.Begin);
      O := SerializableObject(Formatter.Deserialize(Stream));
      Console.WriteLine(System.Int32(O.Field1));
      Console.WriteLine(System.Int32(O.Field2));
    end.

In this code we have a class marked Serializable that contains two members. The second member is marked with the NonSerialized attribute and therefore won't be serialized. Notable about this example is that the last line, which writes out the deserialized value of Field2, displays 0. The reason is that when the object is deserialized, all fields are set to their default value before they are populated with the values from the stream. Members that weren't serialized remain at that default value, and the default value for an Integer is zero. You have to keep this in mind when using the NonSerialized attribute. You will probably want to access such fields through an accessor method that checks the field against the type's default value. Alternatively, you may want to implement a deserialization callback that initializes those fields automatically (deserialization callbacks are discussed later).
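
The accessor approach mentioned above can be sketched as follows. This is an illustrative example only, under the assumption that the skipped field is a helper object that can simply be recreated on demand; the names LazyField, Journal, FHistory and GetHistory are hypothetical and do not come from the programs in this article.

```pascal
program LazyField;
{$APPTYPE CONSOLE}

uses
  System.IO,
  System.Collections,
  System.Runtime.Serialization,
  System.Runtime.Serialization.Formatters,
  System.Runtime.Serialization.Formatters.Binary;

type
  [Serializable]
  Journal = class
  private
    FTitle: string;
    [NonSerialized]
    FHistory: ArrayList;   // not written to the stream
    function GetHistory: ArrayList;
  public
    property History: ArrayList read GetHistory;
  end;

// Accessor that compares the field with its default value (nil for an
// object reference) and recreates it when needed, for example after
// deserialization has left it unpopulated.
function Journal.GetHistory: ArrayList;
begin
  if FHistory = nil then
    FHistory := ArrayList.Create;
  Result := FHistory;
end;

var
  J: Journal;
  Formatter: IFormatter;
  Stream: System.IO.MemoryStream;
begin
  J := Journal.Create;
  J.FTitle := 'Draft';
  J.History.Add('created');   // first access creates the list

  Stream := MemoryStream.Create;
  Formatter := BinaryFormatter.Create as IFormatter;
  Formatter.Serialize(Stream, J);
  Stream.Seek(0, SeekOrigin.Begin);
  J := Journal(Formatter.Deserialize(Stream));

  // FTitle travels with the object; the history list does not, so the
  // accessor hands back a freshly created list instead of nil.
  Console.WriteLine(J.FTitle);
  Console.WriteLine(System.Int32(J.History.Count));
end.
```

As long as all code reads the field through the History property, callers never see the nil value that deserialization leaves behind.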

Custom Serialization

The ability to choose which fields to serialize, using the NonSerialized attribute, probably covers 90% of your serialization needs. For the remaining 10% the FCL provides you with the means to take complete control over the serialization process. You do this by implementing the ISerializable interface and providing a complementary deserialization constructor.

The ISerializable interface has only one method, GetObjectData. Normally, when serializing an object, the FCL uses reflection to determine which fields, and their values, to serialize. When the FCL sees that the class implements ISerializable, it instead calls the GetObjectData method during serialization and the special constructor during deserialization. Both GetObjectData and the special constructor accept two parameters: a SerializationInfo reference and a StreamingContext. We'll ignore the StreamingContext for now and focus on SerializationInfo.

SerializationInfo has methods that allow you to store name/value pairs in the target stream during serialization and to read them back during deserialization. To add values to the target stream you use the AddValue method, which exists in numerous overloaded forms covering virtually all types, ranging from Int32 to String. There's even an AddValue overload that accepts an object reference so that you can serialize members that reference an object (the referenced object must itself be serializable). To read values from the stream during deserialization you use the various Get methods, such as GetInt32 and GetString. Program 3 demonstrates implementing ISerializable and the special deserialization constructor.

    program Program3;
    {$APPTYPE CONSOLE}

    uses
      System.IO,
      System.Runtime.Serialization,
      System.Runtime.Serialization.Formatters,
      System.Runtime.Serialization.Formatters.Binary;

    type
      [Serializable]
      SerializableObject = class (System.Object, ISerializable)
      private
        Field1: Integer;
        Field2: string;
      protected
        // Special constructor, called by the formatter during deserialization.
        constructor Create(Info: SerializationInfo; Context: StreamingContext);
      public
        procedure GetObjectData(Info: SerializationInfo; Context: StreamingContext);
      end;

    constructor SerializableObject.Create(Info: SerializationInfo; Context: StreamingContext);
    begin
      // Pass (Info, Context) on only when the base class implements ISerializable.
      inherited Create{(Info, Context)};
      Field1 := Info.GetInt32('Field1');
      Field2 := Info.GetString('Field2');
    end;

    procedure SerializableObject.GetObjectData(Info: SerializationInfo; Context: StreamingContext);
    begin
      // Needed only when the base class implements ISerializable:
      //inherited GetObjectData(Info, Context);
      Info.AddValue('Field1', Field1);
      Info.AddValue('Field2', Field2);
    end;

    var
      O: SerializableObject;
      Formatter: IFormatter;
      Stream: System.IO.MemoryStream;
    begin
      O := SerializableObject.Create;
      O.Field1 := 1;
      O.Field2 := 'Hello';
      Stream := MemoryStream.Create;
      Formatter := BinaryFormatter.Create as IFormatter;
      Formatter.Serialize(Stream, O);
      Stream.Seek(0, SeekOrigin.Begin);
      O := SerializableObject(Formatter.Deserialize(Stream));
      Console.WriteLine(System.Int32(O.Field1));
      Console.WriteLine(O.Field2);
    end.

A few things you should take note of. First, the class must still be marked with the Serializable attribute. If you omit it, the FCL will raise an exception. Second, if the class inherits from a class that also implements ISerializable, you should call the inherited GetObjectData method during serialization and the inherited constructor during deserialization. In this example neither is necessary (it would even be illegal) since the class doesn't inherit from such a class.

At this point you might be wondering why the ISerializable interface doesn't have a SetObjectData method but instead you have to define a special constructor.
The reasoning behind this is related to security and versioning. If there were a SetObjectData method, client code could call it explicitly, thereby indirectly changing the private fields of the object. You definitely wouldn't want that, especially since it would be almost impossible to distinguish between valid and spurious calls to SetObjectData. The special constructor, on the other hand, can be declared private or protected (as was done in the example code), making it difficult for client code to call it. The FCL still has access to it because it invokes the constructor through reflection. The general guideline is that if your class is sealed, you should declare the constructor as private, and if it is not sealed you should declare it as protected (so that derived classes can still call it).

There are two cases left to consider for completeness. The first case is where you derive from a class that implements ISerializable. In this case the derived class itself must implement ISerializable as well (and call the inherited methods). You can't just mark it as Serializable: even though the code will compile and run just fine, you'll find that the fields of the derived class won't be serialized properly, because only the base class's GetObjectData and special constructor are used. I'm not sure if this should be considered a bug or a feature.

The second case is where you want to derive from a class that doesn't implement ISerializable (but is marked Serializable), yet you do want to implement it in your derived class. In this case you have a little extra work to do because the base class members won't be serialized automatically. The code is straightforward, though, and is shown in Program 4. In this code we use the static GetSerializableMembers method of the FormatterServices class to obtain an array of serializable fields (those not marked NonSerialized) for the type and its base class.
We then loop through this array looking for the fields that are declared in the base class. During serialization we explicitly write those fields to the stream, and during deserialization we populate them from the stream (using reflection).

    program Program4;
    {$APPTYPE CONSOLE}

    uses
      System.IO,
      System.Reflection,
      System.Runtime.Serialization,
      System.Runtime.Serialization.Formatters,
      System.Runtime.Serialization.Formatters.Soap;

    type
      [Serializable]
      BaseObject = class (System.Object)
      private
        Field1: Integer;
      end;

      [Serializable]
      DerivedObject = class (BaseObject, ISerializable)
      private
        Field2: Integer;
      protected
        constructor Create(Info: SerializationInfo; Context: StreamingContext);
      public
        procedure GetObjectData(Info: SerializationInfo; Context: StreamingContext);
      end;

    constructor DerivedObject.Create(Info: SerializationInfo; Context: StreamingContext);
    var
      ParentType: System.Type;
      Field: FieldInfo;
      Members: array of MemberInfo;
      I: Integer;
    begin
      inherited Create;
      // Despite its name, ParentType holds the type of this instance.
      ParentType := GetType;
      Members := FormatterServices.GetSerializableMembers(ParentType, Context);
      for I := 0 to Length(Members) - 1 do
      begin
        Field := FieldInfo(Members[I]);
        // Fields not declared by this type belong to the base class.
        if Field.DeclaringType <> ParentType then
        begin
          Field.SetValue(Self, Info.GetValue(Field.Name, Field.FieldType));
        end;
      end;
      Field2 := Info.GetInt32('Field2');
    end;

    procedure DerivedObject.GetObjectData(Info: SerializationInfo; Context: StreamingContext);
    var
      ParentType: System.Type;
      Members: array of MemberInfo;
      I: Integer;
    begin
      Info.AddValue('Field2', Field2);
      ParentType := GetType;
      Members := FormatterServices.GetSerializableMembers(ParentType, Context);
      // Write the base class fields explicitly.
      for I := 0 to Length(Members) - 1 do
        if Members[I].DeclaringType <> GetType then
          Info.AddValue(Members[I].Name, FieldInfo(Members[I]).GetValue(Self));
    end;

    var
      O: DerivedObject;
      Formatter: IFormatter;
      Stream: System.IO.FileStream;
    begin
      O := DerivedObject.Create;
      O.Field1 := 1;
      O.Field2 := 2;
      Stream := FileStream.Create('Program4.txt', FileMode.Create);
      Formatter := SoapFormatter.Create as IFormatter;
      Formatter.Serialize(Stream, O);
      Stream.Seek(0, SeekOrigin.Begin);
      O := DerivedObject(Formatter.Deserialize(Stream));
      Console.WriteLine(System.Int32(O.Field1));
      Console.WriteLine(System.Int32(O.Field2));
    end.

Deserialization callbacks

The order in which objects in an object graph are deserialized is undefined, or at least something you should not rely on. This might be problematic in situations where an object relies on a referenced object for its own initialization. One way to solve this problem is through deserialization callbacks. Though the term sounds complex, the mechanism is really very simple. To implement a deserialization callback, you implement the IDeserializationCallback interface. This interface has only one method, OnDeserialization, which is called by the FCL immediately after deserialization is completed for the entire object graph but before the formatter returns from the Deserialize method. Deserialization callbacks are also a perfect candidate for initializing fields that were marked with the NonSerialized attribute. Program 5 demonstrates implementing IDeserializationCallback. The code isn't very useful but demonstrates the concept.

    program Program5;
    {$APPTYPE CONSOLE}

    uses
      System.IO,
      System.Reflection,
      System.Runtime.Serialization,
      System.Runtime.Serialization.Formatters,
      System.Runtime.Serialization.Formatters.Soap;

    type
      [Serializable]
      SerializableObject = class (System.Object, IDeserializationCallback)
      private
        Field1: Integer;
        [NonSerialized]
        Field2: Integer;
      public
        procedure OnDeserialization(Sender: System.Object);
      end;

    // Called by the framework once the whole object graph is deserialized.
    procedure SerializableObject.OnDeserialization(Sender: System.Object);
    begin
      Field2 := 2;
    end;

    var
      O: SerializableObject;
      Formatter: IFormatter;
      Stream: System.IO.FileStream;
    begin
      O := SerializableObject.Create;
      O.Field1 := 1;
      Stream := FileStream.Create('Program5.txt', FileMode.Create);
      Formatter := SoapFormatter.Create as IFormatter;
      Formatter.Serialize(Stream, O);
      Stream.Seek(0, SeekOrigin.Begin);
      O := SerializableObject(Formatter.Deserialize(Stream));
      Console.WriteLine(System.Int32(O.Field1));
      Console.WriteLine(System.Int32(O.Field2));
    end.

Streaming Contexts

You can use serialization for many purposes. The serialized data can be stored on disk, transported across the network or used merely to clone an object within the same process. The framework itself is mostly oblivious to this fact, yet there are many situations in which the details of how serialization is performed should adapt to the purpose. For example, suppose you serialize an object that has a field containing a Win32 file handle. When the object is deserialized within the same process, the handle value could be serialized and deserialized as is (you'd probably want to call DuplicateHandle before or after serialization, though). On the other hand, if the deserialization happens in a different process, the handle value loses its meaning and you will want to serialize the filename instead and recreate the handle in the target process.

Streaming contexts make this possible. In essence, a StreamingContext is a type that allows you to specify the source and destination of a serialized stream.
You assign an instance of this type to the formatter's Context property. The object being serialized has access to the StreamingContext, which the formatter passes as a parameter to the GetObjectData method and the special constructor, and can adapt the way it performs (de)serialization based on the context. Program 6 demonstrates using streaming contexts for the example cited above. You should create a dummy file named c:\test.txt and run the program twice: the first time as is, and the second time after changing the StreamingContext to something other than Clone. For more information about the possible values of the StreamingContext, look up StreamingContextStates in the .NET Framework SDK documentation.

NOTE: StreamingContext itself is not automatically serialized. You have to take care to use the same context during serialization and deserialization. Usually this is not a problem, but in those rare cases where the context might be unknown during deserialization, you can always write the context to the stream yourself.

For completeness, in addition to specifying the source and target of serialization, the StreamingContext class also allows you to specify additional context through the Context property. Personally I haven't found a use for this yet, but it's nice to know it's there just in case. Note that this additional context object is not automatically written to the serialized data stream either; if you need it for deserialization, you'll have to write it to the stream yourself.

    program Program6;
    {$APPTYPE CONSOLE}

    uses
      Borland.Win32.Windows,
      System.IO,
      System.Runtime.Serialization,
      System.Runtime.Serialization.Formatters,
      System.Runtime.Serialization.Formatters.Binary;

    type
      [Serializable]
      FileObject = class (System.Object, ISerializable)
      private
        FFileName: string;
        FFileHandle: Integer;
      protected
        constructor Create(Info: SerializationInfo; Context: StreamingContext);
      public
        constructor Create(const FileName: string);
        procedure GetObjectData(Info: SerializationInfo; Context: StreamingContext);
      end;

    constructor FileObject.Create(const FileName: string);
    begin
      inherited Create;
      FFileName := FileName;
      FFileHandle := CreateFile(FFileName, GENERIC_READ, 0, nil, OPEN_EXISTING, 0, 0);
    end;

    constructor FileObject.Create(Info: SerializationInfo; Context: StreamingContext);
    begin
      inherited Create;
      if Context.State = StreamingContextStates.Clone then
      begin
        Console.WriteLine('Deserializing handle');
        FFileHandle := Info.GetInt32('FileHandle');
      end
      else
      begin
        Console.WriteLine('Deserializing name');
        FFileName := Info.GetString('FileName');
        FFileHandle := CreateFile(FFileName, GENERIC_READ, 0, nil, OPEN_EXISTING, 0, 0);
      end;
    end;

    procedure FileObject.GetObjectData(Info: SerializationInfo; Context: StreamingContext);
    begin
      if Context.State = StreamingContextStates.Clone then
      begin
        Console.WriteLine('Serializing handle');
        Info.AddValue('FileHandle', FFileHandle);
      end
      else
      begin
        Console.WriteLine('Serializing name');
        Info.AddValue('FileName', FFileName);
      end;
    end;

    var
      FO: FileObject;
      Formatter: IFormatter;
      Stream: System.IO.MemoryStream;
    begin
      FO := FileObject.Create('c:\test.txt');
      Stream := MemoryStream.Create;
      Formatter := BinaryFormatter.Create as IFormatter;
      Formatter.Context := StreamingContext.Create(StreamingContextStates.Clone);
      Formatter.Serialize(Stream, FO);
      Stream.Seek(0, SeekOrigin.Begin);
      FO := FileObject(Formatter.Deserialize(Stream));
      Console.WriteLine(System.Int32(FO.FFileHandle));
      Console.WriteLine(FO.FFileName);
    end.
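
Program 6 compares Context.State with StreamingContextStates.Clone using plain equality. StreamingContextStates is a flags enumeration, so a context may combine several states, in which case an equality test no longer matches. The sketch below shows one way to test for a single flag instead. It is an illustration only: the names ContextFlags and HasState are hypothetical, and it assumes the compiler accepts the and operator on flag enumerations in the same way Program 7 uses or on BindingFlags.

```pascal
program ContextFlags;
{$APPTYPE CONSOLE}

uses
  System.Runtime.Serialization;

// Hypothetical helper: True when Flag is set in the context's State,
// regardless of which other states are combined with it.
function HasState(const Context: StreamingContext;
  Flag: StreamingContextStates): Boolean;
begin
  Result := (Context.State and Flag) = Flag;
end;

var
  Ctx: StreamingContext;
begin
  // A context that combines two states.
  Ctx := StreamingContext.Create(
    StreamingContextStates.Clone or StreamingContextStates.Other);

  // The equality test used in Program 6 does not match a combined state...
  Console.WriteLine(Ctx.State = StreamingContextStates.Clone);
  // ...whereas the flag test still detects that Clone is set.
  Console.WriteLine(HasState(Ctx, StreamingContextStates.Clone));
end.
```

For a context that holds exactly one state, as in Program 6, both tests give the same answer; the flag test only matters once states are combined.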

Serialization Surrogates and Surrogate Selectors

Up to now we've looked at how classes are serialized automatically and how a class can override the way it is serialized through the NonSerialized attribute or by implementing the ISerializable interface. The framework, however, also allows applications to override how types are serialized. This feature is supported through so-called "serialization surrogates." A serialization surrogate is a class that implements the ISerializationSurrogate interface and is registered with the formatter for one or more specific types. Whenever the formatter is about to serialize an instance of such a type, it calls upon the surrogate object to perform the serialization.

There aren't many situations in which serialization surrogates are used; the most common is when an application wishes to serialize an object from a third-party library that wasn't designed to be serializable (that is, not marked with the Serializable attribute). Their applicability is further limited by the fact that a surrogate usually doesn't have intricate knowledge of the object, nor does it have access to the private fields that make up the object's state. The latter can be overcome in part by reading the fields through reflection, but it's not an ideal situation.

Let's look at some example code to make all of this concrete. Program 7 illustrates how you can use a serialization surrogate to serialize an object that is not serializable itself.

program Program7;
{$APPTYPE CONSOLE}

uses
  System.IO,
  System.Reflection,
  System.Runtime.Serialization,
  System.Runtime.Serialization.Formatters,
  System.Runtime.Serialization.Formatters.Soap;

type
  NonSerializable = class
  private
    Field: Integer;
  end;

  Surrogate = class (System.Object, ISerializationSurrogate)
  public
    procedure GetObjectData(Obj: System.Object; Info: SerializationInfo;
      Context: StreamingContext);
    function SetObjectData(Obj: System.Object; Info: SerializationInfo;
      Context: StreamingContext; Selector: ISurrogateSelector): System.Object;
  end;

procedure Surrogate.GetObjectData(Obj: System.Object; Info: SerializationInfo;
  Context: StreamingContext);
var
  Fields: array of FieldInfo;
  I: Integer;
begin
  //The following code is equivalent to:
  //  Info.AddValue('Field', NonSerializable(Obj).Field);
  Fields := Obj.GetType.GetFields(BindingFlags.Instance or BindingFlags.NonPublic);
  for I := 0 to Length(Fields) - 1 do
  begin
    Info.AddValue(Fields[I].Name, FieldInfo(Fields[I]).GetValue(Obj),
      Fields[I].FieldType);
  end;
end;

function Surrogate.SetObjectData(Obj: System.Object; Info: SerializationInfo;
  Context: StreamingContext; Selector: ISurrogateSelector): System.Object;
var
  Enum: SerializationInfoEnumerator;
  Field: FieldInfo;
begin
  //The following code is equivalent to:
  //  NonSerializable(Obj).Field := Info.GetInt32('Field');
  Enum := Info.GetEnumerator;
  while Enum.MoveNext do
  begin
    Field := Obj.GetType.GetField(Enum.Name,
      BindingFlags.Instance or BindingFlags.NonPublic);
    Field.SetValue(Obj, Convert.ChangeType(Enum.Value, Field.FieldType));
  end;
  Result := nil;
end;

function CloneObject(const O: System.Object): System.Object;
var
  Formatter: IFormatter;
  Stream: System.IO.Stream;
  Selector: SurrogateSelector;
  Surr: Surrogate;
begin
  Stream := FileStream.Create('Program7.txt', FileMode.Create);
  try
    Formatter := SoapFormatter.Create as IFormatter;
    Selector := SurrogateSelector.Create;
    Surr := Surrogate.Create;
    Selector.AddSurrogate(TypeOf(NonSerializable),
      StreamingContext.Create(StreamingContextStates.All), Surr);
    Formatter.SurrogateSelector := Selector as ISurrogateSelector;
    Formatter.Serialize(Stream, O);
    Stream.Seek(0, SeekOrigin.Begin);
    Result := Formatter.Deserialize(Stream);
  finally
    //release the file even if (de)serialization fails
    Stream.Close;
  end;
end;

var
  NS1, NS2: NonSerializable;
begin
  NS1 := NonSerializable.Create;
  NS1.Field := 2;
  NS2 := NonSerializable(CloneObject(NS1));
  Console.WriteLine(System.Int32(NS2.Field));
end.

In the example code we have a NonSerializable class that itself is not serializable. We define a Surrogate class that implements the ISerializationSurrogate interface. This interface has only two methods, GetObjectData and SetObjectData. The framework calls the GetObjectData method when a NonSerializable object needs to be serialized. The arguments of GetObjectData are identical to those of the same-named method of the ISerializable interface, with the addition of the Obj argument, which is a reference to the NonSerializable instance that needs to be serialized.

The SetObjectData method is the counterpart of GetObjectData and is called during deserialization (it's the equivalent of the special constructor we talked about earlier). Except for the last one, Selector (which we'll describe later), the arguments are identical to those of GetObjectData. In SetObjectData, Obj is a reference to an uninitialized object allocated by the framework. As with ISerializable, this object hasn't had any of its constructors called and it's the surrogate's job to populate the object's fields.

NOTE: As you can see in Program 7, nil is returned from the SetObjectData method. According to the .NET Framework documentation you are supposed to return a reference to the initialized object from this method. This feature would allow you to return a different object than the one supplied through the Obj argument. In actual fact, however, the framework currently ignores the result of SetObjectData. According to Richter [1] this bug is scheduled to be fixed in a future release.
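
For comparison, here is a minimal sketch of the reflection-free variant that the comments in Program 7 hint at. It assumes the surrogate lives in the same program as the NonSerializable class, so the private field is directly accessible. The names SurrogateDirect and DirectSurrogate are made up for this sketch, and it uses a BinaryFormatter with a MemoryStream rather than the SoapFormatter and file of Program 7. It also returns Obj from SetObjectData instead of nil, which follows the documented contract and should be harmless on framework versions that ignore the result.

```pascal
program SurrogateDirect;
{$APPTYPE CONSOLE}

uses
  System.IO,
  System.Runtime.Serialization,
  System.Runtime.Serialization.Formatters,
  System.Runtime.Serialization.Formatters.Binary;

type
  // Deliberately not marked [Serializable], like the class in Program 7.
  NonSerializable = class
  private
    Field: Integer;
  end;

  // DirectSurrogate is a hypothetical name used only in this sketch.
  DirectSurrogate = class (System.Object, ISerializationSurrogate)
  public
    procedure GetObjectData(Obj: System.Object; Info: SerializationInfo;
      Context: StreamingContext);
    function SetObjectData(Obj: System.Object; Info: SerializationInfo;
      Context: StreamingContext; Selector: ISurrogateSelector): System.Object;
  end;

procedure DirectSurrogate.GetObjectData(Obj: System.Object;
  Info: SerializationInfo; Context: StreamingContext);
begin
  // Write the single field under a fixed name.
  Info.AddValue('Field', NonSerializable(Obj).Field);
end;

function DirectSurrogate.SetObjectData(Obj: System.Object;
  Info: SerializationInfo; Context: StreamingContext;
  Selector: ISurrogateSelector): System.Object;
begin
  // Obj arrives uninitialized; restore the field under the same name.
  NonSerializable(Obj).Field := Info.GetInt32('Field');
  // Hand back the populated instance, as the documentation asks.
  Result := Obj;
end;

var
  Formatter: IFormatter;
  Selector: SurrogateSelector;
  Stream: MemoryStream;
  NS1, NS2: NonSerializable;
begin
  NS1 := NonSerializable.Create;
  NS1.Field := 2;

  // Register the surrogate for NonSerializable in every streaming context.
  Selector := SurrogateSelector.Create;
  Selector.AddSurrogate(TypeOf(NonSerializable),
    StreamingContext.Create(StreamingContextStates.All),
    DirectSurrogate.Create);

  Formatter := BinaryFormatter.Create as IFormatter;
  Formatter.SurrogateSelector := Selector as ISurrogateSelector;

  Stream := MemoryStream.Create;
  try
    Formatter.Serialize(Stream, NS1);
    Stream.Seek(0, SeekOrigin.Begin);
    NS2 := NonSerializable(Formatter.Deserialize(Stream));
  finally
    Stream.Close;
  end;

  // Show the field value carried over to the deserialized copy.
  Console.WriteLine(System.Int32(NS2.Field));
end.
```

The trade-off is the obvious one: this version is shorter and avoids reflection permissions, but it only works for a type whose fields the surrogate can reach and whose layout it knows in advance.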

The implementation of both methods is relatively straightforward. For those of you who are not familiar with reflection: GetObjectData uses reflection to discover the private instance fields of the object and writes them to the stream. In SetObjectData we use a SerializationInfoEnumerator in combination with reflection to write all serialized values from the stream back to the private instance fields of the object. In our case we only have a single private instance field and we actually have direct access to that field, so we could have used the two lines that are commented out in the program instead. The general case is different, however, and there the reflection-based code shown will almost always work, assuming the appropriate permissions are granted.

We need to register the Surrogate with the Formatter to tell the framework that our Surrogate object should be called to handle the serialization and deserialization of the NonSerializable class. You do so in two steps, as shown in the CloneObject function. First, a SurrogateSelector and a Surrogate object are created and the Surrogate is added to the list of surrogates through the selector's AddSurrogate method. Second, the selector is assigned to the SurrogateSelector property of the Formatter.

This indirection through a surrogate selector object exists for two reasons. First, the surrogate selector is capable of storing a list of surrogates and allows an application to register multiple surrogates for different types. In fact, you can register multiple surrogates for the same type but with different streaming contexts. This means you can have a different surrogate handle serialization depending on the streaming context. Second, the selector object also implements the logic to choose the right surrogate at runtime, thereby relieving a Formatter implementation from having to do so.
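
To make the "same type, different streaming contexts" case concrete, here is a rough sketch that registers two surrogates for one type and lets the formatter's Context decide between them. The names SurrogatePerContext and NamedSurrogate are invented for this sketch, and the name passed to each surrogate only serves to show on the console which registration was picked. With the formatter's context set to Clone, the expectation from the description above is that only the surrogate registered for Clone is consulted; I have not verified this against every framework version.

```pascal
program SurrogatePerContext;
{$APPTYPE CONSOLE}

uses
  System.IO,
  System.Runtime.Serialization,
  System.Runtime.Serialization.Formatters,
  System.Runtime.Serialization.Formatters.Binary;

type
  // Not marked [Serializable]; handled entirely by surrogates.
  NonSerializable = class
  private
    Field: Integer;
  end;

  // NamedSurrogate is a hypothetical class used only in this sketch.
  NamedSurrogate = class (System.Object, ISerializationSurrogate)
  private
    FName: string;
  public
    constructor Create(const AName: string);
    procedure GetObjectData(Obj: System.Object; Info: SerializationInfo;
      Context: StreamingContext);
    function SetObjectData(Obj: System.Object; Info: SerializationInfo;
      Context: StreamingContext; Selector: ISurrogateSelector): System.Object;
  end;

constructor NamedSurrogate.Create(const AName: string);
begin
  inherited Create;
  FName := AName;
end;

procedure NamedSurrogate.GetObjectData(Obj: System.Object;
  Info: SerializationInfo; Context: StreamingContext);
begin
  // Report which registration the selector handed to the formatter.
  Console.WriteLine('Serializing via ' + FName);
  Info.AddValue('Field', NonSerializable(Obj).Field);
end;

function NamedSurrogate.SetObjectData(Obj: System.Object;
  Info: SerializationInfo; Context: StreamingContext;
  Selector: ISurrogateSelector): System.Object;
begin
  Console.WriteLine('Deserializing via ' + FName);
  NonSerializable(Obj).Field := Info.GetInt32('Field');
  Result := Obj;
end;

var
  Formatter: IFormatter;
  Selector: SurrogateSelector;
  Stream: MemoryStream;
  NS1, NS2: NonSerializable;
begin
  NS1 := NonSerializable.Create;
  NS1.Field := 2;

  // Same type registered twice, each time for a different streaming context.
  Selector := SurrogateSelector.Create;
  Selector.AddSurrogate(TypeOf(NonSerializable),
    StreamingContext.Create(StreamingContextStates.File),
    NamedSurrogate.Create('file surrogate'));
  Selector.AddSurrogate(TypeOf(NonSerializable),
    StreamingContext.Create(StreamingContextStates.Clone),
    NamedSurrogate.Create('clone surrogate'));

  Formatter := BinaryFormatter.Create as IFormatter;
  Formatter.SurrogateSelector := Selector as ISurrogateSelector;
  // The formatter's context is what the selector matches against.
  Formatter.Context := StreamingContext.Create(StreamingContextStates.Clone);

  Stream := MemoryStream.Create;
  try
    Formatter.Serialize(Stream, NS1);
    Stream.Seek(0, SeekOrigin.Begin);
    NS2 := NonSerializable(Formatter.Deserialize(Stream));
  finally
    Stream.Close;
  end;

  Console.WriteLine(System.Int32(NS2.Field));
end.
```

Changing the formatter's Context to StreamingContextStates.File should route the same object through the other surrogate without touching the registration code.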
A beneficial side effect is that you can also implement your own surrogate selector and use that instead, to gain the ultimate flexibility in how surrogates are chosen at runtime (though it's unlikely you'll ever need to do this).

Another nice application of serialization surrogates is to augment a serialized object. That is, instead of completely replacing how a type is serialized, you can add additional information to the serialized stream. An example of such an augmenting serialization surrogate can be found in the AugmentSurrogate.dpr file accompanying this article. The code is rather complex and a full explanation is unfortunately beyond the scope of this article. For details see [2], which is where I got the example from originally.

Serialization Binders

When a type is serialized, the formatter also writes the type's identity to the stream. The identity consists of the full type name and the fully qualified name of the assembly the type resides in. During deserialization, the type identity is used to construct the correct type from the correct assembly. Sometimes, however, it's necessary to override the exact type that gets constructed during deserialization. One example is when you've moved the type implementation to another assembly. Another is when you've changed the type, say you've created a version 2 of the same type, and you want to be able to deserialize version 1 instances.

A serialization binder is an object that inherits from the abstract SerializationBinder base class and implements its only method, BindToType. During deserialization, the framework calls the BindToType method for each object found in the stream. The BindToType method examines the type of this object and returns the type of object that should be constructed instead.
If the binder doesn't want to change the type (binders usually change only a few specific types and ignore all others) it can simply return nil and the original type as found in the stream will be constructed.

Program 8 shows code that serializes a Version1 class and deserializes it as a Version2 class. This is of course accomplished using a binder object. Most of this code should be self-explanatory so I won't discuss it any further.

program Program8;
{$APPTYPE CONSOLE}

uses
  System.IO,
  System.Runtime.Serialization,
  System.Runtime.Serialization.Formatters,
  System.Runtime.Serialization.Formatters.Soap;

type
  [Serializable]
  Version1 = class
    Field: Integer;
  end;

  [Serializable]
  Version2 = class
    Field: Integer;
  end;

  UpgradeBinder = class (SerializationBinder)
  public
    constructor Create;
    function BindToType(AssemblyName, TypeName: string): System.Type; override;
  end;

constructor UpgradeBinder.Create;
begin
  inherited;
end;

function UpgradeBinder.BindToType(AssemblyName, TypeName: string): System.Type;
begin
  //only redirect Version1; every other type is left to the framework
  if TypeName = 'Program8.Version1' then
    Result := System.Type.GetType(
      System.String.Format('Program8.Version2, {0}', AssemblyName))
  else
    Result := nil;
end;

function CloneObject(const O: System.Object): System.Object;
var
  Formatter: IFormatter;
  Stream: System.IO.FileStream;
begin
  Stream := FileStream.Create('Program8.txt', FileMode.Create);
  try
    Formatter := SoapFormatter.Create;
    Formatter.Binder := UpgradeBinder.Create;
    Formatter.Serialize(Stream, O);
    Stream.Seek(0, SeekOrigin.Begin);
    Result := Formatter.Deserialize(Stream);
  finally
    //release the file even if (de)serialization fails
    Stream.Close;
  end;
end;

var
  V1: Version1;
  V2: Version2;
begin
  V1 := Version1.Create;
  V1.Field := 1;
  V2 := Version2(CloneObject(V1));
  Console.WriteLine(V2.GetType.FullName);
end.

There are a few things you should keep in mind about serialization binders. First, the serialization binder is called for each type found in the stream, even the ones you might not know about.
Usually you only want to change one or more specific types, so in the BindToType method you must always explicitly test the type for which BindToType was called. In Program 8 we do this by testing the TypeName against Program8.Version1. If the type under examination is not Program8.Version1, we simply return nil and let the framework continue.

Second, the type you return from BindToType must be serializable itself and must either have the same field layout as the original type or implement ISerializable. In our example this means that the Version2 class must have a single field named Field; it cannot have any other fields or omit this one. The types of the fields don't have to match exactly but should be compatible. During deserialization the framework automatically converts between the core primitive types. For other types it queries the original object for an IConvertible interface and uses that, if present, to perform the conversion. If neither of these succeeds an exception is raised. Usually you will want the new type to have other fields as well, and perhaps even ignore the original fields. In that case the new type should implement ISerializable and read the fields from the stream manually through SerializationInfo.

Third, the example only changes the type that gets deserialized, but the type is contained within the same assembly. As noted above, there are situations in which you want to change the assembly from which the deserialized type gets loaded. This too is relatively straightforward to accomplish. Instead of changing the type name, you change the assembly name and call Type.GetType.
For example:

function UpgradeBinder.BindToType(AssemblyName, TypeName: string): System.Type;
begin
  //load the same type but from a different assembly (version2assembly)
  AssemblyName := 'version2assembly, Version=2.0.0.0, Culture=neutral, PublicKeyToken=null';
  Result := System.Type.GetType(System.String.Format('{0}, {1}', TypeName, AssemblyName));
end;

Object References

In the FCL a number of classes exist for which it is guaranteed that only one instance will ever exist, so-called singleton classes. Examples of these are Module, Assembly and Runtime. It's likely that you will want to define such singleton classes yourself in your library or application as well. Such classes can't be serialized and deserialized the normal way because that would potentially result in multiple instances of the class (remember, during deserialization the constructor of the class is never called, so you can't use it to prevent multiple instantiations).

The serialization framework offers a solution for this through so-called object references. Object references are classes that implement the IObjectReference interface and that replace the original class in the serialization process. When the original class is serialized, it tells the framework to serialize the object reference instead. During deserialization the framework notices that an object reference was serialized (by the fact that the serialized object implements IObjectReference) and calls its GetRealObject method. This method in turn performs whatever magic is necessary to construct the original object or obtain a reference to it, and returns that reference to the framework.

Right, that sounds terribly complex, so let's look at the code in Program 9 to demonstrate how easy it really is.

program Program9;
{$APPTYPE CONSOLE}

uses
  System.IO,
  System.Runtime.Serialization,
  System.Runtime.Serialization.Formatters,
  System.Runtime.Serialization.Formatters.Soap,
  System.Runtime.Serialization.Formatters.Binary;

type
  [Serializable]
  Singleton = class (System.Object, ISerializable)
  private
    FCreation: DateTime;
    constructor Create;
  public
    procedure GetObjectData(Info: SerializationInfo; Context: StreamingContext);
    class function GetInstance: Singleton;
    property Creation: DateTime read FCreation;
  end;

  [Serializable]
  SingletonSerializer = class (System.Object, IObjectReference)
  public
    function GetRealObject(Context: StreamingContext): System.Object;
  end;

var
  SingletonInst: Singleton;

constructor Singleton.Create;
begin
  inherited Create;
  FCreation := DateTime.Now;
end;

class function Singleton.GetInstance: Singleton;
begin
  if SingletonInst = nil then
    SingletonInst := Singleton.Create;
  Result := SingletonInst;
end;

procedure Singleton.GetObjectData(Info: SerializationInfo; Context: StreamingContext);
begin
  //serialize a SingletonSerializer in place of the Singleton itself
  Info.SetType(TypeOf(SingletonSerializer));
end;

function SingletonSerializer.GetRealObject(Context: StreamingContext): System.Object;
begin
  Result := Singleton.GetInstance;
end;

function CloneObject(const O: System.Object): System.Object;
var
  Formatter: IFormatter;
  Stream: System.IO.FileStream;
begin
  Stream := FileStream.Create('Program9.txt', FileMode.Create);
  try
    Formatter := SoapFormatter.Create;
    Formatter.Serialize(Stream, O);
    Stream.Seek(0, SeekOrigin.Begin);
    Result := Formatter.Deserialize(Stream);
  finally
    //release the file even if (de)serialization fails
    Stream.Close;
  end;
end;

var
  O1, O2: Singleton;
begin
  O1 := Singleton.GetInstance;
  Console.WriteLine(O1.Creation.ToString);
  O2 := Singleton(CloneObject(O1));
  Console.WriteLine(O2.Creation.ToString);
  if System.Object.ReferenceEquals(O1, O2) then
    Console.WriteLine('O1 = O2');
end.

In Program 9 we have a Singleton class that contains only a single field that records the timestamp at which the object was created.
The Singleton class defines a private constructor to prevent client code from instantiating the Singleton class directly. Instead, clients must call the GetInstance class method to obtain a reference to the object, which is created during the very first call.

The Singleton class implements ISerializable and in its GetObjectData method it calls the SerializationInfo.SetType method, passing the type of SingletonSerializer as the sole parameter. This call basically tells the framework that instead of serializing the Singleton class, it should serialize a SingletonSerializer. If you run the code and look at the generated Program9.txt file, you'll see that this is exactly what happened. In fact, in the serialized stream there is no sign of the original Singleton class whatsoever.

During deserialization, the framework reads the stream, notices that the class that was serialized implements the IObjectReference interface and calls its GetRealObject method. This method uses the Singleton.GetInstance class method to obtain a reference to the Singleton object and returns it to the framework. The framework subsequently returns this same reference from the IFormatter.Deserialize method.

NOTE: If you've looked at the code in Program 9 closely you'll notice that even though the Singleton class implements ISerializable, it doesn't implement the special deserialization constructor you've learned about earlier. This is not necessary because the constructor is never called; instead, the object reference class (SingletonSerializer) is instantiated and it is responsible for creating the referenced object through whatever means.

So far it's simple enough. However, things get slightly counterintuitive when you have more complex requirements. For example, the object reference might need additional information in order to recreate the original object during deserialization.
Easy enough, you might think: the SingletonSerializer class itself is serialized, so I'll just add the required state fields to it as members. Wrong. Even if you could somehow communicate the required state to the SingletonSerializer (the framework constructs a SingletonSerializer internally; you only tell the framework what type should be serialized), your very first test will prove that the fields aren't serialized as you expected. In fact, during deserialization the framework will throw an exception!

The solution requires that the original object and the object reference work together. During serialization, the original object (Singleton) writes the required state to the stream in its GetObjectData method. During deserialization, the framework instantiates the object reference (SingletonSerializer), populates its fields with the values found in the stream (written there by the Singleton class) and only then proceeds to call the GetRealObject method. For this to work the object reference must declare exactly the same fields as the original object serialized (number of fields, names and types). Alternatively, the object reference (SingletonSerializer) can implement the ISerializable interface and manually read the values from the stream during deserialization in the special constructor. In this latter case, the object reference must provide the GetObjectData method (otherwise the code wouldn't compile) but this method is never actually called by the framework. You can find example code demonstrating all this in the accompanying files (ObjRef1.dpr and ObjRef2.dpr).

Personally I found all this a bit confusing and it took me a while to figure it out. However, once you know how it works it's quite doable.

NOTE: If you've learned a bit about the .NET remoting framework already you've probably encountered the MarshalByRefObject class. This class is used as the base class for all objects that must be marshaled by reference across application domain boundaries.
The framework uses object references in almost the same way as described above to implement this by-reference marshaling (serialization surrogates, as described before, are also involved). With a MarshalByRefObject the object reference doesn't actually return an instance of the original class; instead, a dynamically generated proxy class is created. To the client this proxy looks just like the original class, but its methods do nothing except forward to the real object (which may exist on a machine somewhere on the other side of the world).

BUG: After some experimentation I noticed that the IObjectReference.GetRealObject method is called twice with identical parameters. The odd thing is that if you put the object to serialize in an array first and serialize the array, the method is only called once. In other words, the following snippet does work as expected:

type
  TSingletonArray = array [0..0] of Singleton;
var
  O1, O2: TSingletonArray;
begin
  O1[0] := Singleton.GetInstance;
  O2 := TSingletonArray(CloneObject(O1));
end;

Conclusion

In this article we have examined almost every aspect of the .NET serialization framework. As you have seen, the .NET framework makes it extremely easy to serialize objects in the most common cases, while allowing you to take complete control over the process when required. You can make your classes serializable simply by attaching the Serializable attribute to the class declaration. If necessary you can either partially or completely take control over the process through the ISerializable interface and streaming contexts. You can choose the format in which objects are serialized and, through the use of streams, choose where the serialized objects are stored. If you need to alter the way objects over which you have no control are serialized, you can use object references, serialization surrogates and binders. In short, the serialization framework has every situation covered.
I'm usually hesitant to say it, but Microsoft did a good job on this one!

The one thing we didn't examine in this article is implementing your own formatter. Yes, that too is possible. Unfortunately, even though it's not that complex, implementing a formatter requires in-depth knowledge of the serialization framework and of classes such as FormatterServices and ObjectManager. If you want to know more about these things I recommend the book Microsoft .NET Remoting [2]. It contains a fairly good explanation, including an example implementation. Another demo formatter can be found on the GotDotNet website (www.gotdotnet.com; search for HtmlFormatter in the user samples section).