Serialization mechanisms

A description of the mrpt::utils::CSerializable class and how to implement serializable classes.


1 The basics

Serializing consists of taking an existing object and converting it into a sequence of bytes, in any given format, such that the contents and state of the object can later be reconstructed, or deserialized [1]. There are many C++ libraries for serialization (e.g. boost), but the MRPT C++ library uses a simple, custom implementation with the following aims:

  • Simplicity: Only a few small core functions.
  • Versioning: If a class changes over time (something really common), a new version number is assigned to its serialization, but old stored data can still be imported.
  • C++ compiler independence: Use only standardized data lengths. For example, a variable of type “int” has different lengths depending on the machine, so an “int” variable must not be serialized without forcing it to a known length.

Currently, the only supported format for serialization is binary, i.e. there is no support for XML. The reason is that, for robotic applications, it is typically more important to keep data size (and transmission time) small when exchanging data with a running, real-time system. Note that special “stream” classes exist in MRPT, so the standard std::istream and std::ostream are left for textual input and output (mostly just for human inspection or debugging), while MRPT’s own stream classes are (almost) exclusively intended for binary serialization.

1.1. Classes

The actual binary frame for each serialized object is sketched below:


Note: In versions before MRPT 0.5.5 the end flag was not present and the first and third fields were 4 bytes wide (instead of just 1). However, data saved in the old format can still be loaded without problems.

When an object is serialized, its contents are written to a generic destination via a CStream class. The list of currently implemented streams can be seen in mrpt::utils::CStream.
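As a rough illustration of such a frame, the sketch below assumes five fields in this order: a one-byte class-name length, the class name, a one-byte version, the object data, and a one-byte end flag. The field order is only inferred from the note above; makeFrame, kEndFlag and the 0x88 marker value are hypothetical names and values chosen for the example, and the code uses only the C++ standard library rather than MRPT's stream classes.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical end-of-object marker; the real value is not given in this text.
const std::uint8_t kEndFlag = 0x88;

// Builds one frame: [name length][class name][version][object data][end flag].
std::vector<std::uint8_t> makeFrame(const std::string& className,
                                    std::uint8_t version,
                                    const std::vector<std::uint8_t>& objectData) {
    assert(className.size() <= 255);  // the length field is one byte wide
    std::vector<std::uint8_t> frame;
    frame.push_back(static_cast<std::uint8_t>(className.size()));
    frame.insert(frame.end(), className.begin(), className.end());
    frame.push_back(version);
    frame.insert(frame.end(), objectData.begin(), objectData.end());
    frame.push_back(kEndFlag);
    return frame;
}
```

With a four-character class name and two bytes of object data, the resulting frame is nine bytes long: three one-byte fields plus the name and the data.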

1.2. POD (Plain old datatypes) and special cases

Within the “object data” field mentioned above, each class has full control over what is stored. Typically, a class dumps here each of its member objects, which may belong to other classes, so the serialization format is recursive. However, some basic and common types that are known not to change over time are handled specially to avoid the extra cost of the headers and start-end flags. The serialization of the following types:

  • bool
  • uint8_t, int8_t
  • uint16_t, int16_t
  • uint32_t, int32_t
  • uint64_t, int64_t
  • float
  • double
  • long double (if supported by the compiler in use)

directly consists of a dump of the block of memory the variable occupies, using little endianness (even on big-endian architectures). For float and double types, the format assumes a low-level IEEE 754 machine representation (used by virtually all modern architectures).

Notice that int and short are not listed above. This is due to the architecture-dependent sizes of those types. Always use types with well-defined sizes when dealing with serialization.

The following basic types also have a special serialization format in MRPT:

  • const char *: Strings. The binary format consists of a uint32_t value with the length of the string (not counting the trailing ‘\0’), followed by the string characters, without the trailing ‘\0’.
  • std::string: Strings. Exactly as for the “const char*” case.
  • Vectors of elemental types: These vectors are serialized as a uint32_t value with the number of elements, followed by the serialization of each element (note: these formats are specialized versions, for storage efficiency, of the more generic STL serialization mechanism described below):
    • std::vector<float>
    • std::vector<double>
    • std::vector<int8_t>
    • std::vector<int16_t>
    • std::vector<int32_t>
    • std::vector<int64_t>
    • std::vector<uint8_t>
    • std::vector<uint16_t>
    • std::vector<uint32_t>
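The string and vector layouts listed above can be sketched with the standard library alone. The helper names (putU32LE, putString, putVectorU16) are hypothetical and not part of MRPT; the sketch only mirrors the described byte layout, assuming little-endian output as stated for the basic types.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

// Appends a 32-bit value least-significant byte first, whatever the host order.
void putU32LE(Bytes& out, std::uint32_t v) {
    for (int i = 0; i < 4; ++i)
        out.push_back(static_cast<std::uint8_t>((v >> (8 * i)) & 0xFFu));
}

// String: uint32_t length, then the characters, no trailing '\0'.
void putString(Bytes& out, const std::string& s) {
    assert(s.size() <= UINT32_MAX);  // the length must fit the 32-bit prefix
    putU32LE(out, static_cast<std::uint32_t>(s.size()));
    out.insert(out.end(), s.begin(), s.end());
}

// Vector of an elemental type: uint32_t element count, then each element.
void putVectorU16(Bytes& out, const std::vector<std::uint16_t>& v) {
    putU32LE(out, static_cast<std::uint32_t>(v.size()));
    for (std::uint16_t e : v) {
        out.push_back(static_cast<std::uint8_t>(e & 0xFFu));  // low byte first
        out.push_back(static_cast<std::uint8_t>(e >> 8));
    }
}
```

For instance, calling putString with "hi" appends the six bytes 02 00 00 00 68 69.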

1.3. Storing arrays of elemental types.

Say you want to save and load a plain C array with elemental data types (POD, see above). It’s important to pay attention to the endianness of those POD types. For example, writing the entire memory block of the array like in:

would result in a binary format not compatible across systems of different endianness. Instead of the code above, MRPT provides two methods that take care of reordering the bytes, if necessary:

For further information, refer to the documentation of mrpt::utils::CStream and its methods. Also, notice that if your vectors are in STL containers instead of plain C arrays, you can use the STL serialization mechanism described below, which will always be safer and clearer.
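To see why a raw memory dump is not portable, the following standard-library sketch writes a uint32_t array element by element in a fixed little-endian order and reads it back the same way. writeArrayLE and readArrayLE are hypothetical helpers written for this illustration; in MRPT code, the CStream methods mentioned above should be used instead.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

// Writes each element least-significant byte first, so the layout does not
// depend on the byte order of the machine running the code.
void writeArrayLE(Bytes& out, const std::uint32_t* data, std::size_t count) {
    for (std::size_t i = 0; i < count; ++i)
        for (int b = 0; b < 4; ++b)
            out.push_back(static_cast<std::uint8_t>((data[i] >> (8 * b)) & 0xFFu));
}

// Rebuilds `count` elements from little-endian bytes starting at in[offset].
void readArrayLE(const Bytes& in, std::size_t offset,
                 std::uint32_t* data, std::size_t count) {
    assert(offset + 4 * count <= in.size());  // enough bytes must be available
    for (std::size_t i = 0; i < count; ++i) {
        std::uint32_t v = 0;
        for (int b = 0; b < 4; ++b)
            v |= static_cast<std::uint32_t>(in[offset + 4 * i + b]) << (8 * b);
        data[i] = v;
    }
}
```

Here the value 0x11223344 is always stored as the bytes 44 33 22 11, whereas copying the array memory directly would store 11 22 33 44 on a big-endian host.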

1.4. Basic usage

The typical usage of serialization for storing an existing object into, for example, a file, is to use the << operator of the CStream class:

To restore a saved object, you can use two methods, depending on whether or not you are sure about the class of the object that will be read from the stream. If you know the class of the object to be read, you can simply use the >> operator towards an existing object, which will be passed by reference and its contents overwritten with those read from the stream. An example:

The other situation is when you don’t know the class of the object that will be read. In this case, a smart pointer to a generic utils::CSerializable object must be declared (initialized as NULL to indicate that it is empty), and after using the >> operator it will point to a newly created object holding the deserialized contents:

The next section explains the most important methods of utils::CSerializable and runtime class information. In the case of loading objects of unknown class, it is important to read about the MRPT registration mechanism and when you should call it manually. Note that these code examples do not catch potential exceptions (exception management in the MRPT is documented separately).

Apart from using the operators << and >> over a utils::CStream, there are two independent functions, utils::ObjectToString and utils::StringToObject, which serialize and deserialize, respectively, an object into a standard STL string (std::string). The difference between these functions and serialization over normal CStreams is that the binary data stream is encoded to avoid null characters (‘\0’), so that the resulting string can be passed as a char *.

Avoid using these functions unless strictly necessary, since they introduce an additional processing delay.

2 Run-time class identification

All serializable classes must inherit from the virtual class utils::CSerializable, which provides standard methods to manage any serializable object without knowing its real class. The most common operation is probably to check whether an object is of a given type, which can be performed by:

If the class to test is not in the current namespace (and there is no using namespace NAMESPACE; directive), you can alternatively use CLASS_ID_NAMESPACE, for example:

The method CSerializable::GetRuntimeClass() actually returns a pointer to a UTILS::TRuntimeClassId data struct, which contains other useful members:

  • The class name as a string:
  • Checking whether a class is a descendant of a given virtual class. An example:

Another useful method of any serializable object is CSerializable::duplicate, which makes a copy of the object. The internal data, pointers, etc… are truly duplicated, so the original object can be safely deleted.
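The kind of information such a runtime class descriptor carries can be sketched without MRPT. In the following minimal example, ClassId is a hypothetical stand-in (not the actual TRuntimeClassId definition) holding a class name and a link to its base class descriptor, and the toy class hierarchy is invented for the illustration.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical stand-in for a runtime class descriptor: a name plus a link to
// the descriptor of the base class (nullptr at the root of the hierarchy).
struct ClassId {
    const char* className;
    const ClassId* base;

    // True if this class is `other` or inherits from it, directly or not.
    bool derivedFrom(const ClassId* other) const {
        assert(other != nullptr);
        for (const ClassId* c = this; c != nullptr; c = c->base)
            if (c == other) return true;
        return false;
    }
};

// A toy hierarchy: Serializable <- Map <- GridMap, and Serializable <- Pose.
const ClassId kSerializable = {"Serializable", nullptr};
const ClassId kMap = {"Map", &kSerializable};
const ClassId kGridMap = {"GridMap", &kMap};
const ClassId kPose = {"Pose", &kSerializable};
```

Because one descriptor exists per class, both the type check and the ancestry test reduce to pointer comparisons along the chain of base descriptors.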

3 How to implement new serializable classes

This section describes the internals of CSerializable classes and how to develop new serializable classes.

3.1 General procedure

  • Define a default constructor for the class, i.e. one with no parameters. Alternatively, assign default values to all the parameters of at least one of its constructors.
  • Derive the class from utils::CSerializable, or any other class which is derived from it.
  • Add the macro DEFINE_SERIALIZABLE(class_name) to the class definition (inside the “class” scope), and DEFINE_SERIALIZABLE_PRE(class_name) before the class declaration.
  • Add the macro IMPLEMENTS_SERIALIZABLE(class_name,parent_class,namespace) to the class implementation file.
  • Implement the virtual methods UTILS::CSerializable::writeToStream() and UTILS::CSerializable::readFromStream() in your class. These methods are in charge of dumping/parsing the object to/from binary streams.
    • virtual void writeToStream(CStream &out, int *getVersion) const = 0;
      • out: The output binary stream where data must be dumped.
      • getVersion: If NULL, the object data must be dumped. Otherwise, only the version of the object dump must be returned in this pointer. This enables the versioning of object dumps and backward compatibility with previously stored data.
    • virtual void readFromStream(CStream &in, int version) = 0;
      • in: The input binary stream from which the object is read. Typically, a “switch” over versions implements the different reading procedures for all the streaming versions, with the aim of keeping binary compatibility with old data saved with different versions.
      • version: The version of the object stored in the stream: use this version number in your code to know how to read the incoming data.

The following example can be used as a template for creating new serializable classes:
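As a starting point, the following self-contained sketch shows the versioning pattern behind writeToStream() and readFromStream(). It does not depend on MRPT: ByteStream is a hypothetical stand-in for CStream, CMyGrid is an invented class, and the DEFINE_SERIALIZABLE / IMPLEMENTS_SERIALIZABLE macros and the CSerializable base class that a real MRPT class needs are omitted.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical in-memory byte stream standing in for a CStream.
struct ByteStream {
    std::vector<std::uint8_t> bytes;
    std::size_t pos = 0;

    void putU32(std::uint32_t v) {  // always little-endian
        for (int i = 0; i < 4; ++i)
            bytes.push_back(static_cast<std::uint8_t>((v >> (8 * i)) & 0xFFu));
    }
    std::uint32_t getU32() {
        if (pos + 4 > bytes.size()) throw std::runtime_error("unexpected end of stream");
        std::uint32_t v = 0;
        for (int i = 0; i < 4; ++i)
            v |= static_cast<std::uint32_t>(bytes[pos++]) << (8 * i);
        return v;
    }
};

// Hypothetical class: version 0 stored only `width`; version 1 added `height`.
struct CMyGrid {
    std::uint32_t width = 0;
    std::uint32_t height = 1;

    void writeToStream(ByteStream& out, int* getVersion) const {
        if (getVersion) {
            *getVersion = 1;  // only report the current version, write nothing
        } else {
            out.putU32(width);
            out.putU32(height);
        }
    }

    void readFromStream(ByteStream& in, int version) {
        switch (version) {
            case 0:
                width = in.getU32();
                height = 1;  // field absent in old data: fall back to a default
                break;
            case 1:
                width = in.getU32();
                height = in.getU32();
                break;
            default:
                throw std::runtime_error("unknown serialization version");
        }
    }
};
```

When a field is added later, the version reported by writeToStream() is increased and a new case is added to the switch, while the older cases are kept so that previously stored data remains readable.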

3.2 Special situations

If the serializable class is virtual, the macros DEFINE_VIRTUAL_SERIALIZABLE() and IMPLEMENTS_VIRTUAL_SERIALIZABLE() must be used instead (DEFINE_SERIALIZABLE_PRE is used without changes).


4 What is serialization used for in the MRPT?

  • Rawlogs: Data gathered by a robot (“datasets”) are saved in the rawlog format (*.rawlog), which is actually a sequence of action-observation pairs serialized to a file. A standalone application, the RawLogViewer, is available for managing, visualizing, and editing these files.
  • Maps: All the maps defined in the MRPT can be saved to files through serialization (see mrpt::slam::CMetricMap).
  • 3D scenes are also saved in a custom file format (*.3Dscene), which is just the serialization of a UTILS::COpenGLScene object. There is a standalone application for visualizing these files, the 3DSceneViewer.
  • Transmission of objects (maps, images, sensory data, etc…) through a TCP/IP socket.


5 The MRPT internal registry of serializable classes

To load an object of unknown class from a stream, its class must have been previously registered as a CSerializable implementation (see mrpt::utils::registerClass). Sometimes it is useful to get a list of all existing classes, for example, to build a list of classes that descend from a given virtual base class. For this purpose, use mrpt::utils::getAllRegisteredClasses.
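The idea behind such a registry can be sketched as a map from class names to factory functions. This is only an illustration of the concept with hypothetical names (Serializable, registerClassByName, createByName, allRegisteredNames); it does not show how mrpt::utils::registerClass or mrpt::utils::getAllRegisteredClasses are implemented or called.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical base class; names are illustrative only.
struct Serializable {
    virtual ~Serializable() = default;
    virtual std::string className() const = 0;
};

using Factory = std::function<std::unique_ptr<Serializable>()>;

// A function-local static avoids initialization-order problems between files.
std::map<std::string, Factory>& registry() {
    static std::map<std::string, Factory> r;
    return r;
}

void registerClassByName(const std::string& name, Factory make) {
    assert(make != nullptr);  // an empty factory could never build an object
    registry()[name] = std::move(make);
}

// Returns a new object of the named class, or nullptr if it was never registered.
std::unique_ptr<Serializable> createByName(const std::string& name) {
    auto it = registry().find(name);
    if (it == registry().end()) return nullptr;
    return it->second();
}

// Lists every registered class name (std::map keeps them sorted).
std::vector<std::string> allRegisteredNames() {
    std::vector<std::string> names;
    for (const auto& entry : registry()) names.push_back(entry.first);
    return names;
}

// Example class that could be registered.
struct CPoint : Serializable {
    std::string className() const override { return "CPoint"; }
};
```

A reader that finds the name "CPoint" in a stream would call createByName("CPoint") and then let the new object read its own data; an unregistered name yields nullptr, which is why registration must happen before loading.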


6 Serialization and STL containers

MRPT fully supports serializing arbitrarily complex data structures mixing STL containers, plain data types and MRPT classes. For example:

The code above will compile and work without the user having to write any extra code for the multimap<> type. In the case of STL containers, the binary format consists of:

  • The dump of a std::string with the STL container name (dumped using the serialization format explained above).
  • The dump of the strings representing each of the types kept by the container (the key and value for a map, the values for a list, etc…).
  • The number of elements in the container (for all containers but std::pair).
  • The recursive dump of each of the elements. The same format applies again if the elements are themselves STL containers. For normal MRPT classes, the format explained above is used here.
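The four steps above can be sketched for a std::map<std::string, uint32_t>. The helper functions and the literal type-name strings ("std::map", "std::string", "uint32_t") are assumptions made for this example; the names MRPT actually writes may differ.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

// Appends a 32-bit value in little-endian byte order.
void putU32LE(Bytes& out, std::uint32_t v) {
    for (int i = 0; i < 4; ++i)
        out.push_back(static_cast<std::uint8_t>((v >> (8 * i)) & 0xFFu));
}

// String: uint32_t length, then the characters, no trailing '\0'.
void putString(Bytes& out, const std::string& s) {
    assert(s.size() <= UINT32_MAX);
    putU32LE(out, static_cast<std::uint32_t>(s.size()));
    out.insert(out.end(), s.begin(), s.end());
}

// Dumps a map following the four steps listed above.
void putMap(Bytes& out, const std::map<std::string, std::uint32_t>& m) {
    putString(out, "std::map");     // 1. container name (assumed spelling)
    putString(out, "std::string");  // 2. key type name (assumed spelling)
    putString(out, "uint32_t");     //    value type name (assumed spelling)
    putU32LE(out, static_cast<std::uint32_t>(m.size()));  // 3. element count
    for (const auto& kv : m) {      // 4. each element in turn
        putString(out, kv.first);
        putU32LE(out, kv.second);
    }
}
```

The textual header makes it possible for a reader to check that the stored container type matches the one it expects before reading any element.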

The following real example illustrates this format:

And this is the generated output: