Serialization in Qt - part 2

Submitted by mimec on 2012-07-09

In the previous post I wrote about support for different file formats in Qt and the pros and cons of using QDataStreama and a binary format. As I promised, today I will provide some more code. I will also start discussing various issues related to backward and forward compatibility of data files.

For now I will focus on simple cases like storing application settings or simple data like a list of bookmarks. I'm assuming that all serialized data have value-type semantics; i.e. they are stored and copied by value, not by pointer. A lot of classes in Qt are value types, including strings and all kinds of containers (as long as they store value types, not pointers). Also Qt makes it easy to create complex and efficient value types by using QSharedData and the copy on write mechanism, but that's an entirely different story.

In our example I will use the following simple Bookmark class:

class Bookmark
{
public:
    Bookmark();
    Bookmark( const Bookmark& other );
    ~Bookmark();

    const QString& name() const { return m_name; }
    // ... other getters/setters

    Bookmark& operator =( const Bookmark& other );

    friend QDataStream& operator <<( QDataStream& stream, const Bookmark& bookmark );
    friend QDataStream& operator >>( QDataStream& stream, Bookmark& bookmark );

private:
    QString m_name;
    QUrl m_url;
};

There is a default constructor, copy constructor and assignment operator; all that's needed for a value type. Thanks to this, we can store our objects in a container like QList. The two overloaded shift operators provide support for serialization. There's even no need to use Q_DECLARE_METATYPE, unless we need to put the bookmark in a QVariant or use it with asynchronous signal/slot connections.

The implementation of the shift operators is straightforward:

QDataStream& operator <<( QDataStream& stream, const Bookmark& bookmark )
{
    return stream << bookmark.m_name << bookmark.m_url;
}

QDataStream& operator >>( QDataStream& stream, Bookmark& bookmark )
{
    return stream >> bookmark.m_name >> bookmark.m_url;
}

This is serialization in its true sense: the bytes of the name string are directly followed by the bytes of the URL in the data stream. Without knowing the exact sequence of data, it's not possible to determine if the next byte is part of a string or an integer, and whether the string is part of the bookmark or some other structure. This means that extra care must be taken to read the data in exactly the same order as it was written.

While this is certainly efficient and makes the code extremely simple, there is a big problem when something needs to be changed or added. Let's suppose that a newer version of our application needs to support hierarchical bookmarks. We add a QList<Bookmark> m_children member to the Bookmark class and modify the implementation of the operators:

QDataStream& operator <<( QDataStream& stream, const Bookmark& bookmark )
{
    return stream << bookmark.m_name << bookmark.m_url << bookmark.m_children;
}

QDataStream& operator >>( QDataStream& stream, Bookmark& bookmark )
{
    return stream >> bookmark.m_name >> bookmark.m_url >> bookmark.m_children;
}

The QList automatically takes care of serializing all the items it contains by recursively calling the shift operator, so it might appear that nothing else needs to be done. But what happens if the user upgrades the application from the older version, and the new version tries to read the bookmarks file created by that previous version? It will expect the list of child bookmarks just after the URL, but actually it's some entirely different, random data. Attempting to interpret it as something it's not will give unexpected results and might even result in a crash.

The solution is to include a version tag in the stream, so that we can conditionally skip some fields when reading a file created by an older version of our application. For example:

QDataStream& operator <<( QDataStream& stream, const Bookmark& bookmark )
{
    return stream << (quint8)2 << bookmark.m_name << bookmark.m_url << bookmark.m_children;
}

QDataStream& operator >>( QDataStream& stream, Bookmark& bookmark )
{
    quint8 version;
    stream >> version >> bookmark.m_name >> bookmark.m_url;
    if ( version >= 2 )
        stream >> bookmark.m_children;
    return stream;
}

Note, however, that this would only work if the first version of the application also included the version tag in the stream! Otherwise we would attempt to read the first byte of the string as the version, again leading to unexpected results and potential crash. The lesson from this excercise is to think about this up front and always plan for the change.

Using a version tag is a good and universal solution, but it only provides backward compatibility: we can use it to correctly read files created by an older version. What happens if our application attempts to read a file created by a newer version of itself? We cannot predict what changes will be made in the future, so there's not much we can do to handle such situation. We can just close the stream and perhaps throw an exception to prevent the application from crashing.

Forward compatibility usually doesn't matter when it comes to simple configuration files. But what if the file is actually an important document that we need to send to someone else, who might have a slightly older version of the application? One solution would be to use a different format, like XML, but forward compatibility can also be achieved when using the QDataStream. I will write more about it in the next post.

Usually it doesn't make sense to include the version tag with each object, but just once at the beginning of the file. It's also a good idea to write a random "magic" value in the file header, to ensure that the file is really what we think it is. I use a class similar to the following one in my own applications to handle all this automatically:

class DataSerializer
{
public:
    DataSerializer( const QString& path ) :
        m_file( path )
    {
    }

    ~DataSerializer()
    {
    }

    bool openForReading()
    {
        if ( !m_file.open( QIODevice::ReadOnly ) )
            return false;

        m_stream.setDevice( &m_file );
        m_stream.setVersion( QDataStream::Qt_4_6 );

        qint32 header;
        m_stream >> header;

        if ( header != MagicHeader )
            return false;

        qint32 version;
        m_stream >> version;

        if ( version < MinimumVersion || version > CurrentVersion )
            return false;

        m_dataVersion = version;

        return true;
    }

    bool openForWriting()
    {
        if ( !m_file.open( QIODevice::WriteOnly | QIODevice::Truncate ) )
            return false;

        m_stream.setDevice( &m_file );
        m_stream.setVersion( QDataStream::Qt_4_6 );

        m_stream << (qint32)MagicHeader;
        m_stream << (qint32)CurrentVersion;

        m_dataVersion = CurrentVersion;

        return true;
    }

    QDataStream& stream() { return m_stream; }

    static int dataVersion() { return m_dataVersion; }

private:
    QFile m_file;
    QDataStream m_stream;

    static int m_dataVersion;

    static const int MagicHeader = 0xF517DA8D;

    static const int CurrentVersion = 1;
    static const int MinimumVersion = 1;
};

It's basically a wrapper over a file with an associated data stream. It ensures that both the magic header and the version are correct when opening the file for reading, and writes those values when opening it for writing. The CurrentVersion constant should be incremented every time something is added or changed in the serialization code of any class. The MinimumVersion constant allows us to skip support for some really old versions, especially when data format changed too much. The dataVersion static method makes it easy to check the actual version when reading data from the stream:

QDataStream& operator >>( QDataStream& stream, Bookmark& bookmark )
{
    stream >> bookmark.m_name >> bookmark.m_url;
    if ( DataSerializer::dataVersion() >= 2 )
        stream >> bookmark.m_children;
    return stream;
}

Note that the version is stored in a global variable, so this code is not thread safe or re-entrant, but so far it's been enough for me in all situations. More elaborate solutions may be created if necessary.

You can also notice that the DataSerializer explicitly sets the version of the data format to Qt_4_6. This is important, because Qt also has a similar versioning mechanism for it's own serialization format. This may cause problems when data is read and written using different versions of Qt libraries. Here we just enforce compatibility with the minimum supported version of Qt, in this case 4.6. Alternatively, we could store the Qt version in the header along with our internal version and verify both when opening the file.

In the next post I will write more about the forward compatibility problem and about serializing complex hierarchies of objects with cross references.