Careful with QString::fromRawData!

In one of my posts, Qt, SQLite and locale aware collation, I showed that raw strings in UTF-16 can be efficiently wrapped into a QString by using its fromRawData static method. According to the Qt documentation, "the caller must guarantee that this raw string will not be deleted or modified as long as the QString (or an unmodified copy of it) exists". The statement in parentheses is the tricky part. Let's consider the following example:

void callback( const wchar_t* data, int len )
{
    QString string = QString::fromRawData( reinterpret_cast<const QChar*>( data ), len );

    doSomething( string );
}

The data passed to the callback function is guaranteed to be valid only as long as the function is being executed. The QString object is created on the stack, so it is destroyed before the callback function exists. So far, the condition for using fromRawData is fulfilled. The string is also passed another function called doSomething.

Everything is fine if doSomething is a simple, stateless function with no side effects. An example of such function is QString::localeAwareCompare, which I used in the mentioned post. It simply compares two strings character by character to determine their lexical order.

But let's see what happens if doSomething was declared like this:

void doSomething( const QString& string )
{
    if ( string == lastString )
        return;

    // ...

    lastString = string;
}

Notice that this function has a state - the lastString variable, which can be global, static, or member of some object. The lifetime of this state variable exceeds the scope of the function, and also the callback function. Also, it happens to be an unmodified copy of the string which was created using QString::localeAwareCompare! The next time the callback is called, lastString may point to random garbage, because the previous data may be no longer valid.

This type of error is very difficult to notice, neither by code analysis nor by symptoms. It is not always evident that doSomething makes a copy of the string. In case of a setter function like setName it is quite obvious. But when we pass the string to the constructor of a QRegExp, we would expect that the string should only remain valid as long as the regular expression object exists. Wrong! The Qt regular expressions engine caches compiled state machines, so if we create another QRegExp and pass the same expression to it, it doesn't have to be parsed again. Of course this is more complex than the example shown above, but it surely involves creating an unmodified copy of the string that exceeds the lifetime of our function.

The term "unmodified copy" is not very precise. Actually it means a "shallow copy", which is usually created by Qt when assigning one string to another to avoid allocating memory and copying all data (which is called a "deep copy" and is safe, because the original data is no longer referenced).

But in some cases it's not even clear whether we're creating a deep copy or a shallow copy. Let's consider another example:

QString a = "foo";
QString b;
b.append( a );

We would expect that append always creates a deep copy of data, because it modifies the target string. But it turned out (and I learned this the hard way) that if some string is appended to a null string, only a shallow copy of it is created, as an optimization. If we use an empty string instead:

QString a = "foo";
QString b = "";
b.append( a );

then "b" becomes a deep copy of "a", as expected.

Such errors are difficult to track also because they rarely cause a crash. In case of a SQLite callback, the passed data will usually remain valid, if the database is not very big, and we will not notice any error. In other situations, the data may be overwritten and we will notice a strange behavior, but it may be difficult to debug.

Of course it doesn't mean that we shouldn't use fromRawData - it is very useful, especially in time intensive calculations, but it has to be used with extra care. In case of doubt, it's safer to use fromUtf16 instead.

Filed under: Blog