SQLite

Locking an SQLite database

Submitted by mimec on 2011-10-07

Today I will return to the topic of SQLite once again. This time I will discuss why and how to lock an SQLite database for exclusive access to make sure that it's not modified by other processes.

Of course one of the goals of SQLite, like most database engines, is concurrency - i.e. the ability for multiple processes to read and write data to the same database. As we can read in the documentation, "in order to maximize concurrency, SQLite works to minimize the amount of time that EXCLUSIVE locks are held".

That makes perfect sense if we use SQLite with a web application, which should handle as many concurrent requests as possible. But there are some scenarios when we might want to keep the database locked for exclusive access. For example, an SQLite database is often used as a cache for various data. If we run two instances of a program and both use the same cache, it may sometimes lead to strange results. Sometimes it's just easier to prevent such situation than to try to handle it. As you can guess, that's the situation I came across in the WebIssues client.

One way to achieve this is to simply use one large transaction throughout the lifetime of the connection. The documentation advises such solution when the database is used as the application file format. However, sometimes we want all modifications to be committed as soon as possible, for example to prevent losing data on a crash, while still keeping an exclusive lock as long as the connection is open.

Fortunately, SQLite has a useful pragma statement called locking_mode. If we set it to EXCLUSIVE, the obtained locks are never released until the connection is closed. However, the name of this setting is slightly misleading, because it doesn't really obtain an exclusive lock, it just ensures that the lock is never released. So perhaps "persistent" locking mode would be a better name.

In order to lock the database, we need to perform the following sequence of commands:

PRAGMA locking_mode = EXCLUSIVE
BEGIN EXCLUSIVE
COMMIT

The BEGIN EXCLUSIVE command actually locks the database. It begins an empty transaction, which doesn't change anything, but ensures that the exclusive lock is acquired on the database. If the database is already locked by another process, this command will fail with the SQLITE_BUSY error.

Note that the standard Qt driver for SQLite sets the busy timeout to 5 seconds by default, so the application will freeze for a while before reporting an error. We can change it using the QSQLITE_BUSY_TIMEOUT connection option, or when using a custom driver, by simply changing the parameter of sqlite3_busy_timeout to a smaller value.

Careful with QString::fromRawData!

Submitted by mimec on 2011-07-12

In one of my posts, Qt, SQLite and locale aware collation, I showed that raw strings in UTF-16 can be efficiently wrapped into a QString by using its fromRawData static method. According to the Qt documentation, "the caller must guarantee that this raw string will not be deleted or modified as long as the QString (or an unmodified copy of it) exists". The statement in parentheses is the tricky part. Let's consider the following example:

void callback( const wchar_t* data, int len )
{
    QString string = QString::fromRawData( reinterpret_cast<const QChar*>( data ), len );

    doSomething( string );
}

The data passed to the callback function is guaranteed to be valid only as long as the function is being executed. The QString object is created on the stack, so it is destroyed before the callback function exists. So far, the condition for using fromRawData is fulfilled. The string is also passed another function called doSomething.

Everything is fine if doSomething is a simple, stateless function with no side effects. An example of such function is QString::localeAwareCompare, which I used in the mentioned post. It simply compares two strings character by character to determine their lexical order.

But let's see what happens if doSomething was declared like this:

void doSomething( const QString& string )
{
    if ( string == lastString )
        return;

    // ...

    lastString = string;
}

Notice that this function has a state - the lastString variable, which can be global, static, or member of some object. The lifetime of this state variable exceeds the scope of the function, and also the callback function. Also, it happens to be an unmodified copy of the string which was created using QString::localeAwareCompare! The next time the callback is called, lastString may point to random garbage, because the previous data may be no longer valid.

This type of error is very difficult to notice, neither by code analysis nor by symptoms. It is not always evident that doSomething makes a copy of the string. In case of a setter function like setName it is quite obvious. But when we pass the string to the constructor of a QRegExp, we would expect that the string should only remain valid as long as the regular expression object exists. Wrong! The Qt regular expressions engine caches compiled state machines, so if we create another QRegExp and pass the same expression to it, it doesn't have to be parsed again. Of course this is more complex than the example shown above, but it surely involves creating an unmodified copy of the string that exceeds the lifetime of our function.

The term "unmodified copy" is not very precise. Actually it means a "shallow copy", which is usually created by Qt when assigning one string to another to avoid allocating memory and copying all data (which is called a "deep copy" and is safe, because the original data is no longer referenced).

But in some cases it's not even clear whether we're creating a deep copy or a shallow copy. Let's consider another example:

QString a = "foo";
QString b;
b.append( a );

We would expect that append always creates a deep copy of data, because it modifies the target string. But it turned out (and I learned this the hard way) that if some string is appended to a null string, only a shallow copy of it is created, as an optimization. If we use an empty string instead:

QString a = "foo";
QString b = "";
b.append( a );

then "b" becomes a deep copy of "a", as expected.

Such errors are difficult to track also because they rarely cause a crash. In case of a SQLite callback, the passed data will usually remain valid, if the database is not very big, and we will not notice any error. In other situations, the data may be overwritten and we will notice a strange behavior, but it may be difficult to debug.

Of course it doesn't mean that we shouldn't use fromRawData - it is very useful, especially in time intensive calculations, but it has to be used with extra care. In case of doubt, it's safer to use fromUtf16 instead.

QSqlQueryModel and QTreeView - part 3

Submitted by mimec on 2011-07-06

In the previous post I started describing the SqlTreeModel which combines multiple QSqlQueryModels so that they can be used in a QTreeView.

We already know how to map items from multiple SQL models in order to create a hierarchy of items, how to map rows and columns from the SqlTreeModel into the SQL models, and how to implement the five pure virtual methods of QAbstractItemModel.

Our tree model already works, but items cannot be sorted by clicking on the column header. There are several ways to do that. One could use the QSqlTableModel which has built-in sorting support, but I find it easier to use the more generic QSqlQueryModel as it is more flexible and allows writing SQL queries by hand. Although it doesn't support sorting directly, the query can be modified so that it includes different ORDER BY clauses to achieve the same effect.

QAbstractItemModel has a virtual method called sort, which is called by the view when the user clicks on a column header. The default implementation does nothing, so we should provide a custom implementation in our derived class, SqlTreeModel. We should determine the order clauses based on the passed sort column index and order, pass new queries to QAbstractItemModels and rebuild the tree.

In fact, since QSqlTableModel is generic and can be reused by multiple different models with different queries, the sort only remembers the sort column and order and calls another virtual method, updateQueries, which is implemented by actual models. Individual models can extend this mechanism, and call this method, for example, when some filtering criteria are changed.

Some columns may be irrelevant for specific levels of items. For example, in WebIssues, the projects tree has two columns, Name and Type, but Type is only relevant for folders, and not for projects or alerts. When sorting by Type, the queries for projects and alerts are not modified, so the previous sorting order is retained at these levels, which produces logical and predictable results.

The last problem is that when the tree is rebuilt (for example, after changing the sort order), the view should be updated, but we don't wan't the current selection or expanded state of nodes to be lost, so we shouldn't simply call reset (which is what the standard SQL models do). Instead we should use the layoutAboutToBeChanged and layoutChanged signals.

The first signal indicates that the view should prepare for changes in the model. The second signal indicates that the changes are completed and the view should update itself. We also have to retrieve persistent indexes just after signalling layoutAboutToBeChanged and update these indexes to reflect changes in the model just before signalling layoutChanged.

How to map old indexes to new indexes? While updating the model, some rows may be added, some may be deleted, and some may be reordered. However, the identifiers (i.e. primary keys) of existing items will not change. So we should first map old indexes to levels and identifiers of corresponding items. After updating the tree, we should map levels and identifiers back to new indexes. To do that, we need a function which will find the index of the item with specified identifier at the specified level, but with the internal structures used to index the tree, it's an easy task. Such function is also useful, for example, if we wan't to select an item with given identifier.

The complete SqlTreeModel class is part of version 1.0 of the WebIssues client and it's available under the terms of the GNU General Public License. You can also use it, along with this series of posts, as a starting point for creating a custom model that goes beyond the standard functionality provided by the Qt framework.

QSqlQueryModel and QTreeView - part 2

Submitted by mimec on 2011-06-27

In the previous post I described how to combine multiple QSqlQueryModels in order to create a hierarchy of items which can be displayed in a QTreeView.

Our class (let's call it SqlTreeModel) will inherit QAbstractItemModel and therefore must implement at least the five pure virtual methods: columnCount, rowCount, index, parent and data. The implementation of rowCount, index and parent is based on the internal data structures which I described in the previous article. Those methods are used to enumerate items and navigate back and forth within the tree. They are not very complex and I will not describe them here in detail; you can peek into the code of the WebIssues client if you are interested.

The data method is also pretty straightforward; it determines the level of the item and delegates the implementation to the model associated with the level. It also maps the row and column from the tree model to the underlying SQL query model. Mapping of rows is based on the internal data structures, which I described previously.

How columns are mapped depends on what is needed in a particular situation. In the simplest case, they could be simply mapped one-to-one to the underlying SQL models; the first column of the tree would correspond to the first column in every SQL query model (after skipping identifiers used to build relations between parent and child items), and so on. So the number of columns in the tree view (as returned by columnCount) would be determined by the largest number of columns in all dependent models.

That's how SqlTreeModel works by default, but it also allows customizing this default column mapping. Let's suppose that we want to display a different icon depending on the state of the item. The state can be retrieved from the database as a separate column in the SQL query model. However we don't want this column to be displayed directly in the view, but instead its values should be used to modify data (in this case, the icon) of another column.

This can be done by inheriting the SqlTreeModel and reimplementing the data virtual method. When retrieving the decoration of the specific column at the specific level, it would ask for the value of the appropriate column in the SQL query model and return the appropriate icon. In other cases, it would call the default implementation. We also need to customize column mapping, so that this extra column, containing the state of the item, is not displayed. The column mapping can be simply a list of indexes in the SQL query model that correspond to subsequent columns in the tree model.

The value of a specific column can also be calculated using some algorithm based on several source columns or an external data source. In that case we can map it to a placeholder (represented by a negative value) instead of a specific column in the underlying model.

Yet another problem related to handling columns is retrieving column headers. We can delegate the headerData virtual method to one of the child SQL query models, but with different number of columns and various mappings, this can quickly become difficult to manage. I took a simpler approach - I simply implemented setHeaderData and headerData so that they store all passed values, just like the QSqlQueryModel does. So after adding all child SQL models and setting up column mappings, we can simply set column headers by calling setHeaderData for each column.

It's getting complicated, isn't it? In the next post I will describe how to handle sorting and how to update the model without losing the state of expanded and selected nodes in the tree view associated with our model.

QSqlQueryModel and QTreeView - part 1

Submitted by mimec on 2011-06-19

Qt provides an easy way of populating a table view using an SQL query by using the QSqlQueryModel. It can also be used with a QTreeView, but only for displaying flat lists. If we want to display a hierarchy of items based on a set of SQL queries, we have to subclass the QAbstractItemModel on our own, which is not an easy task.

There are many ways in which hierarchies of items can be created and it's probably difficult to abstract it and create an universal solution. The first category is a "homogeneous" tree, in which all items represent the same class of entities, and which could be populated using a single query with Id and ParentId columns. Items whose ParentId is NULL represent top-level items, and items with given ParentId are children of the corresponding parent item. An example would be a tree of categories, which can have sub-categories, sub-sub-categories, etc.

Another category is a "heterogeneous" tree, where each level of the hierarchy corresponds to a different class of entities (i.e. a different table). That's the case which I faced in WebIssues. One of the trees contains projects, which can have child folders, and folders in turn can have child alerts. Another tree contains issue types and their corresponding attributes. There can also be various combinations of these two categories, and more complex scenarios, but they usually can be derived in one way or another from the "heterogeneous" solution, so I will focus only on that case.

To create a tree model based on multiple related SQL queries, we need the following components:

  • A subclass of QAbstractItemModel providing an implementation of all its abstract methods
  • A collection of QSqlQueryModels, each of which contains items at a different level of hierarchy
  • Some internal data structures used for tracking dependencies between items

For now I will skip the implementation of the model's abstract methods, as this is a topic for a separate post. Let's assume that we want to represent the following simple tree of items:

+ Project 1
| + Folder 1
| | + Alert 1
| | + Alert 2
| + Folder 2
+ Project 2
  + Folder 3

The first level of the hierarchy will be populated using an SQL query returning identifiers and names of projects, for example:

+-----------+-----------+
| ProjectId | Name      |
+-----------+-----------+
|         1 | Project 1 |
+-----------+-----------+
|         2 | Project 2 |
+-----------+-----------+

The second level of hierarchy will be based on the following query results:

+----------+-----------+----------+
| FolderId | ProjectId | Name     |
+----------+-----------+----------+
|        1 |         1 | Folder 1 |
+----------+-----------+----------+
|        2 |         1 | Folder 2 |
+----------+-----------+----------+
|        3 |         2 | Folder 3 |
+----------+-----------+----------+

As you can see, the first two folders belong to the first project, and the third folder belongs to the second project, just like on the first diagram. A similar query, this time with AlertId, FolderId and Name columns, can be used to populate alerts, i.e. the third level of hierarchy, and this could be continued to create even more levels.

We assumed that the first column is always the primary key, and the second column is the foreign key related to the parent table (except for the first, root level). So these columns constitute the structure of the tree. Subsequent columns contain data which is displayed in the tree view (there can be more columns than just the name, but that's also a topic for another post).

Now let's get to the internal structures which are needed to keep track of items and relations between them. As you know, each item, or rather cell, of the Qt data model, is represented by an index. The index consists of a row index, column index and an optional parent index. In case of a flat list or table, there are no parent/child relations between items, so the row and column indexes are sufficient to describe a cell. In a tree model, the index must also contain a pointer to some internal data structure, which is necessary to determine the parent index and to keep track of the child items.

At the minimum, the internal data structure (let's call it a "node" for simplicity) should store the level of hierarchy, at which it is located, and the index of the parent item in the parent node. Note that the node does not correspond to a single item in the model, but rather to a group of items with the same parent.

For example, in the model shown on the first diagram, there would be four nodes. The first node, or the root node, represents all top level items, i.e. all projects. It has a level of 0 and the parent index is irrelevant since it has no parent. The second node represents two folders which belong to Project 1, with a level of 1 and the parent index of 0 (i.e. the index of Project 1 in the first SQL model). The third node represents the single folder belonging to Project 2 (with a level of 1 and parent index of 1). The fourth node represents the alerts belonging to Folder 1 (with a level of 2 and parent index of 0).

Obviously the model needs to keep track of the node structures, so at each level of the tree there should be some kind of a container for its nodes. Also, to make navigation around the tree easier and more efficient, we need some additional data structures serving as sort of indexes. We need to be able to find both parent and child indexes of a given index, and to map indexes of our tree model to indexes of the associated QSqlQueryModels and vice versa, and it's not as easy as it seems.

In the next post I will describe some of the nuances of implementing the model itself.