More SQLite multithreading woes

In this article we revisit our thread-safe persistent dictionary and encounter some irritating performance issues, some caused by SQLite and some by Python itself.

sqlite3.OperationalError: database is locked

When testing the persistentdict module with many simultaneous threads (more than twenty) I noticed a great number of errors: the many connections open to the same database caused SQLite to raise a lot of sqlite3.OperationalError: database is locked exceptions. Getting decent performance with SQLite is by no means easy, and because SQLite locks complete database files rather than individual tables or rows, there is a big chance that threads accessing the same database have to wait their turn.
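SQLite can at least be told to wait a while for a lock to clear instead of failing almost immediately, and the occasional remaining error can be retried. The snippet below is only an illustrative sketch, not part of the dictionary code; the filename, timeout and retry count are made up:

import sqlite3
import time

# wait up to 30 seconds for a lock to clear instead of the default 5
# before raising 'database is locked'
conn = sqlite3.connect('/tmp/shared.db', timeout=30.0)

def execute_with_retry(conn, sql, params=(), retries=3):
    # crude retry loop for the occasional 'database is locked' error
    for attempt in range(retries):
        try:
            return conn.execute(sql, params)
        except sqlite3.OperationalError:
            if attempt == retries - 1:
                raise
            time.sleep(0.1 * (attempt + 1))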

sqlite3.connect, the check_same_thread parameter

Python 3.2 comes with a sqlite3 module that implements (but scarcely documents) a check_same_thread parameter that can be set to False to allow threads to use the same Connection object simultaneously. This is nice because it means we no longer have to write all sorts of code to provide each thread with its own connection.
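In code that comes down to a single shared Connection object. A minimal sketch (the filename and the trivial query are just placeholders):

import sqlite3
import threading

# by default sqlite3 raises ProgrammingError when a Connection is used from
# a thread other than the one that created it; check_same_thread=False
# switches that check off so one Connection can be shared
conn = sqlite3.connect('shared.db', check_same_thread=False)

def worker():
    # every thread reuses the very same Connection object
    conn.execute('select 1').fetchone()

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()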

But we still have to regulate access to this connection, because otherwise a commit in one thread may invalidate a longer-running execute in another thread, leaving us with errors like sqlite3.InterfaceError: Cursor needed to be reset because of commit/rollback and can no longer be fetched from.
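The simplest remedy is a single threading.Lock that every thread must hold while it talks to the connection, which is the approach the implementation further down takes. A minimal sketch of the idea (the table and helper names are just for illustration):

import sqlite3
import threading

conn = sqlite3.connect('shared.db', check_same_thread=False)
conn.execute('create table if not exists dict (key unique, value)')
lock = threading.Lock()

def set_value(key, value):
    # executing and committing while holding the lock means no other thread
    # can commit halfway through and reset our cursor
    with lock:
        with conn:  # the connection as context manager commits on success
            conn.execute('insert or replace into dict values (?,?)', (key, value))

def get_value(key):
    with lock:
        row = conn.execute('select value from dict where key = ?', (key,)).fetchone()
    return None if row is None else row[0]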

Python thread switching is really slow

Switching between threads, especially on multi core machines, has never been Python's strongest feature, and making our persistent dict thread safe with a lock might hurt performance a lot, depending on what is going on in the threads themselves (if there is a lot of I/O going on the impact might not be that big).
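To get a feel for how costly this can be, a quick (and unscientific) experiment with purely CPU bound work already shows that splitting it over threads does not make it faster, because the GIL lets only one thread execute Python bytecode at a time:

import threading
import time

def count(n):
    # pure Python, CPU bound work
    while n > 0:
        n -= 1

N = 5000000

start = time.time()
count(N)
print('one thread :', time.time() - start)

start = time.time()
threads = [threading.Thread(target=count, args=(N // 2,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print('two threads:', time.time() - start)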

A new implementation

The code below shows the new implementation, with a single connection that may be shared by multiple threads. It works, but it is really slow: with 40 threads I get just 10 to 20 dictionary assignments (d[1]=2) per second on my dual core Atom. That isn't the fastest machine around, but those figures are ridiculously low. We will have to rethink our approach if we want to use SQLite in a multithreaded environment and still need any kind of performance!

  1. """ 
  2.  persistentdict module $Revision: 98 $ $Date: 2011-07-23 14:01:04 +0200 (za, 23 jul 2011) $ 
  3.  
  4.  (c) 2011 Michel J. Anders 
  5.  
  6.  This program is free software: you can redistribute it and/or modify 
  7.     it under the terms of the GNU General Public License as published by 
  8.     the Free Software Foundation, either version 3 of the License, or 
  9.     (at your option) any later version. 
  10.  
  11.     This program is distributed in the hope that it will be useful, 
  12.     but WITHOUT ANY WARRANTY; without even the implied warranty of 
  13.     MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the 
  14.     GNU General Public License for more details. 
  15.  
  16.     You should have received a copy of the GNU General Public License 
  17.     along with this program.  If not, see <http: www.gnu.org="" licenses="">. 
  18.  
  19. """  
  20.   
  21. from collections import UserDict  
  22. import sqlite3 as sqlite  
  23. from pickle import dumps,loads  
  24. import threading   
  25.   
  26. class PersistentDict(UserDict):  
  27.   
  28.  """ 
  29.  PersistentDict  a MutableMapping that provides a thread safe, 
  30.      SQLite backed persistent storage. 
  31.   
  32.  db  name of the SQLite database file, e.g. 
  33.    '/tmp/persists.db', default to 'persistentdict.db' 
  34.  table name of the table that holds the persistent data, 
  35.    usefull if more than one persistent dictionary is 
  36.    needed. defaults to 'dict' 
  37.   
  38.  PersistentDict tries to mimic the behaviour of the built-in 
  39.  dict as closely as possible. This means that keys should be hashable. 
  40.   
  41.  Usage example: 
  42.   
  43.  >>> from persistentdict import PersistentDict 
  44.  >>> a=PersistentDict() 
  45.  >>> a['number four'] = 4 
  46.   
  47.  ... shutdown and then restart applicaion ... 
  48.   
  49.  >>> from persistentdict import PersistentDict 
  50.  >>> a=PersistentDict() 
  51.  >>> print(a['number four']) 
  52.  4 
  53.   
  54.  Tested with Python 3.2 but should work with 3.x and 2.7.x as well. 
  55.   
  56.  run module directly to run test suite: 
  57.   
  58.  > python PersistentDict.py 
  59.   
  60.  """  
  61.    
  62.  def __init__(self, dict=None, **kwargs):  
  63.     
  64.   self.db    = kwargs.pop('db','persistentdict.db')  
  65.   self.table = kwargs.pop('table','dict')  
  66.   #self.local = threading.local()  
  67.   self.conn = None  
  68.   self.lock = threading.Lock()  
  69.     
  70.   with self.lock:  
  71.    with self.connect() as conn:  
  72.     conn.execute('create table if not exists %s (hash unique not null,key,value);'%self.table)  
  73.       
  74.   if dict is not None:  
  75.    self.update(dict)  
  76.   if len(kwargs):  
  77.    self.update(kwargs)  
  78.    
  79.  def connect(self):  
  80.   if self.conn is None:  
  81.    self.conn = sqlite.connect(self.db,check_same_thread=False)  
  82.   return self.conn  
  83.      
  84.  def __len__(self):  
  85.   with self.lock:  
  86.    cursor = self.connect().cursor()  
  87.    cursor.execute('select count(*) from %s'%self.table)  
  88.    return cursor.fetchone()[0]  
  89.    
  90.  def __getitem__(self, key):  
  91.   with self.lock:  
  92.    cursor = self.connect().cursor()  
  93.    h=hash(key)  
  94.    cursor.execute('select value from %s where hash = ?'%self.table,(h,))  
  95.    try:  
  96.     return loads(cursor.fetchone()[0])  
  97.    except TypeError:  
  98.     if hasattr(self.__class__, "__missing__"):  
  99.      return self.__class__.__missing__(self, key)  
  100.    raise KeyError(key)  
  101.      
  102.  def __setitem__(self, key, item):  
  103.   h=hash(key)  
  104.   with self.lock:  
  105.    with self.connect() as conn:  
  106.     conn.execute('insert or replace into %s values(?,?,?)'%self.table,(h,dumps(key),dumps(item)))  
  107.   
  108.  def __delitem__(self, key):  
  109.   h=hash(key)  
  110.   with self.lock:  
  111.    with self.connect() as conn:  
  112.     conn.execute('delete from %s where hash = ?'%self.table,(h,))  
  113.   
  114.  def __iter__(self):  
  115.   with self.lock:  
  116.    cursor = self.connect().cursor()  
  117.    cursor.execute('select key from %s'%self.table)  
  118.    rows = list(cursor.fetchall())  
  119.   for row in rows:  
  120.    yield loads(row[0])  
  121.   
  122.  def __contains__(self, key):  
  123.   h=hash(key)  
  124.   with self.lock:  
  125.    cursor = self.connect().cursor()  
  126.    cursor.execute('select value from %s where hash = ?'%self.table,(h,))  
  127.    return not ( None is cursor.fetchone())  
  128.   
  129.  # not implemented def __repr__(self): return repr(self.data)  
  130.    
  131.  def copy(self):  
  132.   c = self.__class__(db=self.db)  
  133.   for key,item in self.items():  
  134.    c[key]=item  
  135.   return c  
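For reference, a crude timing loop along the following lines is enough to reproduce this kind of measurement (a sketch, not the exact benchmark; the database filename and the number of assignments per thread are arbitrary):

import threading
import time
from persistentdict import PersistentDict

d = PersistentDict(db='/tmp/benchmark.db')

THREADS = 40
PER_THREAD = 10  # d[1]=2 style assignments per thread

def worker():
    for _ in range(PER_THREAD):
        d[1] = 2  # one insert-or-replace plus a commit, all under the lock

start = time.time()
threads = [threading.Thread(target=worker) for _ in range(THREADS)]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.time() - start

print('%.1f assignments per second' % (THREADS * PER_THREAD / elapsed))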