In this article we revisit our thread safe persistent dictionary and encounter some irritating performance issues, both SQLite related and caused by Python itself.
SQLite.OperationalError, database is locked
When testing the persistentdict
module with many simultaneous threads (more than twenty) I noticed a great number of errors: The many connections open to the same database caused SQLite to raise a lot of SQLite.OperationalError, database is locked
exceptions. Getting decent performance with SQLite is by no means easy and because SQLite only locks complete database files and not just tables or rows there is a big chance that threads accessing the same database have to wait to get their turn.
sqlite3.connect, the check_same_thread parameter
Python 3.2 comes with a sqlite3
module that implements (but scarcely documents) a check_same_thread
parameter that can be set to false to allow threads to use the same Connection
object simultaneously. This is nice since this means we no longer have to implement all sorts of code to provide each thread with its own connection.
But we still have to regulate access to this connection because otherwise a commit in one thread may invalidate a longer running execute in another thread, leaving us with errors like sqlite3.InterfaceError: Cursor needed to be reset because of commit/rollback and can no longer be fetched from.
Python thread switching is really slow
Switching between threads, especially on multi core machines, has never been Python's strongest feature and making our persistent dict thread safe with a lock might hurt performance a lot, depending on what is going on in the threads itself (if there is a lot of I/O going on the impact might not be that big).
A new implementation
The code below shows the new implementation, with a single connection that may be shared by multiple threads. It works but it is really slow: with 40 threads I get just 10 to 20 dictionary assignments (d[1]=2
) per second on my dual core Atom. That isn't the fastest machine around but those figures are ridiculously low. We will have to rethink our approach if we want to use SQLite in a multithreaded environment if we need any kind of performance!
- """
- persistentdict module $Revision: 98 $ $Date: 2011-07-23 14:01:04 +0200 (za, 23 jul 2011) $
- (c) 2011 Michel J. Anders
- This program is free software: you can redistribute it and/or modify
- it under the terms of the GNU General Public License as published by
- the Free Software Foundation, either version 3 of the License, or
- (at your option) any later version.
- This program is distributed in the hope that it will be useful,
- but WITHOUT ANY WARRANTY; without even the implied warranty of
- MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
- GNU General Public License for more details.
- You should have received a copy of the GNU General Public License
- along with this program. If not, see <http: www.gnu.org="" licenses="">.
- """
- from collections import UserDict
- import sqlite3 as sqlite
- from pickle import dumps,loads
- import threading
- class PersistentDict(UserDict):
- """
- PersistentDict a MutableMapping that provides a thread safe,
- SQLite backed persistent storage.
- db name of the SQLite database file, e.g.
- '/tmp/persists.db', default to 'persistentdict.db'
- table name of the table that holds the persistent data,
- usefull if more than one persistent dictionary is
- needed. defaults to 'dict'
- PersistentDict tries to mimic the behaviour of the built-in
- dict as closely as possible. This means that keys should be hashable.
- Usage example:
- >>> from persistentdict import PersistentDict
- >>> a=PersistentDict()
- >>> a['number four'] = 4
- ... shutdown and then restart applicaion ...
- >>> from persistentdict import PersistentDict
- >>> a=PersistentDict()
- >>> print(a['number four'])
- 4
- Tested with Python 3.2 but should work with 3.x and 2.7.x as well.
- run module directly to run test suite:
- > python PersistentDict.py
- """
- def __init__(self, dict=None, **kwargs):
- self.db = kwargs.pop('db','persistentdict.db')
- self.table = kwargs.pop('table','dict')
- #self.local = threading.local()
- self.conn = None
- self.lock = threading.Lock()
- with self.lock:
- with self.connect() as conn:
- conn.execute('create table if not exists %s (hash unique not null,key,value);'%self.table)
- if dict is not None:
- self.update(dict)
- if len(kwargs):
- self.update(kwargs)
- def connect(self):
- if self.conn is None:
- self.conn = sqlite.connect(self.db,check_same_thread=False)
- return self.conn
- def __len__(self):
- with self.lock:
- cursor = self.connect().cursor()
- cursor.execute('select count(*) from %s'%self.table)
- return cursor.fetchone()[0]
- def __getitem__(self, key):
- with self.lock:
- cursor = self.connect().cursor()
- h=hash(key)
- cursor.execute('select value from %s where hash = ?'%self.table,(h,))
- try:
- return loads(cursor.fetchone()[0])
- except TypeError:
- if hasattr(self.__class__, "__missing__"):
- return self.__class__.__missing__(self, key)
- raise KeyError(key)
- def __setitem__(self, key, item):
- h=hash(key)
- with self.lock:
- with self.connect() as conn:
- conn.execute('insert or replace into %s values(?,?,?)'%self.table,(h,dumps(key),dumps(item)))
- def __delitem__(self, key):
- h=hash(key)
- with self.lock:
- with self.connect() as conn:
- conn.execute('delete from %s where hash = ?'%self.table,(h,))
- def __iter__(self):
- with self.lock:
- cursor = self.connect().cursor()
- cursor.execute('select key from %s'%self.table)
- rows = list(cursor.fetchall())
- for row in rows:
- yield loads(row[0])
- def __contains__(self, key):
- h=hash(key)
- with self.lock:
- cursor = self.connect().cursor()
- cursor.execute('select value from %s where hash = ?'%self.table,(h,))
- return not ( None is cursor.fetchone())
- # not implemented def __repr__(self): return repr(self.data)
- def copy(self):
- c = self.__class__(db=self.db)
- for key,item in self.items():
- c[key]=item
- return c
- </http:>