Monday, 13 December 2010

Python thread safe cache class

Every so often the need arises to create a thread safe cache solution. This is my stab at a simple yet fully functional implementation that maintains the essential dictionary semantics, is thread safe and has a fixed, configurable size, for example in a multithreaded http server like CherryPy.

Although many dictionary operation like getting an item are reported to be atomic and therefore thread safe, this is actually an implementation specific feature of the widely used CPython implementation. And even so, adding keys or iterating over the keys in the dictionary might not be thread safe at all. We must therefore use some sort of locking mechanism to ensure no two threads try to modify the cache at the same time. (For more information check this discussion.)

The Cache class shown here features a configurable size and if the number of entries is too big it removes the oldest entry. We do not have to maintain a explicit usage administration for that because we make use of the properties of the OrderedDict class which remembers the order in which keys are inserted and sports a popitem() method that will remove the first (or last) item inserted.

from collections import OrderedDict
from threading import Lock

class Cache:
    def __init__(self,size=100):
        if int(size)<1 :
            raise AttributeError('size < 1 or not a number')
        self.size = size
        self.dict = OrderedDict()
        self.lock = Lock()

    def __getitem__(self,key):
        with self.lock:
            return self.dict[key]

    def __setitem__(self,key,value):
        with self.lock:
            while len(self.dict) >= self.size:
                self.dict.popitem(last=False)
            self.dict[key]=value

    def __delitem__(self,key):
        with self.lock:
            del self.dict[key]

Due to the functionality of the OrderedDict class we use, the implementation is very concise. The __init__() method merely checks whether the size attribute makes any sense and creates an instance of an OrderedDict and a Lock.

The with statements used in the remaining methods wait for the acquisition of the lock and guarantee that the lock is released even if an exception is raised. The __getitem__() method merely tries to retrieve a value by trying the key on the ordered dictionary after acquiring a lock.

The __setitem__() method removes as many items within its while loop to reduce the size to below the preset amount and then adds the new value. The popitem() method of an OrderedDict removes the least recently added key/value pair if it's last argument is set to False.

The __delitem__() also merely passes on the control to the underlying dictionary. Together these methods allow for any instance of our Cache class to be used like any other dictionary as the example code below illustrates:

>>> from cache import Cache
>>> c=Cache(size=3)
>>> c['key1']="one"
0
>>> c['key2']="two"
1
>>> c['key3']="three"
2
>>> c['key4']="four"
3
>>> c['key4']
'four'
>>> c['key1']
Traceback (most recent call last):
  File "", line 1, in 
  File "cache.py", line 13, in __getitem__
    return self.dict[key]
KeyError: 'key1'

Of course this doesn't show off the thread safety but it does show that the semantics are pretty much like that of a regular dictionary. If needed this class even be extended with suitable iterators/view like keys() and items() but for most caches this probably isn't necessary.

No comments:

Post a Comment