Sunday, 3 July 2011

A SQLite multiprocessing proxy

This is the first article in a series on improving the performance of Python web applications by leveraging the possibilities of the multiprocessing module. We'll focus on CherryPy and SQLite but the conclusions should be general enough for any Python based platform

Use case

Due to well known restrictions in the most common Python implementation, multithreading solutions will probably not help to solve performance issues (with the possible exception of serving slow network connections). The multiprocessing module offers an API similar to the threading module and might be an alternative when we want to divide the workload on a multicore machine.

The use case we're interested in is a CherryPy server that serves many requests, backed by a SQLite database. CherryPy is multithreaded by design and this approach is sensible as a web server may spend more time waiting for data to be transmitted over relatively slow network connections than actually doing work.

CherryPy however is also an excellent framework to host web applications and many web applications rely on some sort of database back-end. SQLite is a good choice for such a back-end as it comes bundled with Python (reducing the number of external dependencies), is easy to use and performs well enough. With some tricks it will even play nice in a multithreaded environment.

A disadvantage of using SQLite is that we do not have a separate database server: the SQLite engine is part of the same process that runs the Python interpreter. This means that it has the same handicap as any multithreaded application on CPython (the most common implementation of Python) and will not benefit from any extra cores or processors available on the server.

Now we could switch to MySQL or any other stand-alone database back-end but this would add quite an amount to the maintenance burden of our web application. Wouldn't it be nice if we could devise a way to use SQLite together with the multiprocessing module to have the best of both worlds: the ease of use of SQLite and the performance benefits of a stand-alone database server?

In this series of articles I will explore the possibilities and hopefully will come up with a solution that will provide:

  • a dbapi proxy (we'll use sqlite3 module but it should be general enough for any dbapi compliant database)
  • that will use the multiprocessing module to increase performance and
  • can be used from a multithreaded environment.
It would be nice if the API closely resembles the dbapi (but that is not an absolute requirement).

In the next article in this series I will explore the options to make threads and processes play nice, focusing on inter process communication.

1 comment:

  1. The global interpreter lock does not necessarily apply to C++ code, so it is possible* for SQLite to perform operations while another Python thread is running.

    All the GIL means is that at most one thread can be touching Python objects at a time.

    * I'm not sure it /does/

    ReplyDelete