Start Small: 2010

Saturday 25 December 2010

Daemonizing CherryPy

It is documented but without a proper example I found it very hard to find out how to daemonize a CherryPy server. After some trial and error it proved to be not so hard at all.

The CherryPy documentation is a little haphazard at times and although the Daemonizer plugin is documented I found it a bit difficult to understand without any examples. There is a bit more available now in the new documentation but that is quite hidden so it never hurts to show an example:

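import os
import cherrypy

# Root and current_dir are defined in the part of the program not shown here
cherrypy.process.plugins.Daemonizer(cherrypy.engine).subscribe()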
...
...
cherrypy.quickstart(Root(),config={
  '/':
  { 'log.access_file' : os.path.join(current_dir,"access.log"),
  'log.screen': False,
  'tools.sessions.on': True
  }})

Note that this only works on UNIX-like systems (it certainly won't work on Windows XP). Also note the configuration parameters: make sure you log your accesses explicitly, otherwise you will have a hard time finding where the logging of your daemonized process went (hint: probably nowhere...).

Saturday 18 December 2010

Calculating sunrise and sunset in Python

I expected to find dozens of readily available implementations of sunrise and sunset calculations in Python on the web but this turned out to be a disappointment. Therefore I resolved to write my own straightforward implementation.

The sunrise equation is pretty easy to find on Wikipedia but actual implementations are not, certainly not in Python 3.x. (If you are willing to stick to Python 2.x there is of course the excellent PyEphem package, see the end of this article.) Fortunately NOAA provides annotated equations in the form of an OpenOffice spreadsheet. This allows for simple re-engineering in Python and gives us a way to verify the results of a new implementation against those in the spreadsheet.

If you save the code below as sunrise.py then calculating the time of sunrise today would be straightforward as shown in this example:

 import datetime
 from sunrise import sun
 s = sun(lat=49,long=3)
 print('sunrise at ',s.sunrise(when=datetime.datetime.now()))

The sun class also provides a sunset() method and solarnoon() method. All three methods take a when parameter that should be a datetime.datetime object. If this object contains timezone information or daylight saving time information, this information is used when calculating the times of sunrise, sunset and the solar noon.

Note that if no when parameter is given, a default datetime is used that is initialized with a LocalTimezone object from the timezone module. I have not provided that module here, but you can implement one simply enough by copying the example in Python's documentation, or you can comment out the import statement below and always supply a when parameter.
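For completeness, here is a minimal sketch of what such a timezone module could contain. It is closely modeled on the LocalTimezone example from the Python datetime documentation and relies on the platform's own idea of local time via the time module; treat it as one possible implementation, not the module I actually used.

```python
import time as _time
from datetime import datetime, timedelta, tzinfo

ZERO = timedelta(0)
# offsets of the local standard and daylight saving time, as reported by the OS
STDOFFSET = timedelta(seconds=-_time.timezone)
DSTOFFSET = timedelta(seconds=-_time.altzone) if _time.daylight else STDOFFSET
DSTDIFF = DSTOFFSET - STDOFFSET

class LocalTimezone(tzinfo):
    """tzinfo that follows the local time zone settings of the machine."""

    def utcoffset(self, dt):
        return DSTOFFSET if self._isdst(dt) else STDOFFSET

    def dst(self, dt):
        return DSTDIFF if self._isdst(dt) else ZERO

    def tzname(self, dt):
        return _time.tzname[self._isdst(dt)]

    def _isdst(self, dt):
        # ask the C library whether DST is in effect at this wall clock time
        tt = (dt.year, dt.month, dt.day,
              dt.hour, dt.minute, dt.second,
              dt.weekday(), 0, 0)
        stamp = _time.mktime(tt)
        return _time.localtime(stamp).tm_isdst > 0

if __name__ == "__main__":
    print(datetime.now(tz=LocalTimezone()))
```

Saved as timezone.py next to sunrise.py, this should be enough to satisfy the import in the code below.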

from math import cos,sin,acos,asin,tan
from math import degrees as deg, radians as rad
from datetime import date,datetime,time

# this module is not provided here. See text.
from timezone import LocalTimezone

class sun:
 """ 
 Calculate sunrise and sunset based on equations from NOAA
 http://www.srrb.noaa.gov/highlights/sunrise/calcdetails.html

 typical use, calculating the sunrise at the present day:
 
 import datetime
 from sunrise import sun
 s = sun(lat=49,long=3)
 print('sunrise at ',s.sunrise(when=datetime.datetime.now()))
 """
 def __init__(self,lat=52.37,long=4.90): # default Amsterdam
  self.lat=lat
  self.long=long
  
 def sunrise(self,when=None):
  """
  return the time of sunrise as a datetime.time object
  when is a datetime.datetime object. If none is given
  a local time zone is assumed (including daylight saving
  if present)
  """
  if when is None : when = datetime.now(tz=LocalTimezone())
  self.__preptime(when)
  self.__calc()
  return sun.__timefromdecimalday(self.sunrise_t)
  
 def sunset(self,when=None):
  if when is None : when = datetime.now(tz=LocalTimezone())
  self.__preptime(when)
  self.__calc()
  return sun.__timefromdecimalday(self.sunset_t)
  
 def solarnoon(self,when=None):
  if when is None : when = datetime.now(tz=LocalTimezone())
  self.__preptime(when)
  self.__calc()
  return sun.__timefromdecimalday(self.solarnoon_t)
  
 @staticmethod
 def __timefromdecimalday(day):
  """
  returns a datetime.time object.
  
  day is a decimal day between 0.0 and 1.0, e.g. noon = 0.5
  """
  hours  = 24.0*day
  h      = int(hours)
  minutes= (hours-h)*60
  m      = int(minutes)
  seconds= (minutes-m)*60
  s      = int(seconds)
  return time(hour=h,minute=m,second=s)

 def __preptime(self,when):
  """
  Extract information in a suitable format from when, 
  a datetime.datetime object.
  """
  # datetime days are numbered in the Gregorian calendar
  # while the calculations from NOAA are distributed as
  # OpenOffice spreadsheets with days numbered from
  # 1/1/1900. The difference is computed from the two
  # day numbers for 18/12/2010
  self.day = when.toordinal()-(734124-40529)
  t=when.time()
  self.time= (t.hour + t.minute/60.0 + t.second/3600.0)/24.0
  
  self.timezone=0
  offset=when.utcoffset()
  if offset is not None:
   # total_seconds() also handles negative (west of Greenwich) offsets
   self.timezone=offset.total_seconds()/3600.0
  
 def __calc(self):
  """
  Perform the actual calculations for sunrise, sunset and
  a number of related quantities.
  
  The results are stored in the instance variables
  sunrise_t, sunset_t and solarnoon_t
  """
  timezone = self.timezone # in hours, east is positive
  longitude= self.long     # in decimal degrees, east is positive
  latitude = self.lat      # in decimal degrees, north is positive

  time  = self.time # fraction of the day past midnight, i.e. noon  is 0.5
  day      = self.day     # daynumber 1=1/1/1900
 
  Jday     =day+2415018.5+time-timezone/24 # Julian day
  Jcent    =(Jday-2451545)/36525    # Julian century

  Manom    = 357.52911+Jcent*(35999.05029-0.0001537*Jcent)
  Mlong    = 280.46646+Jcent*(36000.76983+Jcent*0.0003032)%360
  Eccent   = 0.016708634-Jcent*(0.000042037+0.0000001267*Jcent)
  Mobliq   = 23+(26+((21.448-Jcent*(46.815+Jcent*(0.00059-Jcent*0.001813))))/60)/60
  obliq    = Mobliq+0.00256*cos(rad(125.04-1934.136*Jcent))
  vary     = tan(rad(obliq/2))*tan(rad(obliq/2))
  Seqcent  = sin(rad(Manom))*(1.914602-Jcent*(0.004817+0.000014*Jcent))+sin(rad(2*Manom))*(0.019993-0.000101*Jcent)+sin(rad(3*Manom))*0.000289
  Struelong= Mlong+Seqcent
  Sapplong = Struelong-0.00569-0.00478*sin(rad(125.04-1934.136*Jcent))
  declination = deg(asin(sin(rad(obliq))*sin(rad(Sapplong))))
  
  eqtime   = 4*deg(vary*sin(2*rad(Mlong))-2*Eccent*sin(rad(Manom))+4*Eccent*vary*sin(rad(Manom))*cos(2*rad(Mlong))-0.5*vary*vary*sin(4*rad(Mlong))-1.25*Eccent*Eccent*sin(2*rad(Manom)))

  hourangle= deg(acos(cos(rad(90.833))/(cos(rad(latitude))*cos(rad(declination)))-tan(rad(latitude))*tan(rad(declination))))

  self.solarnoon_t=(720-4*longitude-eqtime+timezone*60)/1440
  self.sunrise_t  =self.solarnoon_t-hourangle*4/1440
  self.sunset_t   =self.solarnoon_t+hourangle*4/1440

if __name__ == "__main__":
 s=sun(lat=52.37,long=4.90)
 print(datetime.today())
 print(s.sunrise(),s.solarnoon(),s.sunset())
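The conversion from a decimal day to a clock time is the one step that is easy to check by hand. The sketch below is a stand-alone variant of that helper (the name time_from_decimal_day is made up for this illustration); it works from the total number of seconds instead of peeling off hours and minutes one by one, so in rare rounding cases it may differ by a second from the method in the class.

```python
from datetime import time

def time_from_decimal_day(day):
    """Convert a fraction of a day (0.0 <= day < 1.0) to a datetime.time."""
    total_seconds = int(day * 86400)      # 24 * 60 * 60 seconds in a day
    h, rest = divmod(total_seconds, 3600)
    m, s = divmod(rest, 60)
    return time(hour=h, minute=m, second=s)

print(time_from_decimal_day(0.5))    # noon
print(time_from_decimal_day(0.375))  # nine in the morning
```

Values outside the range 0.0 to 1.0 make datetime.time raise a ValueError, which is worth keeping in mind when a time zone offset pushes a result past midnight.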

For people willing to stick to Python 2.x there is a simple and good alternative in the form of the PyEphem package. It can do a lot more than just calculate sunrise. An example is shown below.

import ephem
o=ephem.Observer()
o.lat='49'
o.long='3'
s=ephem.Sun()
s.compute()
print ephem.localtime(o.next_rising(s))

Thursday 16 December 2010

Python Geospatial Development

Every once in a while a book appears that immediately captures your attention. Python Geospatial Development by Erik Westra is such a book and I am sure I will enjoy reading it. I'll post a full review when I am finished.

Done: Check this article.

Monday 13 December 2010

Python thread safe cache class

Every so often the need arises for a thread safe cache, for example in a multithreaded HTTP server like CherryPy. This is my stab at a simple yet fully functional implementation that maintains the essential dictionary semantics, is thread safe and has a fixed, configurable size.

Although many dictionary operations like getting an item are reported to be atomic and therefore thread safe, this is actually an implementation specific feature of the widely used CPython implementation. And even so, adding keys or iterating over the keys in the dictionary might not be thread safe at all. We must therefore use some sort of locking mechanism to ensure no two threads try to modify the cache at the same time. (For more information check this discussion.)

The Cache class shown here features a configurable size and if the number of entries is too big it removes the oldest entry. We do not have to maintain an explicit usage administration for that because we make use of the properties of the OrderedDict class, which remembers the order in which keys are inserted and sports a popitem() method that will remove the first (or last) item inserted.

from collections import OrderedDict
from threading import Lock

class Cache:
    def __init__(self,size=100):
        if int(size)<1 :
            raise AttributeError('size < 1 or not a number')
        self.size = size
        self.dict = OrderedDict()
        self.lock = Lock()

    def __getitem__(self,key):
        with self.lock:
            return self.dict[key]

    def __setitem__(self,key,value):
        with self.lock:
            while len(self.dict) >= self.size:
                self.dict.popitem(last=False)
            self.dict[key]=value

    def __delitem__(self,key):
        with self.lock:
            del self.dict[key]

Due to the functionality of the OrderedDict class we use, the implementation is very concise. The __init__() method merely checks whether the size attribute makes any sense and creates an instance of an OrderedDict and a Lock.

The with statements used in the remaining methods wait for the acquisition of the lock and guarantee that the lock is released even if an exception is raised. The __getitem__() method merely tries to retrieve a value by trying the key on the ordered dictionary after acquiring a lock.

The __setitem__() method removes as many items as needed within its while loop to bring the size below the preset amount and then adds the new value. The popitem() method of an OrderedDict removes the least recently added key/value pair if its last argument is set to False.

The __delitem__() method also merely passes control on to the underlying dictionary. Together these methods allow any instance of our Cache class to be used like any other dictionary, as the example code below illustrates:

>>> from cache import Cache
>>> c=Cache(size=3)
>>> c['key1']="one"
>>> c['key2']="two"
>>> c['key3']="three"
>>> c['key4']="four"
>>> c['key4']
'four'
>>> c['key1']
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "cache.py", line 13, in __getitem__
    return self.dict[key]
KeyError: 'key1'

Of course this doesn't show off the thread safety but it does show that the semantics are pretty much like those of a regular dictionary. If needed this class could even be extended with suitable iterators/views like keys() and items() but for most caches this probably isn't necessary.
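To get at least a feeling for the behaviour under load, the sketch below hammers a cache from several threads and adds a few of the extras mentioned above. The subclass name CountingCache and its methods are my own additions for this illustration; keys() returns a snapshot list taken under the lock, which is an assumption about what a caller would want, not the only sensible design.

```python
from collections import OrderedDict
from threading import Lock, Thread

class Cache:
    # the same class as shown earlier, repeated so this sketch runs on its own
    def __init__(self, size=100):
        if int(size) < 1:
            raise AttributeError('size < 1 or not a number')
        self.size = size
        self.dict = OrderedDict()
        self.lock = Lock()

    def __getitem__(self, key):
        with self.lock:
            return self.dict[key]

    def __setitem__(self, key, value):
        with self.lock:
            while len(self.dict) >= self.size:
                self.dict.popitem(last=False)
            self.dict[key] = value

    def __delitem__(self, key):
        with self.lock:
            del self.dict[key]

class CountingCache(Cache):
    """Hypothetical extension with a few more dictionary-like methods."""
    def __len__(self):
        with self.lock:
            return len(self.dict)

    def __contains__(self, key):
        with self.lock:
            return key in self.dict

    def keys(self):
        # a snapshot, so callers can iterate without holding the lock
        with self.lock:
            return list(self.dict.keys())

def fill(cache, worker, count):
    for i in range(count):
        cache['w%d-%d' % (worker, i)] = i

cache = CountingCache(size=50)
threads = [Thread(target=fill, args=(cache, w, 1000)) for w in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(cache))  # never more than the configured size
```

Note that the plain Lock is not reentrant, so methods that hold the lock must not call each other.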

Sunday 14 November 2010

Python metaclasses

A basic object relational design

In this article we will explore metaclasses as a tool to link up Python classes and database tables.

Based in part on the information in this Python article.

Metaclasses are a daunting concept at first, yet it is worthwhile to understand the idiom because in some circumstances they are vital in getting hard jobs done. In this short article we'll look into a rough draft of some code that creates database tables at the same time a class is created and alters the class definition in such a way that accessing instance variables will result in calls to some database engine.

The first thing to understand is that Python metaclasses are not magic. Every time you define a new class you already refer to the built-in metaclass type. Metaclasses are subclasses of type and regular classes are instances of a metaclass. This may sound weird but remember that everything in Python is an object, even a class definition. And because every object is an instance of some class, this is just a logical extension of a general concept.
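A few lines at the interpreter make this relationship concrete. The class names Plain and Dynamic below are just placeholders for this illustration:

```python
class Plain:
    pass

# an ordinary class is an instance of the built-in metaclass type
print(type(Plain) is type)          # True
print(isinstance(Plain, type))      # True

# calling type with a name, bases and a class dictionary builds a class,
# which is what a class statement does behind the scenes
Dynamic = type('Dynamic', (object,), {'answer': 42})
print(Dynamic.__name__, Dynamic().answer)
```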

If you want your class definition to be instanced by a different metaclass than type you will have to use the metaclass parameter in your class definition:

class MyClass(metaclass=MyMetaClass):
 ...

Your metaclass must be a subclass of Python's built-in metaclass type and should at least provide a __new__() method:

class MyMetaClass(type):
 def __new__(metaclass, classname, bases, classdict):
  # ... do stuff to classdict ...
  return type.__new__(metaclass, classname, bases, classdict)

This __new__() method should return a new class, for example by passing its arguments to the __new__() method of type like we do in the example above. The real power is hidden in the arguments passed to the __new__() method: besides the name of the class we are building and the base classes it will be a subclass of, we also have access to the class dictionary.

The class dictionary holds all class attributes, including class variables and methods (both class methods and instance methods). The beauty is that we can check and/or alter the contents of this class dictionary before we actually create a class. This could be used for example to check whether the class has some mandatory method definitions or overrides specific methods in its bases, something that is used to implement abstract base classes in Python.
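As a small sketch of such a check, the hypothetical metaclass below refuses to create a class that neither defines nor inherits a run() method. The names RequiresRun, Job, SubJob and Broken are invented for this example, and Python's own abc module works differently (it checks when a class is instantiated), so read this as an illustration of inspecting the class dictionary, nothing more.

```python
class RequiresRun(type):
    """Hypothetical metaclass: every class must define or inherit run()."""
    def __new__(meta, classname, bases, classdict):
        inherited = any(hasattr(base, 'run') for base in bases)
        if 'run' not in classdict and not inherited:
            raise TypeError('%s must define a run() method' % classname)
        return type.__new__(meta, classname, bases, classdict)

class Job(metaclass=RequiresRun):
    def run(self):
        return 'running'

class SubJob(Job):      # inherits run(), so it is accepted
    pass

try:
    class Broken(metaclass=RequiresRun):   # rejected at definition time
        pass
except TypeError as e:
    error = str(e)
else:
    error = None

print(error)
```

The point to notice is that the error is raised while the class statement is executed, long before anyone tries to create an instance.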

Another application of metaclasses is to bridge domains. In the example below there are two domains: the classes and instances of those classes in Python's runtime environment and the tables and records in a relational database stored on disk. If we would like to map those classes to database tables, metaclasses provide us with some excellent tooling because they allow us to act upon the class definition before the class is actually made to exist.

This means that when we define a class our metaclass can check whether there is already a suitable table defined in the database and that class variables are mapped to columns in this table. It is also possible to change class variables into properties in such a way that accessing these properties from instances will result in proper sql statements issued to a database engine.

This may sound a bit abstract, so just take a look at the code below. It will not really interact with a database but just print the sql it would have used. The comments should explain what is happening in a fairly detailed manner:

from functools import partial
from random import randint

class Attribute:
 """
 Attribute makes it possible to distinguish class
 variables that should be backed by a column definition
 for a table.
 """
 def __init__(self,constraints=''):
  self.constraints=constraints
  
class DBbackend(type):
 """
 A metaclass that will create a database table that
 contains columns for each class variable that is an
 instance of Attribute. It also replaces these class
 variables with properties that will retrieve or update
 the database value when the attribute is accessed.
 """
 def __new__(meta, classname, bases, classDict):
  attributes = {}
  # create a suitable table definition. Although we
  # do not do a complete implementation here, we have
  # sqlite in mind, so no explicit types.
  for attr in classDict:
   if isinstance(classDict[attr],Attribute):
    attributes[attr]=classDict[attr]
  sql = 'create table if not exists %s ( %s )' % ( classname, 
   ",".join(['id integer primary key autoincrement']
     +[ name+" "+a.constraints for name,a in attributes.items()]))
   
  print(sql)
  
  # we alter the __init__() method to create a database record
  def create(self,**kw):
   self.id=randint(1,1000000) # just for illustration, normally handled by autoincrement in the db
   sql='insert into %s (%s) values (%s) [%s]' % (self.__class__.__name__,
    ",".join(kw.keys()),
    ",".join(['?']*len(kw.keys())),
    ",".join(kw.values()))
   print(sql)
   
  classDict['__init__']=create
  
  # functions that retrieve/update a column in a database table
  def get(self,name):
   print('select %s from %s where id = ? [%s]' % (name,
      classname,str(self.id)))
  
  def set(self,value,name):
   print('update %s set %s=? [%s] where id = ? [%s]' % (classname,
      name,str(value),str(self.id)))
  
  # change each class var that holds an Attribute object to a 
  # property that get/sets the appropriate column
  for attr in attributes:
   fget = partial(get,name=attr)
   fset = partial(set,name=attr)
   classDict[attr]=property(fget,fset)
  
  return type.__new__(meta, classname, bases, classDict)

if __name__ == "__main__":

 # example, create a class with three attributes/database columns
 class Car(metaclass=DBbackend):
  make = Attribute()
  model= Attribute()
  license=Attribute('unique')

 # create an instance
 mycar = Car(make='Volvo', model='C30', license='1-abc-23')

 # retrieve various attributes
 model = mycar.model
 make = mycar.make
 lic = mycar.license

 # set an attribute
 mycar.model='S40'
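To show that the printed statements are not far from the real thing, here is a minimal sketch of the same idea wired to an actual in-memory sqlite database. It makes several assumptions that are not part of the code above: a single module level connection named conn, a setter renamed to set_value to avoid shadowing the built-in set, and no error handling at all.

```python
import sqlite3
from functools import partial

conn = sqlite3.connect(':memory:')   # assumption: one shared connection

class Attribute:
    def __init__(self, constraints=''):
        self.constraints = constraints

class DBbackend(type):
    def __new__(meta, classname, bases, classdict):
        attributes = {name: a for name, a in classdict.items()
                      if isinstance(a, Attribute)}
        columns = ['id integer primary key autoincrement']
        columns += [name + ' ' + a.constraints for name, a in attributes.items()]
        # table and column names come from the class definition, not from
        # user input, so plain string formatting is acceptable here
        conn.execute('create table if not exists %s ( %s )'
                     % (classname, ','.join(columns)))

        def create(self, **kw):
            sql = 'insert into %s (%s) values (%s)' % (
                classname, ','.join(kw), ','.join('?' * len(kw)))
            self.id = conn.execute(sql, tuple(kw.values())).lastrowid

        def get(self, name):
            sql = 'select %s from %s where id = ?' % (name, classname)
            return conn.execute(sql, (self.id,)).fetchone()[0]

        def set_value(self, value, name):
            sql = 'update %s set %s=? where id = ?' % (classname, name)
            conn.execute(sql, (value, self.id))

        classdict['__init__'] = create
        for attr in attributes:
            classdict[attr] = property(partial(get, name=attr),
                                       partial(set_value, name=attr))
        return type.__new__(meta, classname, bases, classdict)

class Car(metaclass=DBbackend):
    make = Attribute()
    model = Attribute()
    license = Attribute('unique')

mycar = Car(make='Volvo', model='C30', license='1-abc-23')
mycar.model = 'S40'
print(mycar.make, mycar.model, mycar.license)
```

Every attribute access is a round trip to the database, which keeps the sketch short but would be far too chatty for real use.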

Sunday 31 October 2010

The universal feedparser

Sometimes things don't have to be difficult. For example, incorporating an RSS feed summary in another website. With the universal feedparser this takes only a few lines of code.

The universal feedparser is an old but venerable piece of software that makes it a walk in the park to retrieve information from an RSS or Atom feed in Python.

The following code is what I actually use to incorporate the title and first paragraph of the postings on this blog on my homepage.

import feedparser
import re

firstp=re.compile(r'^.*?<p>(.*?)</p>')

url="http://michelanders.blogspot.com/feeds/posts/default"
feed=feedparser.parse(url)

print '<h2><a href="%s">%s</a></h2>'%(feed.feed.link,feed.feed.title)
print '<div class="feeditemlist">'
for e in feed.entries:
        mo=firstp.search(e.description)
        short=mo.group(1) if not mo is None else 'no summary available'
        date="-".join(map(str,e.updated_parsed[:3]))
        summary='<p>%s</p><a href="%s">read more</a>'%(short,e.link)
        print '''
        <h3><a href="#">
        <span class="feeditemdate">%s</span>
        <span class="feeditemtitle">%s</span>
        </a></h3>
        <div class="feeditemsummary">%s</div>'''%(date,e.title,summary)
print '</div>'

The html this script produces is easily readable by itself but can simply be converted to a jQueryUI Accordion widget to save space.

The only drawback is that the universal feed parser only works with Python 2.x.
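One detail of the regular expression deserves a closer look: without the re.DOTALL flag the dot does not match a newline, so a description whose first paragraph spans several lines falls back to 'no summary available'. The sketch below isolates that behaviour with two made-up description strings; the helper first_paragraph and the DOTALL variant are additions for this illustration and are written in Python 3 syntax, unlike the script above.

```python
import re

firstp = re.compile(r'^.*?<p>(.*?)</p>')                    # as used above
firstp_dotall = re.compile(r'^.*?<p>(.*?)</p>', re.DOTALL)  # variant

def first_paragraph(html, pattern=firstp):
    mo = pattern.search(html)
    return mo.group(1) if mo is not None else 'no summary available'

one_line = '<div><p>First paragraph.</p><p>Second.</p></div>'
multi_line = '<div>\n<p>First\nparagraph.</p>\n</div>'

print(first_paragraph(one_line))
print(first_paragraph(multi_line))
print(repr(first_paragraph(multi_line, firstp_dotall)))
```

Whether the variant is needed depends on how the feed formats its descriptions.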

Sunday 10 October 2010

Sqlite multithreading woes

Registering a user defined regexp function with Sqlite from CherryPy

Multithreading can be tricky and can actually trip you up in unexpected ways. And in some situations multithreading is almost unavoidable, for example when using CherryPy as it usually instantiates a fair number of threads to efficiently serve multiple HTTP requests in parallel. The situation that caught me unawares was the combination with Sqlite.

Sqlite can be used in a multithreaded fashion but you must make sure that each thread has its own connection object to communicate with the database. This is easily accomplished by registering a function with CherryPy that will be called for each newly started thread. This might look as follows:

import sqlite3
import threading
import cherrypy

data=None
db='/tmp/example.db'

def initdb():
    global data,db
    sql='create table if not exists mytable (col_a, col_b);'
    conn=sqlite3.connect(db)
    c = conn.cursor()
    c.execute(sql)
    conn.commit()
    conn.close() 
    data=threading.local()  

def connect(thread_index): 
    global data,db 
    data.conn = sqlite3.connect(db) 
    data.conn.row_factory = sqlite3.Row

if __name__ == "__main__": 
    initdb() 
    cherrypy.engine.subscribe('start_thread', connect)  
    # ... code to start the cherrypy engine ...

There are two functions in the example above. The first one, initdb(), is used to initialize the database, that is, to create any tables necessary if they are not defined yet and to prepare some storage that is unique for each thread. Normally, all global data is shared between threads so we have to take special measures to provide each thread with its own private data. This is accomplished by the call to threading.local(). The resulting object can be used to store data as attributes and this data is private to each thread. initdb() needs to be called only once before starting the CherryPy engine.

The second function, connect(), should be called once for every thread. It creates a database connection and stores a reference to this connection in the conn attribute of the global variable data. Because this was set up to be private data for each thread, we can use it to store a separate connection object.

In the main section of the code we simply call initdb() once and use the cherrypy.engine.subscribe() function to register our connect() function to be executed at the start of a new thread. The code to actually start CherryPy is not shown in this example.
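The effect of threading.local() can be seen without CherryPy at all. In the sketch below, plain threads play the role of CherryPy's worker threads and each one opens its own in-memory database; the worker function and the results dictionary exist only for this demonstration.

```python
import sqlite3
import threading

data = threading.local()   # attributes set on this object are per thread
results = {}

def worker(name):
    # every thread gets its own connection, stored under the same name
    data.conn = sqlite3.connect(':memory:')
    data.conn.execute('create table t (who)')
    data.conn.execute('insert into t values (?)', (name,))
    results[name] = data.conn.execute('select who from t').fetchall()
    data.conn.close()

threads = [threading.Thread(target=worker, args=('thread-%d' % i,))
           for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(results)
print(hasattr(data, 'conn'))   # the main thread never set data.conn
```

Each thread sees only the row it inserted itself, and the main thread has no conn attribute at all.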

User defined functions

Now how can this simple setup cause any troubles? Well, most database configuration actions in Sqlite are performed on connection objects and when we want them to work in a consistent way we should apply them to each and every connection. In other words, those configuration actions should be part of the connect() function. An example of that is shown in the last line of the connect() function where we assign a sqlite3.Row factory to the row_factory attribute of a connection object. Because we do it here we make sure that we may consistently access columns by name in any record returned from a query.

What I failed to do and what prompted this post was register a user defined function for each connection. Somehow it seemed logical to do it only once when initializing the database, but even if that connection wasn't closed it was impossible to use that function in a query. And user defined functions are not a luxury but a bare necessity if you want to use regular expressions in Sqlite!

Sqlite supports the REGEXP operator in queries so you may use a query like:



select * from mytable where a regexp '^a.*b$';

This will select any record that has a value in its a column that starts with an a and ends with a b. However, although the syntax is supported, it still raises a sqlite3.OperationalError exception because the regexp function that is called by the regexp operator is not defined. If we want to use regular expressions in Sqlite we have to supply an implementation of the regexp function ourselves. Fortunately this is quite simple; a possible implementation is shown below:

import re

def regex(pattern,string):
    # a NULL column value arrives as None; treat it as an empty string
    if string is None : string = ''
    return re.search(pattern,str(string)) is not None

Note that this isn't a very efficient implementation: the pattern has to be looked up again each time the function is called, even when it may be called hundreds of times with the same pattern in a single query (Python's re module does cache recently compiled patterns, which softens the cost somewhat). It does the job however.

All that is left to do now, is register this function. Not, as I did, as part of the initdb() function, but as part of the connect() function that is called for each thread:

def connect(thread_index):
    global data,db 
    data.conn = sqlite3.connect(db)
    data.conn.row_factory = sqlite3.Row
    data.conn.create_function('regexp',2,regex)

The create_function() method will make our newly defined function available. It takes a name, the number of arguments and a reference to our new function as arguments. Note that despite what the Sqlite documentation states, our regular expression function should be registered with the name regexp (not regex!).
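A quick way to convince yourself that the registration works is to try it on a throwaway in-memory database. The table contents below are made up for the occasion; the regex() function is the one shown earlier.

```python
import re
import sqlite3

def regex(pattern, string):
    if string is None:
        string = ''
    return re.search(pattern, str(string)) is not None

conn = sqlite3.connect(':memory:')
# SQLite translates "X regexp Y" into a call regexp(Y, X),
# so the pattern arrives as the first argument
conn.create_function('regexp', 2, regex)

conn.execute('create table mytable (a, b)')
conn.executemany('insert into mytable values (?, ?)',
                 [('ab', 1), ('axxb', 2), ('ba', 3), (None, 4)])

rows = conn.execute(
    "select a from mytable where a regexp '^a.*b$' order by b").fetchall()
print(rows)
```

Only the first two rows match; the NULL value is passed to the function as None and is rejected like any other non-matching string.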

A side note on multiprocessing

If you have a multiprocessor or multicore machine, multithreading will in general not help you to tap into the full processing power of your server. In this article I explore ways to use Python's multiprocessing module in combination with Sqlite.

Thursday 30 September 2010

Starting a stand alone webapp

The concept of a web application is broader than you might guess: using the web browser as a graphical user interface (GUI) makes sense even if you are just running a stand alone application on a local machine. After all, every machine has a browser installed nowadays, so using this as a GUI might save you from headaches trying to find a cross platform GUI toolkit that looks good and is familiar to the user.

In order to use the browser as a user interface we have to start it up in a platform independent way, start a web application framework serving our application at the same time and make sure that the new web browser window points to the correct location. This sounds like a lot of work but in Python this is actually rather straightforward.

Let's have a look at the following code:

import cherrypy
import webbrowser
import threading

def openbrowser():
    webbrowser.open('http://127.0.0.1:8080')

class Root(object):
    @cherrypy.expose
    def index(self):
        return 'Hi There'

if __name__ == "__main__":
    threading.Timer(3.0,openbrowser).start()
    cherrypy.quickstart(Root(),config={})

If you save the code above as webapp.py and you have CherryPy installed, you can start the program by typing the following in a terminal (or dos-box):

python webapp.py

It will start up a webserver running on your local machine that listens on port 8080. It will also start up a new browser window and direct it to http://127.0.0.1:8080. The Python version is not relevant here as we do not use any 3.x specific constructs.

The trick is to utilize Python's bundled module webbrowser to open a browser window in a cross platform compatible way as implemented in the openbrowser() function. We do not call this function right away though, because the browser then might start before there is a webserver running. We cannot start CherryPy first either, because the quickstart() function does not return. Therefore we instantiate a Timer object from the threading module and tell it to call our openbrowser() function after three seconds, which should be plenty of time for the CherryPy server to start.
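The timing trick can be tried in isolation with nothing but the standard library. In the sketch below both functions are stand-ins of my own making: openbrowser() merely records when it was called instead of opening a browser, and serve_forever() sleeps for a moment instead of blocking forever like quickstart() does.

```python
import threading
import time

opened = []

def openbrowser():
    # stand-in for webbrowser.open(...): just note when we were called
    opened.append(time.monotonic())

def serve_forever(seconds):
    # stand-in for cherrypy.quickstart(): blocks the calling thread
    time.sleep(seconds)

start = time.monotonic()
timer = threading.Timer(0.2, openbrowser)   # runs in a separate thread
timer.start()
serve_forever(0.5)                          # main thread is busy meanwhile
timer.join()

print('callback ran %.1f seconds after start' % (opened[0] - start))
```

A fixed delay is of course a guess; it works because the server normally starts much faster than the timer fires.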