If we use annotations to sanitize incoming data, easy access to regular expressions is a must. In this article I'll show how to make the most of Python's bundled functools module to dynamically generate functions that match regular expressions while reducing the overhead of compiling regular expressions as much as possible.
Matching against regular expressions
In our framework we annotate function parameters with functions that pass through or convert incoming arguments if they are valid, or raise an exception otherwise. For example, if we want to make sure an argument is an integer, we annotate it with Python's built-in int function:
def myfunction(arg:int): ...
The framework will use this function to check the argument before actually calling the function.
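As a rough illustration of what such a check might look like (the framework itself is not shown in this article, so the checked decorator below is a hypothetical stand-in, not the framework's actual code), each callable annotation can be applied to its argument before the wrapped function runs:

```python
import inspect
from functools import wraps

def checked(func):
    """Hypothetical decorator: run each argument through its annotation."""
    sig = inspect.signature(func)

    @wraps(func)
    def wrapper(*args, **kwargs):
        bound = sig.bind(*args, **kwargs)
        # Copy the items so we can safely replace values while looping.
        for name, value in list(bound.arguments.items()):
            check = sig.parameters[name].annotation
            # Only callable annotations are treated as checkers/converters.
            if check is not inspect.Parameter.empty and callable(check):
                bound.arguments[name] = check(value)
        return func(*bound.args, **bound.kwargs)
    return wrapper

@checked
def myfunction(arg: int):
    return arg

print(myfunction("42"))   # prints 42: the string was converted by int()
```

A call such as myfunction("abc") would fail with the ValueError raised by int before the body of myfunction is ever executed.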
In many situations it would be convenient if we could specify a regular expression that arguments should match. We could extend the framework to check whether the annotation is a string and use that string as a regular expression, but it is better to opt for an approach that allows for the easy definition of categories. Some examples:
def myfunc(arg:digits): ...
def myfunc(arg:zipcode): ...
In these examples digits and zipcode are functions that take a single argument and return that argument if it matches a certain pattern, or raise an exception if it doesn't.
To construct such functions easily we can use Python's re and functools modules:
from re import compile

def regex_compile(pattern):
    return compile("^"+pattern+"$")

def match(pattern,string):
    re = regex_compile(pattern)
    if re.match(string) is None:
        raise ValueError("no pattern match")
    return string
To check whether a string matches a regular expression we define a match function (line 6) that compiles the regular expression and matches the string against it. The regex_compile function makes sure the match is anchored at the beginning and the end because we want a complete match; that is, strings that merely contain the pattern are considered invalid. We don't allow extraneous characters.
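One caveat with anchoring by string concatenation is worth checking. In Python, $ also matches just before a trailing newline, and a pattern with a top-level alternation such as a|b ends up only partially anchored (^a|b$). The sketch below demonstrates both effects and shows one possible alternative, the fullmatch method of compiled patterns (available since Python 3.4). The name match_full is hypothetical and not part of the code above.

```python
from re import compile

def regex_compile(pattern):
    return compile("^" + pattern + "$")

# Pitfall 1: "$" also matches just before a trailing newline.
print(regex_compile(r"\d+").match("123\n") is not None)   # True

# Pitfall 2: "^a|b$" means "starts with a" OR "ends with b".
print(regex_compile(r"a|b").match("axyz") is not None)    # True

def match_full(pattern, string):
    """Hypothetical variant of match() that relies on fullmatch()."""
    if compile(pattern).fullmatch(string) is None:
        raise ValueError("no pattern match")
    return string

print(match_full(r"\d+", "123"))   # 123
```

With match_full, both "123\n" and "axyz" are rejected with a ValueError, because fullmatch requires the whole string to match the whole pattern.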
Constructing partial functions
So far nothing new under the sun, but we still need a simple way to construct functions that match a single argument against a regular expression. Enter the partial function. It takes a function and some arguments and uses those to construct a new function that passes the original arguments, followed by any new arguments, to the encapsulated function. This is exactly what we need to construct our categories:
from functools import partial

digits = partial(match,r'\d+')
zipcode = partial(match,r'\d{4}\s*[a-zA-Z]{2}')

print(digits('12345'))
print(zipcode('1234AB'))
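To make the mechanics a little more concrete, the sketch below repeats a compact version of match so that it runs on its own. It shows that a partial object simply stores the wrapped function and the frozen arguments, and that a value that does not match raises the ValueError from match.

```python
from functools import partial
from re import compile

def match(pattern, string):
    if compile("^" + pattern + "$").match(string) is None:
        raise ValueError("no pattern match")
    return string

digits = partial(match, r'\d+')

# A partial object remembers the function and the arguments frozen into it.
print(digits.func is match)   # True
print(digits.args)            # ('\\d+',)

# digits(x) is equivalent to match(r'\d+', x).
print(digits('12345'))        # 12345

try:
    digits('12a45')
except ValueError as e:
    print("rejected:", e)     # rejected: no pattern match
```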
Improving efficiency by memoization
Each time we call digits(a) we actually call match(r'\d+',a), and this results in a request to compile the regular expression again and again. Compiling regular expressions is a rather expensive operation, so we want to avoid that by reusing compiled expressions. The re module does keep a cache of recently compiled patterns of its own, but an explicit cache makes the reuse visible and keeps it under our control. This can be accomplished simply by applying the lru_cache decorator from the functools module to implement a memoize pattern:
from functools import lru_cache
from re import compile

@lru_cache(maxsize=None)
def regex_compile(pattern):
    return compile("^"+pattern+"$")

This simple change makes sure we reuse compiled regular expressions. The maxsize parameter should be set as needed: it may be set to None, in which case all regular expressions will be cached without limit, possibly a good choice as it is unlikely that your web application sports thousands of regular expressions. Note that a maxsize of zero does not mean unlimited: it disables caching altogether.
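Whether the cache is actually hit can be checked with the cache_info method that lru_cache adds to the decorated function. The sketch below assumes an unbounded cache (maxsize=None) and repeats the definitions so that it is self-contained:

```python
from functools import lru_cache, partial
from re import compile

@lru_cache(maxsize=None)   # None: unbounded cache
def regex_compile(pattern):
    return compile("^" + pattern + "$")

def match(pattern, string):
    if regex_compile(pattern).match(string) is None:
        raise ValueError("no pattern match")
    return string

digits = partial(match, r'\d+')

for s in ('1', '22', '333'):
    digits(s)

# One miss (first compilation), then two hits served from the cache.
info = regex_compile.cache_info()
print(info.hits, info.misses)   # 2 1
```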