Finding stuck or hot pixels in Nikon D80 images, a multiprocessing approach

Earlier I described a method to find hot or stuck pixels by determining the variance of (sub)pixels over a large set of photos. In this article we take a look at a way to farm out this work to multiple processes.

The parallel algorithm for calculating variance

The parallel algorithm allows us to combine variances that were calculated over separate sets of images. With the help of Python's multiprocessing module it is straightforward to turn our hot pixel finding algorithm into one that takes advantage of multiple cores:

  1. import Image  
  2. import numpy as np  
  3. import sys  
  4. from glob import glob  
  5. from multiprocessing import Pool,current_process  
  6. from time import time  
  7.   
  8. def process_pixels(shape,*args):  # per-pixel running variance (M2) for one subset of image files  
  9.     first = True  
  10.     for filename in args:  
  11.         pix = np.array(Image.open(filename))  # load the image as a numpy array of pixel values  
  12.         if pix.shape != shape:  
  13.             print("shapes don't match")  
  14.             continue  
  15.         if first:  
  16.             first = False  
  17.             firstpix = pix  
  18.             n = np.zeros(firstpix.shape)  
  19.             mean = np.zeros(firstpix.shape)  
  20.             M2 = np.zeros(firstpix.shape)  
  21.             delta = np.zeros(firstpix.shape)  
  22.         n += 1  # online (Welford) update of count, mean and M2 for every pixel  
  23.         delta = pix - mean  
  24.         mean += delta/n  
  25.         M2 += delta*(pix - mean)  
  26.     return M2  
  27.       
  28. if __name__ == '__main__':  
  29.     global pool  
  30.   
  31.     n=int(sys.argv[1])  # number of worker processes  
  32.     pool = Pool(n)  
  33.     filenames = []  
  34.     for a in sys.argv[2:]:  
  35.         filenames.extend(glob(a))  
  36.       
  37.     shape = np.array(Image.open(filenames[0])).shape  # reference shape, taken from the first image  
  38.   
  39.     s=time()  
  40.     results = []  
  41.     for i in range(n):  
  42.         results.append(pool.apply_async(process_pixels,tuple([shape]+filenames[i::n])))  # give each worker every n-th file  
  43.     for i in range(n):  
  44.         results[i]=results[i].get()  # wait for each worker and collect its M2 array  
  45.     M2 = sum(results)  # simply add the per-subset M2 arrays together (see the note below)  
  46.     print(time()-s)  
  47.       
  48.     mini = np.unravel_index(M2.argmin(),M2.shape)  # position of the pixel with the smallest variance  
  49.     maxi = np.unravel_index(M2.argmax(),M2.shape)  # position of the pixel with the largest variance  
  50.     print('min',M2[mini],mini)  
  51.     print('max',M2[maxi],maxi)  
  52.     print('mean',np.mean(M2))  
  53.   
  54.     sorti = M2.argsort(axis=None)  # indices sorted by variance, lowest first  
  55.     print(sep="\n",*[(i,M2[i]) for i in [np.unravel_index(i,M2.shape) for i in sorti[:10]]])  
  56.   
  57.     print(time()-s)  
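
As an aside, the worker function can be exercised on its own, without the pool, which makes it easy to check the returned statistics on a handful of files first. A minimal sketch (the filenames are placeholders, and the module name hotpixels is just an assumption about where the listing above is saved):

    import numpy as np
    import Image
    from hotpixels import process_pixels  # assuming the listing above is saved as hotpixels.py

    files = ['DSC_0001.JPG','DSC_0002.JPG']       # placeholder filenames
    shape = np.array(Image.open(files[0])).shape  # reference shape, as in the listing
    M2 = process_pixels(shape,*files)             # the shape comes first, then the filenames
    print(M2.shape, M2.max())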

Note that because we return complete arrays (line 26), we gain almost nothing for small data sets: the overhead of shuttling these large arrays (> 30 MByte) between processes dominates. This is illustrated in the following graph, which shows the elapsed time as a function of the number of processes, both for a small number of pictures (50) and a large number (480).

Some other notable issues: in the code we do not actually implement the full parallel algorithm, but simply add the per-subset variances together. Because we're looking for a minimum variance, we gain nothing by adding a constant value.
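
For completeness, the combination step of the full parallel algorithm would look roughly like the sketch below. It assumes each worker returns its count and mean alongside M2; the function name and argument order are mine:

    def combine(n_a, mean_a, M2_a, n_b, mean_b, M2_b):
        # merge the running statistics of two disjoint sets of images
        n = n_a + n_b
        delta = mean_b - mean_a
        mean = mean_a + delta*n_b/n
        # the last term corrects for the difference between the two subset means
        M2 = M2_a + M2_b + delta**2*n_a*n_b/n
        return n, mean, M2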

Memory usage is another thing to be aware of (and the reason there is no entry for six cores in the graph). The algorithm we have implemented uses 5 arrays (the pixel data itself included). That makes for 10 megapixels × 3 colors × 5 arrays × 8 bytes (because we use 64-bit floats by default), which amounts to a whopping 1.2 gigabyte of data per process, or more than 6 gigabyte with 5 processes. With some other applications open, a sixth process wouldn't fit on my test machine. Because we're adding pixel values in the range 0 - 255, we could probably gain a lot by using 32-bit floats or even 16-bit floats here.
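
A minimal sketch of that, assuming the reduced precision is acceptable: create the accumulator arrays with an explicit dtype and convert the pixel data on loading. The shape below is just a stand-in for a 10 megapixel RGB image:

    import numpy as np

    shape = (2592,3872,3)                       # roughly 10 megapixel, 3 colors
    M2_64 = np.zeros(shape)                     # float64 by default
    M2_32 = np.zeros(shape,dtype=np.float32)    # half the memory
    print(M2_64.nbytes//2**20, 'MB')            # about 230 MB for a single float64 array
    print(M2_32.nbytes//2**20, 'MB')            # half of that with float32
    # inside process_pixels the pixel data would be converted the same way, e.g.
    # pix = np.array(Image.open(filename),dtype=np.float32)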