Earlier I described a method to find hot or stuck pixels by determining the variance of (sub)pixels over a large set of photos. In this article we take a look at a way to farm out this work to multiple processes.
The parallel algorithm for calculating variance
The parallel algorithm allows us to combine calculated variances from separate sets. With the help of Python's multiprocessing module it is straightforward to convert our hot pixel finding algorithm into one that takes advantage of multiple cores:
```python
from PIL import Image   # "import Image" in old versions of PIL
import numpy as np
import sys
from glob import glob
from multiprocessing import Pool
from time import time

def process_pixels(shape, *filenames):
    # accumulate per-pixel variance statistics (Welford's online
    # algorithm) over the given set of image files
    first = True
    for filename in filenames:
        pix = np.array(Image.open(filename))
        if pix.shape != shape:
            print("shapes don't match")
            continue
        if first:
            first = False
            n = np.zeros(pix.shape)
            mean = np.zeros(pix.shape)
            M2 = np.zeros(pix.shape)
        n += 1
        delta = pix - mean
        mean += delta / n
        M2 += delta * (pix - mean)
    return M2

if __name__ == '__main__':
    n = int(sys.argv[1])        # number of worker processes
    pool = Pool(n)

    filenames = []
    for a in sys.argv[2:]:
        filenames.extend(glob(a))

    shape = np.array(Image.open(filenames[0])).shape

    s = time()
    results = []
    for i in range(n):
        # hand each worker every n-th file, starting at offset i
        results.append(pool.apply_async(process_pixels,
                                        tuple([shape] + filenames[i::n])))
    for i in range(n):
        results[i] = results[i].get()
    M2 = sum(results)
    print(time() - s)

    mini = np.unravel_index(M2.argmin(), M2.shape)
    maxi = np.unravel_index(M2.argmax(), M2.shape)
    print('min', M2[mini], mini)
    print('max', M2[maxi], maxi)
    print('mean', np.mean(M2))

    # the ten (sub)pixels with the smallest variance
    sorti = M2.argsort(axis=None)
    print(*[(i, M2[i])
            for i in [np.unravel_index(i, M2.shape)
                      for i in sorti[:10]]],
          sep="\n")
    print(time() - s)
```
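Assuming the script is saved as hotpixels.py (the name is mine), running it with four worker processes could look like this; the pattern is quoted so that the script's own glob() call expands it rather than the shell:

```
python hotpixels.py 4 'photos/*.jpg'
```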
Note that because we return complete arrays (the final `return M2` in `process_pixels`) we gain almost nothing for small data sets: the overhead of shuttling these large arrays (> 30 MByte) between processes dominates. This is illustrated in the following graph, which shows the elapsed time as a function of the number of processes, both for a small number of pictures (50) and a large number (480).
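A quick way to see how much data each worker actually sends back is to check the array size directly (a one-liner of mine, not part of the original script):

```python
print(M2.nbytes / 2**20)  # size of one returned M2 array, in MiB
```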
Some other notable issues: the code does not actually implement the full parallel algorithm but simply adds the per-process variances together. Because we're only looking for a minimum variance, we gain nothing by adding a constant value.
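For reference, the full combination step would merge the counts and means as well as the M2 sums. Below is a minimal sketch of that pairwise formula (after Chan et al.); `combine` is a hypothetical helper, not something the script above defines:

```python
def combine(n_a, mean_a, M2_a, n_b, mean_b, M2_b):
    # pairwise merge of running variance statistics computed over two
    # disjoint sets of photos (Chan et al.'s parallel algorithm)
    n = n_a + n_b
    delta = mean_b - mean_a                       # difference of the means
    mean = mean_a + delta * n_b / n               # combined mean
    M2 = M2_a + M2_b + delta**2 * n_a * n_b / n   # combined sum of squares
    return n, mean, M2
```

Using this would require each worker to return n and mean alongside M2, roughly tripling the amount of data shuttled between processes.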
Memory usage is another thing to be aware of (and the reason there is no entry for six cores in the graph). The algorithm as implemented uses 5 arrays (the pixel data itself included). That makes for 10 megapixels × 3 colors × 5 arrays × 8 bytes (because NumPy uses 64-bit floats by default), which comes to a whopping 1.2 gigabytes of data per process, or more than 6 GB with 5 processes. With some other applications open, a sixth process wouldn't fit on my test machine. Because we're adding pixel values in the range 0-255, we could probably gain a lot by using 32-bit or even 16-bit floats here.
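One way to do that (a sketch of mine, not the article's code) is to give the accumulators in `process_pixels` an explicit dtype; float16 would shrink things further, but NumPy's half-precision arithmetic is slow and loses precision quickly, so float32 is the safer choice:

```python
# inside process_pixels, when the accumulators are first created:
n    = np.zeros(shape, dtype=np.float32)
mean = np.zeros(shape, dtype=np.float32)
M2   = np.zeros(shape, dtype=np.float32)

# and convert each image as it is loaded, so the arithmetic stays in float32:
pix = np.asarray(Image.open(filename), dtype=np.float32)
```

That halves the footprint to roughly 0.6 GB per process, and as a side effect also halves the size of the arrays returned to the parent process.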