Finding stuck or hot pixels in Nikon D80 images, a multiprocessing approach

Earlier I described a method to find hot or stuck pixels by determining the variance of (sub)pixels over a large set of photos. In this article we take a look at a way to farm out this work to multiple processes.

The parallel algorithm for calculating variance

The parallel algorithm allows us to combine calculated variances from separate sets. With the help of Python's multiprocessing module it is straight forward to implement our hot pixel finding algorithm to one that takes advantage of multiple cores:

import Image
import numpy as np
import sys
from glob import glob
from multiprocessing import Pool,current_process
from time import time

def process_pixels(shape,*args):
	first = True
	for filename in args:
		pic = Image.open(filename)
		if pix.shape != shape:
			print("shapes don't match")
			continue
		if first:
			first = False
			firstpix = pix
			n = np.zeros(firstpix.shape)
			mean = np.zeros(firstpix.shape)
			M2 = np.zeros(firstpix.shape)
			delta = np.zeros(firstpix.shape)
		n += 1
		delta = pix - mean
		mean += delta/n
		M2 += delta*(pix - mean)
	return M2
	
if __name__ == '__main__':
	global pool

	n=int(sys.argv[1])
	pool = Pool(n)
	filenames = []
	for a in sys.argv[2:]:
		filenames.extend(glob(a))
	
	shape = np.array(Image.open(filenames[0])).shape

	s=time()
	results = []
	for i in range(n):
	results.append(pool.apply_async(process_pixels,tuple([shape]+filenames[i::n])))
	for i in range(n):
		results[i]=results[i].get()
	M2 = sum(results)
	print(time()-s)
	
	mini = np.unravel_index(M2.argmin(),M2.shape)
	maxi = np.unravel_index(M2.argmax(),M2.shape)
	print('min',M2[mini],mini)
	print('max',M2[maxi],maxi)
	print('mean',np.mean(M2))

	sorti = M2.argsort(axis=None)
	print(sep="\n",*[(i,M2[i]) for i in [np.unravel_index(i,M2.shape) for i in sorti[:10]]])

	print(time()-s)

Note that because we return complete arrays (line 26) we gain almost nothing for small data sets because of the overhead of shuttling these large arrays (> 30 MByte) between processes. This is illustrated in the following graph with shows the elapsed time as a function of the number of processes, both for a small number of pictures (50) and a large number (480).

Some other notable issues: in the code we do not actually implement the parallel algorithm but we simple add together the variances. Because we're looking for a minimum variance we gain nothing by adding a constant value.

Memory usage is another thing to be aware of (and the reason that there is no entry for six cores in the graph. The algorithm we have implemented uses 5 arrays (the pixel data itself included). That makes for 10 megapixel X 3 colors X 5 arrays X 8 bytes data (because we use 64 bit floats by default) which makes for a whopping 1.2 Gigabyte of data per process or more than 6 Gig with 5 processes. With some other applications open a sixth process wouldn't fit on my test machine. Because we're adding pixel values in the range 0 - 255 we could probably gain a lot by using 32 bit floats or even 16 bit floats here.