Start Small: PIL

Showing posts with label PIL. Show all posts

Finding stuck or hot pixels in Nikon D80 images, a multiprocessing approach

Earlier I described a method to find hot or stuck pixels by determining the variance of (sub)pixels over a large set of photos. In this article we take a look at a way to farm out this work to multiple processes.

The parallel algorithm for calculating variance

The parallel algorithm allows us to combine calculated variances from separate sets. With the help of Python's multiprocessing module it is straight forward to implement our hot pixel finding algorithm to one that takes advantage of multiple cores:

import Image
import numpy as np
import sys
from glob import glob
from multiprocessing import Pool,current_process
from time import time

def process_pixels(shape,*args):
	first = True
	for filename in args:
		pic = Image.open(filename)
		if pix.shape != shape:
			print("shapes don't match")
			continue
		if first:
			first = False
			firstpix = pix
			n = np.zeros(firstpix.shape)
			mean = np.zeros(firstpix.shape)
			M2 = np.zeros(firstpix.shape)
			delta = np.zeros(firstpix.shape)
		n += 1
		delta = pix - mean
		mean += delta/n
		M2 += delta*(pix - mean)
	return M2
	
if __name__ == '__main__':
	global pool

	n=int(sys.argv[1])
	pool = Pool(n)
	filenames = []
	for a in sys.argv[2:]:
		filenames.extend(glob(a))
	
	shape = np.array(Image.open(filenames[0])).shape

	s=time()
	results = []
	for i in range(n):
	results.append(pool.apply_async(process_pixels,tuple([shape]+filenames[i::n])))
	for i in range(n):
		results[i]=results[i].get()
	M2 = sum(results)
	print(time()-s)
	
	mini = np.unravel_index(M2.argmin(),M2.shape)
	maxi = np.unravel_index(M2.argmax(),M2.shape)
	print('min',M2[mini],mini)
	print('max',M2[maxi],maxi)
	print('mean',np.mean(M2))

	sorti = M2.argsort(axis=None)
	print(sep="\n",*[(i,M2[i]) for i in [np.unravel_index(i,M2.shape) for i in sorti[:10]]])

	print(time()-s)

Note that because we return complete arrays (line 26) we gain almost nothing for small data sets because of the overhead of shuttling these large arrays (> 30 MByte) between processes. This is illustrated in the following graph with shows the elapsed time as a function of the number of processes, both for a small number of pictures (50) and a large number (480).

Some other notable issues: in the code we do not actually implement the parallel algorithm but we simple add together the variances. Because we're looking for a minimum variance we gain nothing by adding a constant value.

Memory usage is another thing to be aware of (and the reason that there is no entry for six cores in the graph. The algorithm we have implemented uses 5 arrays (the pixel data itself included). That makes for 10 megapixel X 3 colors X 5 arrays X 8 bytes data (because we use 64 bit floats by default) which makes for a whopping 1.2 Gigabyte of data per process or more than 6 Gig with 5 processes. With some other applications open a sixth process wouldn't fit on my test machine. Because we're adding pixel values in the range 0 - 255 we could probably gain a lot by using 32 bit floats or even 16 bit floats here.

Finding stuck or hot pixels in Nikon D80 images

In a previous article I presented a tiny Python program that used the Python Image Library (PIL) to correct a so called stuck or hot pixel in an image taken by a digital SLR. In this article we go one step further and see if we can use some simple statistics to identify those pixels automatically.

Finding pixels with the least variance

The idea is to identify those pixels in a bunch of pictures that don't change much from picture to picture, even if the subject and lighting conditions do change. There are of course many ways to define change but the one we explore here is the called the variance. Basically we compute for each pixel in a set of pictures its average and then sum (again for each pixel) the differences with its average in each picture. The pixels (or subpixels) that have the smallest sums are likely candidates for being hot or stuck.

Using PIL and Numpy for efficient number crunching

Obviously calculating the variance for each subpixel in a few hundred pictures will entail some serious number crunching if these pictures are from a megapixel camera. We therefore better use a serious number crunching library, like Numpy. Because we use Python 3, I recommend fetching the PIL and Numpy packages from Christoph Gohlke's page if you use Windows.

The code is shown below. The program will open all images given as arguments one by one using PIL's Image.open() function (line 9). PIL images can be directly converted by Numpy by the array() function. Because we might run out of memory we do not keep all images in memory together but process them one by one and calculate the variance using the so called on-line algorithm. The names of the variables used are the same as in the Wikipedia article but instead of scalars we use arrays. (Note, we do not actually calculate the variance but just the sum of squares of differences from the mean because we interested in the pixel with smallest variance whatever that may be and not in the value of the variance at such.) In the final lines (line 27) we print out the results, retrieving the index of the lowest variance with Numpy's argmin() function.

import Image
import numpy as np
import sys
from glob import glob

first = True
for arg in sys.argv[1:]:
	for filename in glob(arg):
		pic = Image.open(filename)
		pix = np.array(pic)
		if first:
			first = False
			firstpix = pix
			n = np.zeros(firstpix.shape)
			mean = np.zeros(firstpix.shape)
			M2 = np.zeros(firstpix.shape)
			delta = np.zeros(firstpix.shape)
		else:
			if pix.shape != firstpix.shape:
				print("shapes don't match")
				continue
		n += 1
		delta = pix - mean
		mean += delta/n
		M2 += delta*(pix - mean)

mini = np.unravel_index(M2.argmin(),M2.shape)
print('min',M2[mini],mini)

Results and limitations

Using a few hundred photo's I could easily identify a previously observed hot pixel. It did take a few minutes though, even on my pc which is a fairly powerful hexacore machine. Besides speed the biggest limitation is that I had to use JPEG images because I did not have RAW images available. JPEG image compression isn't lossless so the hot pixel tends to be smeared out. If you print out not just the pixel with the lowest variance but the ten pixels with the lowest variance, most of them are clustered around the same spot. (example code below).

sorti = M2.argsort(axis=None)
print(sep="\n",*[(i,M2[i]) for i in [np.unravel_index(i,M2.shape) for i in sorti[:10]]])

Correcting stuck or hot pixels in Nikon D80 images

A friend of mine became suddenly aware that her Nikon D80 camera had developed a so called stuck or hot pixel. Repairing a CCD chip in a camera is quite costly and what is more, looking back through the old photos she discovered the defect had been present for over a year. In this article we develop a tiny Python program that masks out a pixel in pretty much the same way that the built-in firmware in de camera does this for pixels that were hot or dead when the camera was constructed.

Correcting stuck or hot pixels in images

Hot or stuck pixels in a camera are annoying and other than replacing the ccd chip, nothing can be done about it with regard to the hardware. More on this in this illuminating article. Sometimes the camera can map out bad pixels but that won't fix old images, so the plan is to create a small program that does this on existing images.

The idea is to replace the offending pixel with a weighted average of the neighboring pixels so it will blend in with the surroundings. Diagonally adjacent pixels have a lower weight then pixels horizontally or vertically adjacent (see picture on the left). Such a program is simple enough to implement using PIL, the Python Imaging Library. There is even a version for Python 3 made available by Christoph Gohlke, although you will have to replace all relative import statements with absolute one if you want that to work. (Simply run the the installer, open all *.py files in site-packages/PIL and replace all occurrences of from . import with a simple import. A decent editor like notepad++ or ultraedit can manage that in one go).

The program will take the name of an image file and the x,y position of the pixel to fix as its arguments, e.g. python correctpixel.py myimage.jpg 1500,1271 and will produce a corrected image with the same name but with a c_ prefix added, in our example c_myimage.jpg. It will use a weighted average of the eight neighboring pixels as depicted in this image (the diagonally adjacent pixels have a weight of 1/sqrt(2).

The simplest way to find the exact location of the pixel to fix is to open the image in Gimp, locate the pixel and hover over it with the mouse: the location is shown in the status bar at the bottom. Depending on your zoom level these may be fractional numbers, but we need just the integral parts.

The program itself is rather straightforward and is shown in its entirety below:

import sys
import Image

def correct(im,xy,matrix):
 pim = im.load()
 x,y = xy
 maxx,maxy = im.size
 n = 0
 sumr,sumg,sumb = 0,0,0
 for dx,dy,w in matrix:
  px,py = x+dx,y+dy
  if px<0 or py<0 or px >= maxx or py >= maxy:
   break
  n += w
  r,g,b = pim[px,py]
  sumr,sumg,sumb = sumr+r*w,sumg+g*w,sumb+b*w
 pim[x,y]=(int(sumr/n),int(sumg/n),int(sumb/n))

w=1/(2**0.5)
matrixd=((0,1,1),(0,-1,1),(1,0,1),(-1,0,1),(-1,-1,w),(-1,1,w),(1,1,w),(1,-1,w))

im = Image.open(sys.argv[1])
xy = tuple(map(int,sys.argv[2].split(',')))
correct(im,xy,matrixd)
im.save('c_'+sys.argv[1],quality=97)

Given an opened image the function correct() will load the data and the loop of the list of pixels offsets and weights given in its matrix argument. It will check whether a neighbor is within the bounds of the image (line 12; if the stuck pixel is on an edge it might not be) and if so, sum its RGB components according to the weight of this pixel. The pixel is finally replaced by summed values divided by the number of summed pixels. We also take care to convert the RGB components to integers again (line 17Y).

The final lines take care of opening the image (line 22), and converting the pixel position argument to a tuple of integers before calling the correct() function. The image is saved again with a name prefixed with c_. The quality parameter is set to a very high value to more or less save the image with the same jpeg quality a the original image.

Note that at the moment the code shown works for jpeg images only (or to be precise, only for images with three bands, R,G,B.) Png images may have and extra transparency band and the code does not deal with that nor does it play nice with indexed formats (like gif).