## Sunday, 28 August 2011

### Finding stuck or hot pixels in Nikon D80 images

In a previous article I presented a tiny Python program that used the Python Image Library (PIL) to correct a so called stuck or hot pixel in an image taken by a digital SLR. In this article we go one step further and see if we can use some simple statistics to identify those pixels automatically.

## Finding pixels with the least variance

The idea is to identify those pixels in a bunch of pictures that don't change much from picture to picture, even if the subject and lighting conditions do change. There are of course many ways to define change but the one we explore here is the called the variance. Basically we compute for each pixel in a set of pictures its average and then sum (again for each pixel) the differences with its average in each picture. The pixels (or subpixels) that have the smallest sums are likely candidates for being hot or stuck.

## Using PIL and Numpy for efficient number crunching

Obviously calculating the variance for each subpixel in a few hundred pictures will entail some serious number crunching if these pictures are from a megapixel camera. We therefore better use a serious number crunching library, like Numpy. Because we use Python 3, I recommend fetching the PIL and Numpy packages from Christoph Gohlke's page if you use Windows.

The code is shown below. The program will open all images given as arguments one by one using PIL's `Image.open()` function (line 9). PIL images can be directly converted by Numpy by the `array()` function. Because we might run out of memory we do not keep all images in memory together but process them one by one and calculate the variance using the so called on-line algorithm. The names of the variables used are the same as in the Wikipedia article but instead of scalars we use arrays. (Note, we do not actually calculate the variance but just the sum of squares of differences from the mean because we interested in the pixel with smallest variance whatever that may be and not in the value of the variance at such.) In the final lines (line 27) we print out the results, retrieving the index of the lowest variance with Numpy's `argmin()` function.

```import Image
import numpy as np
import sys
from glob import glob

first = True
for arg in sys.argv[1:]:
for filename in glob(arg):
pic = Image.open(filename)
pix = np.array(pic)
if first:
first = False
firstpix = pix
n = np.zeros(firstpix.shape)
mean = np.zeros(firstpix.shape)
M2 = np.zeros(firstpix.shape)
delta = np.zeros(firstpix.shape)
else:
if pix.shape != firstpix.shape:
print("shapes don't match")
continue
n += 1
delta = pix - mean
mean += delta/n
M2 += delta*(pix - mean)

mini = np.unravel_index(M2.argmin(),M2.shape)
print('min',M2[mini],mini)
```

## Results and limitations

Using a few hundred photo's I could easily identify a previously observed hot pixel. It did take a few minutes though, even on my pc which is a fairly powerful hexacore machine. Besides speed the biggest limitation is that I had to use JPEG images because I did not have RAW images available. JPEG image compression isn't lossless so the hot pixel tends to be smeared out. If you print out not just the pixel with the lowest variance but the ten pixels with the lowest variance, most of them are clustered around the same spot. (example code below).

```sorti = M2.argsort(axis=None)
print(sep="\n",*[(i,M2[i]) for i in [np.unravel_index(i,M2.shape) for i in sorti[:10]]])
```