The way we represent an image mathematically can have a big impact on our ability to manipulate it. Conceptually it would be simplest to represent an image (let’s assume it’s grey-level) as a 2D array. If my image is a 2D array \(\mathbf{X}\) I could implement the effect of a linear shift-invariant blurring function \(\mathbf{H}\) and produce an output image \(\mathbf{F}\) via the convolution operator:
\(\mathbf{F}=\mathbf{H} \ast \mathbf{X}\)
I could do other things with this notation, such as introducing a shift operator to move my image by one pixel
\(\mathbf{F}=\mathbf{\acute {H}} \ast \mathbf{X}\)
where \(\mathbf{\acute {H}}=[0, 0, 1]\).
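As a concrete sketch in Matlab (assuming a small made-up test image and the 'same' option of conv2 so the output has the same size as the input), the shift-invariant versions of these two operations look like:

X = magic(5);                        % small made-up test image
H = ones(3) / 9;                     % 3x3 averaging blur kernel
F_blur = conv2(X, H, 'same');        % the same blur is applied at every pixel

Hshift = [0 0 1];                    % shift kernel
F_shift = conv2(X, Hshift, 'same');  % whole image shifted by one pixel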
The problem is that this is all shift invariant: the same blur or shift is applied to every pixel in the image. What if the amount of blurring and shifting changes from pixel to pixel, as it does in a real image due to imperfections in the camera’s lens? I would need a separate \(\mathbf{\acute {H}}\) for every pixel. A more convenient way is to drop the 2D convolution and implement our system using matrix multiplications. To do this we lexicographically rearrange the 2D image matrix into a 1D vector
\(\left[ \begin{array}{ccc}
a & b & c \\
d & e & f \\
g & h & i \end{array}\right] \longrightarrow \left[ \begin{array}{c} a\\d\\g\\b\\e\\h\\c\\f\\i \end{array} \right]\)
In Matlab this would be implemented with X1d = X(:), and we can transform it back to 2D, given the original number of rows and columns, with X = reshape(X1d, rows, cols).
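A quick numerical check of that column-major ordering, with made-up values standing in for \(a \ldots i\):

X = [1 2 3; 4 5 6; 7 8 9];   % stands in for [a b c; d e f; g h i]
X1d = X(:);                  % gives [1 4 7 2 5 8 3 6 9]' (column-major)
X2 = reshape(X1d, 3, 3);     % recovers the original 3x3 image
isequal(X, X2)               % returns logical 1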
For simplicity’s sake I shall reduce the number of pixels in my image to 3. But what can we do with this? Well, let’s look at a matrix multiply operation
\(\left[ \begin{array}{ccc}
a & b & c \\
d & e & f \\
g & h & i \end{array}\right] \left[ \begin{array}{c}x\\y\\z\end{array}\right] = \left[ \begin{array}{c} ax+by+cz\\dx+ey+fz\\gx+hy+iz\end{array}\right]\)
Each row in the matrix acts like an operator producing one output pixel, so I’ve effectively got a shift-variant convolution. For example, I could blur the first pixel and leave the rest unchanged
\(\left[ \begin{array}{ccc}
0.33& 0.33 & 0.33 \\
0 & 1 & 0 \\
0 & 0 & 1 \end{array}\right] \left[ \begin{array}{c}x\\y\\z\end{array}\right] = \left[ \begin{array}{c} 0.33x+0.33y+0.33z\\y\\z\end{array}\right]\)
Note that the 1s go down the diagonal.
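In Matlab this blur-the-first-pixel matrix might be applied like so (a sketch with made-up pixel values; the 0.33 entries above are the averaging weights 1/3):

B = [1/3 1/3 1/3;   % row 1: replace pixel 1 with the average of all three
     0   1   0;     % row 2: pass pixel 2 through unchanged
     0   0   1];    % row 3: pass pixel 3 through unchanged
x = [10; 20; 30];   % tiny 3-pixel image as a column vector
f = B * x;          % f = [20; 20; 30]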
I could implement a shift on the second pixel
\(\left[ \begin{array}{ccc}
1& 0 & 0 \\
0 & 0& 1 \\
0 & 0 & 1 \end{array}\right] \left[ \begin{array}{c}x\\y\\z\end{array}\right] = \left[ \begin{array}{c} x\\z\\z\end{array}\right]\)
By changing the values I could also implement rotations and warps.
And we can combine several matrices together to define our system. If \(\mathbf{S}\) is a shift matrix and \(\mathbf{B}\) is a blurring matrix we can simply chain them together
\(\mathbf{F}=\mathbf{SBX}\)
to describe our shift-variant optical system.
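Putting this together in Matlab for the 3-pixel example (a sketch; here \(\mathbf{B}\) blurs only the first pixel and \(\mathbf{S}\) is the shift matrix from above):

B = [1/3 1/3 1/3; 0 1 0; 0 0 1];   % shift-variant blur
S = [1 0 0; 0 0 1; 0 0 1];         % shift only the second pixel
x = [10; 20; 30];                  % lexicographically ordered image
A = S * B;                         % combined system matrix
f = A * x;                         % same result as S * (B * x)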
As an additional step we may wish to introduce the effect of sensor pixel size. We can implement this by making our original image have a much higher resolution and then using a decimation filter to reduce it to a low-resolution camera image. To do this we create a matrix with \(N\) rows, which equals the number of pixels in the decimated image, and \(M\) columns, which equals the number of pixels in the high-resolution image.
\(\left[ \begin{array}{cccc}
0.5 & 0.5 & 0 & 0 \\
0 & 0 & 0.5 & 0.5 \end{array}\right] \left[ \begin{array}{c}w\\x\\y\\z\end{array}\right] = \left[ \begin{array}{c} 0.5w+0.5x \\0.5y+0.5z\end{array}\right]\)
This shows how we can reduce the resolution by half in one dimension, and we can easily extend this to 2D.
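One way to sketch this in Matlab, and (assuming column-major lexicographic ordering) extend it to 2D with a Kronecker product:

D1 = [0.5 0.5 0   0;
      0   0   0.5 0.5];         % averages pixel pairs: 4 pixels -> 2 pixels in 1D

% 2D: if Y = Dr * X * Dc', then Y(:) = kron(Dc, Dr) * X(:)
Dr = D1;  Dc = D1;              % decimate rows and columns by a factor of 2
D2 = kron(Dc, Dr);              % 4x16 matrix: maps a 4x4 image to a 2x2 image
X = rand(4);                    % made-up high-resolution image
Y = reshape(D2 * X(:), 2, 2);   % low-resolution camera image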
Update
It’s worth noting that if the blurring is shift invariant (which is a lot easier to deal with) the matrix is block circulant: for a 1D signal it is circulant, and for a lexicographically ordered 2D image each \(d(k)\) below is itself a circulant block. This means it is of the form
\(\left[ \begin{array}{ccccc}
d(0) & d(M-1) & d(M-2) & \ldots & d(1) \\
d(1) & d(0) & d(M-1) & \ldots & d(2)\\
d(2) & d(1) & d(0) & \ldots & d(3)\\
\vdots & \vdots & \vdots & \ddots & \vdots\\
d(M-1) & d(M-2) & d(M-3) & \ldots & d(0) \end{array}\right]\)
Note that each row is a shifted version of the one above it. The reason this is important is that such a matrix is easy to invert.
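This is because a circulant matrix is diagonalised by the DFT, so it can be applied and inverted with the FFT. A sketch of the 1D case in Matlab, with a made-up circular blur kernel \(d\) as the first column of the matrix:

d = [0.6 0.2 0 0 0 0 0 0.2]';   % made-up circular blur kernel (first column)
M = numel(d);
C = zeros(M);
for k = 1:M
    C(:,k) = circshift(d, k-1); % each column is a rotated copy of d
end

x = rand(M, 1);                 % original signal
b = C * x;                      % blurred signal (C*x is circular convolution)

x_rec = real(ifft(fft(b) ./ fft(d)));  % invert via the FFT, no explicit inverse
norm(x - x_rec)                 % close to machine precision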