I start by splitting the image into three sections, one for each channel. I also cropped each channel to 90% of its original size to get rid of any borders. Then, I took a pair of channels (e.g. red and blue) and searched for the displacement that minimizes the normalized cross-correlation (NCC) loss, the negative inner product of the two normalized channels:

$$\mathcal{L}(A, B) = -\left\langle \frac{A}{\|A\|},\ \frac{B}{\|B\|} \right\rangle$$
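A minimal sketch of this exhaustive search follows; the helper names `ncc_loss` and `align_exhaustive` are mine, not from the original code:

```python
import numpy as np

def ncc_loss(a, b):
    """Negative inner product of the two L2-normalized channels."""
    return -np.sum((a / np.linalg.norm(a)) * (b / np.linalg.norm(b)))

def align_exhaustive(moving, fixed, window=15):
    """Try every shift in [-window, window]^2 and keep the lowest-loss one."""
    best_loss, best_shift = np.inf, (0, 0)
    for dy in range(-window, window + 1):
        for dx in range(-window, window + 1):
            shifted = np.roll(moving, (dy, dx), axis=(0, 1))
            loss = ncc_loss(shifted, fixed)
            if loss < best_loss:
                best_loss, best_shift = loss, (dy, dx)
    return best_shift
```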
This approach works fine for relatively small images (300-400px), but does not scale well to larger ones. If we keep the same search window for a larger image, we may not cover enough of the search space to recover the correct positioning of the layer. It is also infeasible to scale the search window with the image size: the number of candidate displacements grows quadratically with the image's side length (scaling the image by a factor of 2 quadruples the search window area), and each NCC evaluation itself also touches four times as many pixels.
As such, I resorted to an image pyramid to reduce the search space dramatically. I first scaled the image down to around 500px along its longest side and calculated the optimal shifts for each channel within a window of ±15px in each direction. Then I scaled the image up by a factor of 2 and pre-shifted each layer by twice the displacement found at the previous level. From there, I could rerun the same algorithm with a much smaller search window (±3px), since the channels were already close to their optimal alignment, repeating this process until reaching the original resolution. This lets us vastly expand the effective search window while decreasing runtime, since after the first iteration we only need to check a small fraction of the possible displacements; a sketch of the coarse-to-fine scheme follows.
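A rough sketch of the pyramid, reusing `align_exhaustive` from above (the naive 2x subsampling is my simplification; the write-up does not specify the downscaling method):

```python
import numpy as np

def align_pyramid(moving, fixed):
    """Coarse-to-fine alignment over a 2x image pyramid."""
    if max(fixed.shape) <= 500:
        # Coarsest level: search the full +/-15px window.
        return align_exhaustive(moving, fixed, window=15)
    # Recurse on half-resolution copies of both channels.
    dy, dx = align_pyramid(moving[::2, ::2], fixed[::2, ::2])
    # Pre-shift by twice the coarse displacement, then refine locally.
    pre_shifted = np.roll(moving, (2 * dy, 2 * dx), axis=(0, 1))
    ry, rx = align_exhaustive(pre_shifted, fixed, window=3)
    return 2 * dy + ry, 2 * dx + rx
```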
Although this approach allowed us to process larger images, certain images were still problematic. For example, in the Emir image, the red and blue channels differ too greatly in brightness: the emir's clothes have a high amount of blue but a low amount of red, while the doors and floor have more red and less blue. As such, simply matching raw channel values is not enough.
To combat this, instead of matching raw values, I decided to match the edges of the channels. This solves the problem with the Emir image, since each channel can record very different intensities for the same part of the scene (the background has lots of red and almost no blue, while the subject has little red and lots of blue), yet the edges between regions fall in the same places across channels.
In order to detect edges, I used Sobel edge detection. The process starts by applying a Gaussian blur to smooth out noise in the image. Then, it convolves the blurred channel $A$ with two 3×3 kernels to detect horizontal and vertical edges:

$$G_x = \begin{bmatrix} -1 & 0 & 1 \\ -2 & 0 & 2 \\ -1 & 0 & 1 \end{bmatrix} * A \qquad G_y = \begin{bmatrix} -1 & -2 & -1 \\ 0 & 0 & 0 \\ 1 & 2 & 1 \end{bmatrix} * A$$

where $*$ denotes 2D convolution. The edge magnitude used for matching is then $G = \sqrt{G_x^2 + G_y^2}$.
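A compact sketch of this pipeline (the blur strength `sigma=1.0` is an assumption; the write-up does not specify it):

```python
import numpy as np
from scipy.ndimage import convolve, gaussian_filter

def sobel_edges(channel, sigma=1.0):
    """Blur to suppress noise, then return the Sobel gradient magnitude."""
    blurred = gaussian_filter(channel.astype(float), sigma=sigma)
    kx = np.array([[-1, 0, 1],
                   [-2, 0, 2],
                   [-1, 0, 1]])  # responds to horizontal intensity changes
    ky = kx.T                    # transposed kernel for vertical changes
    gx = convolve(blurred, kx)
    gy = convolve(blurred, ky)
    return np.hypot(gx, gy)     # edge magnitude G = sqrt(Gx^2 + Gy^2)
```

Alignment then proceeds exactly as before, but on these edge maps rather than the raw channel values.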
*(Image comparison: standard RGB, my Sobel edge detection, and OpenCV's edge detection.)*
I utilized `np.roll`, which wraps the image around when shifting, so I wouldn't have to deal with padding the channels while shifting them around. I initially tried to simply trim each layer, but having to re-trim the third layer was too much of a headache to solve. ↩︎
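As a toy illustration of the wrap-around behavior:

```python
import numpy as np

channel = np.arange(12).reshape(3, 4)
# Shift down 1 row and right 2 columns; pixels pushed off one edge
# reappear on the opposite edge, so the array shape never changes.
shifted = np.roll(channel, shift=(1, 2), axis=(0, 1))
```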