CS 180 Project 1

Approach 1 - Naive

I started by splitting the image into 3 sections, one for each channel. I also cropped each channel to 90% of its original size to get rid of any borders. Then, I took a pair of channels (e.g. red and blue) and tried to find the positioning that minimized the Normalized Cross-Correlation (NCC) loss, which is the negative inner product of the normalized matrices:

$$L(A, B) = -\left\langle \frac{A}{\|A\|_F}, \frac{B}{\|B\|_F} \right\rangle$$

where $\|\cdot\|_F$ is the Frobenius norm of each layer. To find the best position, I shifted[1] the first layer in the x and y directions over the range of -15px to 15px, and kept the positioning with the minimal loss. I did this for 2 different pairs (red-blue & green-blue), and moved the layers accordingly. Finally, I trimmed off any area that was not covered by all 3 layers.
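The exhaustive search above can be sketched as follows, assuming two equally sized 2D NumPy arrays for the channels (the function names here are illustrative, not my exact implementation):

```python
import numpy as np

def ncc_loss(a, b):
    """Negative normalized cross-correlation: lower is a better match."""
    a = a / np.linalg.norm(a)  # np.linalg.norm on a 2D array is the Frobenius norm
    b = b / np.linalg.norm(b)
    return -np.sum(a * b)

def align_naive(moving, fixed, window=15):
    """Try every shift in [-window, window]^2 and return the best (dy, dx)."""
    best_shift, best_loss = (0, 0), np.inf
    for dy in range(-window, window + 1):
        for dx in range(-window, window + 1):
            shifted = np.roll(moving, (dy, dx), axis=(0, 1))
            loss = ncc_loss(shifted, fixed)
            if loss < best_loss:
                best_loss, best_shift = loss, (dy, dx)
    return best_shift
```

Because an exact match drives the loss to its minimum of -1, applying the returned shift with `np.roll` recovers the original alignment.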

This approach works fine for relatively small images (300-400px), but does not scale well to larger ones. If we keep the same search window for a larger image, we may not cover enough of the search space to recover the correct positioning of the layers. It is also infeasible to scale the search window with the image size: the number of candidate shifts grows quadratically with the window's side length (if we scale the image by a factor of 2, the search window's area grows by a factor of 4), and each loss evaluation also touches 4 times as many pixels.

Approach 2 - Image Pyramid

As such, I resorted to using an image pyramid to reduce the search space dramatically. I initially scaled the image down to around 500px along its longest side and calculated the optimal shift for each channel within a window of ±15px in each direction. Then, I scaled the image up by a factor of 2 and pre-shifted each layer by twice the previously calculated displacement. From there, I could rerun the same algorithm with a significantly smaller search window (±3px), since the layers were already close to an optimal alignment. I repeated this process until reaching the original resolution. This vastly expands the effective search window while decreasing runtime, since after the first iteration we only need to check a small fraction of the possible displacements.
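The coarse-to-fine process can be sketched as below. This is a simplified assumption-laden version: it downsamples by slicing rather than proper rescaling to ~500px, and reuses the exhaustive NCC search from Approach 1.

```python
import numpy as np

def ncc_loss(a, b):
    return -np.sum((a / np.linalg.norm(a)) * (b / np.linalg.norm(b)))

def align_naive(moving, fixed, window):
    best_shift, best_loss = (0, 0), np.inf
    for dy in range(-window, window + 1):
        for dx in range(-window, window + 1):
            loss = ncc_loss(np.roll(moving, (dy, dx), axis=(0, 1)), fixed)
            if loss < best_loss:
                best_loss, best_shift = loss, (dy, dx)
    return best_shift

def align_pyramid(moving, fixed, coarse_size=500, coarse_window=15, fine_window=3):
    # Build the pyramid: halve the resolution until the long side is small enough.
    levels = [(moving, fixed)]
    while max(levels[-1][0].shape) > coarse_size:
        m, f = levels[-1]
        levels.append((m[::2, ::2], f[::2, ::2]))
    dy, dx = 0, 0
    # Walk from coarsest to finest, refining the estimate at each level.
    for i, (m, f) in enumerate(reversed(levels)):
        dy, dx = dy * 2, dx * 2  # displacements double when resolution doubles
        window = coarse_window if i == 0 else fine_window
        pre_shifted = np.roll(m, (dy, dx), axis=(0, 1))
        ddy, ddx = align_naive(pre_shifted, f, window)
        dy, dx = dy + ddy, dx + ddx
    return dy, dx
```

After the coarse level, each refinement only searches (2·3+1)² = 49 shifts instead of the hundreds needed by a full-width window at that resolution.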

Although this approach allowed us to process larger images, certain images still caused problems. For example, in the Emir image, the raw intensities of the red and blue channels differ too greatly: the Emir's clothes have a high amount of blue but little red, while the doors and floor have more red and less blue. As such, simply matching raw channel values is not enough.

[Figure: the Emir image, misaligned when matching raw channel values]

Approach 3 - Edge Detection (Bells and Whistles)

To combat the problem, instead of matching raw values, I decided to match edges of channels. This helps solve the problem with the Emir image, since each channel could have different intensities for the same part of the image (e.g. the Emir photo has lots of red and no blue in the background, but little red and lots of blue in the subject of the image).

In order to detect edges, I used Sobel edge detection. The process starts by applying a Gaussian blur to the image to smooth out noise. Then, it takes the blurred channel and convolves it with two 3x3 kernels to detect horizontal and vertical edges ($K_x$ and $K_y$). More specifically, if $A$ is the channel, we have

$$K_x = \begin{bmatrix} -1 & 0 & 1 \\ -2 & 0 & 2 \\ -1 & 0 & 1 \end{bmatrix}, \quad K_y = \begin{bmatrix} -1 & -2 & -1 \\ 0 & 0 & 0 \\ 1 & 2 & 1 \end{bmatrix}, \qquad G_x = K_x * A, \quad G_y = K_y * A$$

where $*$ is the 2D convolution operator. From here, we can approximate the magnitude of the gradient by

$$G = \sqrt{G_x^2 + G_y^2}$$

I applied this transformation to each layer of our image pyramid, compared the losses of the edge maps, and got significantly better results. I implemented this process on my own initially, but opted to use OpenCV's implementation instead since it was faster and provided more accurate results.

| Standard RGB | My Sobel Edge Detection | OpenCV's Edge Detection |
| --- | --- | --- |
| [comparison image] | [comparison image] | [comparison image] |
| [comparison image] | [comparison image] | [comparison image] |

  1. I utilized np.roll, which wraps pixels around the image edges, so I wouldn't have to deal with padding the channels while shifting them around. I initially tried to just trim each layer, but having to re-trim the third layer was too much of a headache to solve. ↩︎
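A quick illustration of the wrapping behavior that makes padding unnecessary:

```python
import numpy as np

a = np.arange(9).reshape(3, 3)
rolled = np.roll(a, (1, 0), axis=(0, 1))  # shift rows down by 1, wrapping around
# The original last row [6, 7, 8] wraps to the top; the shape never changes.
```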