I first implemented the convolution operation from scratch using only NumPy, with a 9×9 box filter.
I implemented two versions of the convolution function: one with four nested loops
and one with two nested loops. The four-loop version iterates over each pixel in the output image
and then iterates over each element in the kernel to compute the convolution sum. The two-loop version
iterates over each pixel in the output image and then uses numpy's element-wise multiplication and
sum functions to compute the convolution sum. I compared the results of my implementations with
SciPy's built-in convolve2d function to verify the correctness of my implementations.
All three convolved results look blurrier than the original image, since the box filter
averages the pixel values in the neighborhood of each pixel, as shown below.
Original Photo
Convolved with 4 loop
Convolved with 2 loop
Convolved with SciPy
Here is the code for both convolution implementations:
import numpy as np
from scipy.signal import convolve2d

def conv2d_four_loops(image, kernel):
    image_height, image_width = image.shape
    kernel_height, kernel_width = kernel.shape
    pad_height = kernel_height // 2
    pad_width = kernel_width // 2
    # Zero-pad so the output has the same size as the input.
    padded_image = np.pad(image, ((pad_height, pad_height), (pad_width, pad_width)), mode="constant", constant_values=0)
    # Accumulate in float so integer inputs do not overflow or truncate.
    convolved_image = np.zeros(image.shape, dtype=np.float64)
    # Note: the kernel is not flipped, so this is strictly cross-correlation;
    # for a symmetric kernel like the box filter it is identical to
    # convolution, which is why the result matches SciPy's convolve2d below.
    for y in range(image_height):
        for x in range(image_width):
            for i in range(kernel_height):
                for j in range(kernel_width):
                    convolved_image[y, x] += padded_image[y + i, x + j] * kernel[i, j]
    return convolved_image

def conv2d_two_loops(image, kernel):
    image_height, image_width = image.shape
    kernel_height, kernel_width = kernel.shape
    pad_height = kernel_height // 2
    pad_width = kernel_width // 2
    padded_image = np.pad(image, ((pad_height, pad_height), (pad_width, pad_width)), mode="constant", constant_values=0)
    convolved_image = np.zeros(image.shape, dtype=np.float64)
    for y in range(image_height):
        for x in range(image_width):
            # Vectorize the two inner loops: multiply the kernel-sized
            # window element-wise and sum.
            region = padded_image[y:y + kernel_height, x:x + kernel_width]
            convolved_image[y, x] = np.sum(region * kernel)
    return convolved_image

image = load_image_grayscale("Me.jpg")
box_filter = np.ones((9, 9)) / 81
convolved_four_loops = conv2d_four_loops(image, box_filter)
convolved_two_loops = conv2d_two_loops(image, box_filter)
convolved_scipy = convolve2d(image, box_filter, mode="same", boundary="fill", fillvalue=0)
In this part, I explore how convolution kernels can be used to detect edges in an image. Since an image is represented as a 2D array, the edges of an image are areas where the intensity of the pixels changes rapidly. To detect these edges, we can use a filter that is sensitive to changes in intensity. One of the simplest filters that achieves this is the finite-difference filter \( \mathbf{D_x} = \left[ 1 \,\,\,0\,\, -1 \right]\). This filter, when convolved with an image, highlights the vertical edges in the image. Similarly, the filter \( \mathbf{D_y} = \left[ 1 \,\,\,0\,\, -1 \right]^T \) highlights the horizontal edges. Then, the gradient magnitude is computed by the following formula:
\[ \|\nabla f\| = \sqrt{(f * \mathbf{D_x})^2 + (f * \mathbf{D_y})^2}. \]
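The gradient magnitude computation can be sketched as follows. This is a minimal illustration, not the exact project code: gradient_magnitude is a name introduced here, and the reflect boundary handling is an assumption.

```python
import numpy as np
from scipy.signal import convolve2d

# Finite-difference filters from the write-up.
D_x = np.array([[1, 0, -1]])
D_y = D_x.T

def gradient_magnitude(image):
    # Per-pixel edge strength, independent of edge direction.
    grad_x = convolve2d(image, D_x, mode="same", boundary="symm")
    grad_y = convolve2d(image, D_y, mode="same", boundary="symm")
    return np.sqrt(grad_x ** 2 + grad_y ** 2)

# Binarize with the threshold used later in the write-up.
# edges = gradient_magnitude(image) > 0.23
```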
Finding the gradient magnitude is useful since it reduces the two directional responses to a single per-pixel value representing edge strength, regardless of edge direction. The gradient magnitude image is then binarized with a threshold to further suppress noise. Here are the results:
Cameraman
Cameraman convolved with \(\mathbf{D_x}\)
Cameraman convolved with \(\mathbf{D_y}\)
Cameraman (Gradient magnitude)
Cameraman (Gradient magnitude)
Binarized with threshold 0.23.
Although a binarization threshold of 0.23 works reasonably well for the Cameraman image, it is still difficult to remove noise while keeping valid edges. To address this, a Gaussian blur can be applied to the image to suppress noise before taking the gradient magnitude. Here are the results:
Blurred Cameraman
Blurred Cameraman
(Gradient magnitude)
Blurred Cameraman
(Gradient magnitude)
Binarized with threshold 0.03.
The process of blurring an image with a Gaussian filter and then taking its derivative can also be performed as a single convolution with a derivative-of-Gaussian (DoG) filter. Because convolution is associative, this reduces the number of convolution passes over the original image. I verified that this single-convolution result is equivalent to applying the Gaussian blur first and then taking the gradient magnitude of the blurred image.
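A minimal sketch of this equivalence, assuming a NumPy-built Gaussian kernel (gaussian_kernel_2d is a helper introduced here, not from the code above):

```python
import numpy as np
from scipy.signal import convolve2d

def gaussian_kernel_2d(size, sigma):
    # Separable Gaussian: outer product of a normalized 1-D Gaussian.
    ax = np.arange(size) - (size - 1) / 2.0
    g = np.exp(-ax ** 2 / (2 * sigma ** 2))
    g /= g.sum()
    return np.outer(g, g)

D_x = np.array([[1, 0, -1]])
G = gaussian_kernel_2d(15, 3)   # k_size = 15, sigma = 3 as in the figures
DoG_x = convolve2d(G, D_x)      # full convolution keeps the whole support

# By associativity, (image * G) * D_x == image * DoG_x away from the
# image borders, so one pass over the image replaces two.
```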
Combined Filter
(x-direction)
\(k_{size} = 15\), \(\sigma = 3\)
Blurred Cameraman
(Gradient magnitude)
(Single convolution)
Blurred Cameraman
(Gradient magnitude, binarized)
(Single convolution)
Here, we explore the technique of unsharp masking, used to sharpen an image. The technique works by creating a blurred version of the image with Gaussian blur and then subtracting the blurred image from the original image. Since the blurred image retains only the low-frequency components of the image, the subtraction yields an image containing the high frequencies of the original. The high frequencies are then added back to the original image to sharpen it. Mathematically, it can be represented as
\[ f_{\text{sharpened}} = f + \alpha (f - f * g), \]
where \( f \) is the original image, \( g \) is the Gaussian kernel (thus \( f * g \) is the blurred image), and \( \alpha \) is a constant that controls the amount of sharpening.
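The two-step form of this formula can be sketched as below; unsharp_mask is an illustrative helper name, and the kernel size, sigma, and reflect boundary are assumptions rather than the write-up's exact settings.

```python
import numpy as np
from scipy.signal import convolve2d

def gaussian_kernel_2d(size, sigma):
    # Separable Gaussian built as an outer product of 1-D Gaussians.
    ax = np.arange(size) - (size - 1) / 2.0
    g = np.exp(-ax ** 2 / (2 * sigma ** 2))
    g /= g.sum()
    return np.outer(g, g)

def unsharp_mask(image, alpha, size=9, sigma=2):
    # f + alpha * (f - f * g): add back the high frequencies removed
    # by the Gaussian low-pass filter.
    g = gaussian_kernel_2d(size, sigma)
    blurred = convolve2d(image, g, mode="same", boundary="symm")
    return image + alpha * (image - blurred)
```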
The formula \( f + \alpha(f - f * g) \) can also be rearranged as \( f * \left( (1 + \alpha)e - \alpha g \right) \), where \( e \) is the identity (unit impulse) kernel. This shows that the sharpening process can be applied as a single convolution with the kernel \( (1 + \alpha)e - \alpha g \). I also noticed that running the convolution can shrink the image, so I reflected the image around its edges to maintain the original size. Here are the final results:
Taj Mahal (Original)
Taj Mahal (Low-pass)
Taj Mahal (High-pass)
Taj Mahal (\(\alpha = 0.5\))
Taj Mahal (\(\alpha = 1\))
Taj Mahal (\(\alpha = 4\))
Taj Mahal (\(\alpha = 20\))
I also used my sharpening implementation to sharpen a blurry image of a pizza.
Blurry Pizza (Original)
Sharpened Pizza (\(\alpha = 1.5\))
Finally, I took a photo, blurred it, and re-sharpened the blurred image to see how well the sharpening process can recover the original image. As shown below, the sharpening process is able to make the edges more prominent, but it is not able to fully recover the original image. This is because the blurring process removes some high-frequency information from the image, and the sharpening process can only enhance the high-frequency information that is still present after blurring.
Street (Original)
Blurry Street
Sharpened Blurry Street (\(\alpha = 2.0\))
Hybrid images are images that look like one image up close and another from a distance. This effect is achieved by combining the high-frequency components of one image with the low-frequency components of another. The low-frequency component is obtained by running a Gaussian blur filter on the original image, and the high-frequency component is obtained by subtracting the low-pass-filtered image from the original. Then, the low-pass version of one image is added to the high-pass version of the other to create the hybrid image. I have also enabled color for hybrid images; since the low-pass image dominates the color, it adds more vibrancy when the image is viewed from afar. However, in cases like the Derek and Nutmeg hybrid, Derek dominating the color can make Nutmeg less noticeable. The results of the hybrid images are shown below.
Derek
Nutmeg
Derek Nutmeg Hybrid (Gray)
Derek Nutmeg Hybrid (Color)
Frog
Owl
Frog Owl Hybrid (Gray)
Frog Owl Hybrid (Color)
Frog FFT
Frog Filtered FFT
Owl FFT
Owl Filtered FFT
Frog Owl Hybrid FFT
Earth
Marble
Earth Marble Hybrid (Gray)
Earth Marble Hybrid (Color)
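The hybrid construction described in this section can be sketched as follows; hybrid and gaussian_kernel_2d are names introduced here, and the boundary handling is an assumption.

```python
import numpy as np
from scipy.signal import convolve2d

def gaussian_kernel_2d(size, sigma):
    # Separable Gaussian built as an outer product of 1-D Gaussians.
    ax = np.arange(size) - (size - 1) / 2.0
    g = np.exp(-ax ** 2 / (2 * sigma ** 2))
    g /= g.sum()
    return np.outer(g, g)

def hybrid(image_low, image_high, kernel):
    # Low frequencies of one image plus high frequencies of the other.
    low = convolve2d(image_low, kernel, mode="same", boundary="symm")
    high = image_high - convolve2d(image_high, kernel, mode="same", boundary="symm")
    return low + high
```

In practice the two images would use different cutoff frequencies (different sigmas); a single kernel is used here only to keep the sketch short.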
In this part, I explore the technique of multiresolution blending to combine two images. This technique works by creating a Laplacian stack for each image and blending the stacks together level by level. Specifically, I build a Laplacian stack for each of the two images, and a Gaussian stack of the black-and-white mask image by repeatedly applying a Gaussian low-pass filter to it. These stacks are then combined by the following formula:
\[ L = \sum_k \left( m_k\, l_k^A + (1 - m_k)\, l_k^B \right), \]
where \(l_k^A\) is layer \(k\) of the Laplacian stack of image \(A\), and \(m_k\) is the \(k\)th layer of the Gaussian mask stack. Finally, the base Gaussian split image, \(SG_k\), obtained by running the Gaussian low-pass filter \(k\) times on the image combined with the black-and-white mask, is added to the final Laplacian stack sum \(L\) to obtain the final image. The results on the Oraple and other images are shown below.
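The blending procedure can be sketched as below. blend, blur, and gaussian_kernel_2d are illustrative names; the base layer here mixes the deepest Gaussian levels with the blurred mask, a close analogue of the \(SG_k\) term described above rather than the exact project code.

```python
import numpy as np
from scipy.signal import convolve2d

def gaussian_kernel_2d(size, sigma):
    # Separable Gaussian built as an outer product of 1-D Gaussians.
    ax = np.arange(size) - (size - 1) / 2.0
    g = np.exp(-ax ** 2 / (2 * sigma ** 2))
    g /= g.sum()
    return np.outer(g, g)

def blur(image, kernel):
    return convolve2d(image, kernel, mode="same", boundary="symm")

def blend(A, B, mask, kernel, levels=5):
    # Gaussian stacks of both images and of the black-and-white mask.
    gA, gB, gM = [A], [B], [mask]
    for _ in range(levels):
        gA.append(blur(gA[-1], kernel))
        gB.append(blur(gB[-1], kernel))
        gM.append(blur(gM[-1], kernel))
    # Base layer: the deepest Gaussian levels, mixed by the blurred mask.
    out = gM[levels] * gA[levels] + (1 - gM[levels]) * gB[levels]
    # Add the blended Laplacian layers l_k = G_k - G_{k+1}.
    for k in range(levels):
        out += gM[k] * (gA[k] - gA[k + 1]) + (1 - gM[k]) * (gB[k] - gB[k + 1])
    return out
```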
Oraple Gaussian #1
Oraple Gaussian #2
Oraple Gaussian #3
Oraple Gaussian #4
Oraple Gaussian #5
Oraple Laplacian #1
Oraple Laplacian #2
Oraple Laplacian #3
Oraple Laplacian #4
Oraple Laplacian #5
Apple
Orange
Oraple
Bowl 1
Bowl 2
Bowl Mask
Bowl (Merged)
Car 1
Car 2
Car Mask
Car (Merged)
Sofa
Building with Eyes
Sofa Mask
Sofa (Merged)
Tower 1
Tower 2?
Tower Mask
Tower (Merged)