I first implemented the convolution operation from scratch using only NumPy, with a 9×9 box filter.
I implemented two versions of the convolution function: one with four nested loops
and one with two nested loops. The four-loop version iterates over each pixel in the output image
and then iterates over each element in the kernel to compute the convolution sum. The two-loop version
iterates over each pixel in the output image and then uses numpy's element-wise multiplication and
sum functions to compute the convolution sum. I compared the results of my implementations with
SciPy's built-in convolve2d function to verify the correctness of my implementations.
All three convolved results look blurrier than the original image, since the box filter
averages the pixel values in the neighborhood of each pixel, as shown below.
Original Photo
Convolved with 4 loop
Convolved with 2 loop
Convolved with SciPy
Here is the code for both convolution implementations:
import numpy as np
from scipy.signal import convolve2d

def conv2d_four_loops(image, kernel):
    image_height, image_width = image.shape
    kernel_height, kernel_width = kernel.shape
    pad_height = kernel_height // 2
    pad_width = kernel_width // 2
    # Zero-pad so the output has the same size as the input.
    padded_image = np.pad(image, ((pad_height, pad_height), (pad_width, pad_width)), mode="constant", constant_values=0)
    # Accumulate in float so integer inputs do not overflow or truncate.
    convolved_image = np.zeros(image.shape, dtype=np.float64)
    # Note: the kernel is not flipped, so this is strictly cross-correlation;
    # for a symmetric kernel like the box filter it is identical to
    # convolution, which is why the result matches SciPy's convolve2d below.
    for y in range(image_height):
        for x in range(image_width):
            for i in range(kernel_height):
                for j in range(kernel_width):
                    convolved_image[y, x] += padded_image[y + i, x + j] * kernel[i, j]
    return convolved_image

def conv2d_two_loops(image, kernel):
    image_height, image_width = image.shape
    kernel_height, kernel_width = kernel.shape
    pad_height = kernel_height // 2
    pad_width = kernel_width // 2
    padded_image = np.pad(image, ((pad_height, pad_height), (pad_width, pad_width)), mode="constant", constant_values=0)
    convolved_image = np.zeros(image.shape, dtype=np.float64)
    for y in range(image_height):
        for x in range(image_width):
            # Vectorize the two inner loops: multiply the kernel-sized
            # window element-wise and sum.
            region = padded_image[y:y + kernel_height, x:x + kernel_width]
            convolved_image[y, x] = np.sum(region * kernel)
    return convolved_image

image = load_image_grayscale("Me.jpg")
box_filter = np.ones((9, 9)) / 81
convolved_four_loops = conv2d_four_loops(image, box_filter)
convolved_two_loops = conv2d_two_loops(image, box_filter)
convolved_scipy = convolve2d(image, box_filter, mode="same", boundary="fill", fillvalue=0)
In this part, I explore how convolution kernels can be used to detect edges in an image. Since an image is represented as a 2D array, the edges of an image are areas where the intensity of the pixels changes rapidly. To detect these edges, we can use a filter that is sensitive to changes in intensity. One of the simplest filters that achieves this is the finite-difference filter \( \mathbf{D_x} = \left[ 1 \,\,\,0\,\, -1 \right]\). This filter, when convolved with an image, highlights the vertical edges in the image. Similarly, the filter \( \mathbf{D_y} = \left[ 1 \,\,\,0\,\, -1 \right]^T \) highlights the horizontal edges. Then, the gradient magnitude is computed by the following formula:
\[ \|\nabla f\| = \sqrt{(f * \mathbf{D_x})^2 + (f * \mathbf{D_y})^2}. \]
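The gradient magnitude computation can be sketched as follows. This is a minimal illustration, not the exact project code: gradient_magnitude is a name introduced here, and the reflect boundary handling is an assumption.

```python
import numpy as np
from scipy.signal import convolve2d

# Finite-difference filters from the write-up.
D_x = np.array([[1, 0, -1]])
D_y = D_x.T

def gradient_magnitude(image):
    # Per-pixel edge strength, independent of edge direction.
    grad_x = convolve2d(image, D_x, mode="same", boundary="symm")
    grad_y = convolve2d(image, D_y, mode="same", boundary="symm")
    return np.sqrt(grad_x ** 2 + grad_y ** 2)

# Binarize with the threshold used later in the write-up.
# edges = gradient_magnitude(image) > 0.23
```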
Finding the gradient magnitude is useful since it reduces the two directional responses to a single per-pixel value representing edge strength, regardless of edge direction. The gradient magnitude image is then binarized with a threshold to further suppress noise. Here are the results:
Cameraman
Cameraman convolved with \(\mathbf{D_x}\)
Cameraman convolved with \(\mathbf{D_y}\)
Cameraman (Gradient magnitude)
Cameraman (Gradient magnitude)
Binarized with threshold 0.23.
Although a binarization threshold of 0.23 works reasonably well for the Cameraman image, it is still difficult to remove noise while keeping valid edges. To address this, a Gaussian blur can be applied to the image to suppress noise before taking the gradient magnitude. Here are the results:
Blurred Cameraman
Blurred Cameraman
(Gradient magnitude)
Blurred Cameraman
(Gradient magnitude)
Binarized with threshold 0.03.
The process of blurring an image with a Gaussian filter and then taking its derivative can also be performed as a single convolution with a derivative-of-Gaussian (DoG) filter. Because convolution is associative, this reduces the number of convolution passes over the original image. I verified that this single-convolution result is equivalent to applying the Gaussian blur first and then taking the gradient magnitude of the blurred image.
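A minimal sketch of this equivalence, assuming a NumPy-built Gaussian kernel (gaussian_kernel_2d is a helper introduced here, not from the code above):

```python
import numpy as np
from scipy.signal import convolve2d

def gaussian_kernel_2d(size, sigma):
    # Separable Gaussian: outer product of a normalized 1-D Gaussian.
    ax = np.arange(size) - (size - 1) / 2.0
    g = np.exp(-ax ** 2 / (2 * sigma ** 2))
    g /= g.sum()
    return np.outer(g, g)

D_x = np.array([[1, 0, -1]])
G = gaussian_kernel_2d(15, 3)   # k_size = 15, sigma = 3 as in the figures
DoG_x = convolve2d(G, D_x)      # full convolution keeps the whole support

# By associativity, (image * G) * D_x == image * DoG_x away from the
# image borders, so one pass over the image replaces two.
```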
Combined Filter
(x-direction)
\(k_{size} = 15\), \(\sigma = 3\)
Blurred Cameraman
(Gradient magnitude)
(Single convolution)
Blurred Cameraman
(Gradient magnitude, binarized)
(Single convolution)
Here, we explore the technique of unsharp masking, used to sharpen an image. The technique works by creating a blurred version of the image with Gaussian blur and then subtracting the blurred image from the original image. Since the blurred image retains only the low-frequency components of the image, the subtraction yields an image containing the high frequencies of the original. The high frequencies are then added back to the original image to sharpen it. Mathematically, it can be represented as
\[ f_{\text{sharpened}} = f + \alpha (f - f * g), \]
where \( f \) is the original image, \( g \) is the Gaussian kernel (thus \( f * g \) is the blurred image), and \( \alpha \) is a constant that controls the amount of sharpening.
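The two-step form of this formula can be sketched as below; unsharp_mask is an illustrative helper name, and the kernel size, sigma, and reflect boundary are assumptions rather than the write-up's exact settings.

```python
import numpy as np
from scipy.signal import convolve2d

def gaussian_kernel_2d(size, sigma):
    # Separable Gaussian built as an outer product of 1-D Gaussians.
    ax = np.arange(size) - (size - 1) / 2.0
    g = np.exp(-ax ** 2 / (2 * sigma ** 2))
    g /= g.sum()
    return np.outer(g, g)

def unsharp_mask(image, alpha, size=9, sigma=2):
    # f + alpha * (f - f * g): add back the high frequencies removed
    # by the Gaussian low-pass filter.
    g = gaussian_kernel_2d(size, sigma)
    blurred = convolve2d(image, g, mode="same", boundary="symm")
    return image + alpha * (image - blurred)
```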
The formula \( f + \alpha(f - f * g) \) can also be rearranged as \( f * \left( (1 + \alpha)e - \alpha g \right) \), where \( e \) is the identity (unit impulse) kernel. This shows that the sharpening process can be applied as a single convolution with the kernel \( (1 + \alpha)e - \alpha g \). I also noticed that running the convolution can shrink the image, so I reflected the image around its edges to maintain the original size. Here are the final results:
Taj Mahal (Original)
Taj Mahal (Low-pass)
Taj Mahal (High-pass)
Taj Mahal (\(\alpha = 0.5\))
Taj Mahal (\(\alpha = 1\))
Taj Mahal (\(\alpha = 4\))
Taj Mahal (\(\alpha = 20\))
I also used my sharpening implementation to sharpen a blurry image of a pizza.
Blurry Pizza (Original)
Sharpened Pizza (\(\alpha = 1.5\))
Finally, I took a photo, blurred it, and re-sharpened the blurred image to see how well the sharpening process can recover the original image. As shown below, the sharpening process is able to make the edges more prominent, but it is not able to fully recover the original image. This is because the blurring process removes some high-frequency information from the image, and the sharpening process can only enhance the high-frequency information that is still present after blurring.
Street (Original)
Blurry Street
Sharpened Blurry Street (\(\alpha = 2.0\))
Hybrid images are images that look like one image up close and another from a distance. This effect is achieved by combining the high-frequency components of one image with the low-frequency components of another. The low-frequency component is obtained by running a Gaussian blur filter on the original image, and the high-frequency component is obtained by subtracting the low-pass-filtered image from the original. Then, the low-pass version of one image is added to the high-pass version of the other to create the hybrid image. I have also enabled color for hybrid images; since the low-pass image dominates the color, it adds more vibrancy when the image is viewed from afar. However, in cases like the Derek and Nutmeg hybrid, Derek dominating the color can make Nutmeg less noticeable. The results of the hybrid images are shown below.
Derek
Nutmeg
Derek Nutmeg Hybrid (Gray)
Derek Nutmeg Hybrid (Color)
Frog
Owl
Frog Owl Hybrid (Gray)
Frog Owl Hybrid (Color)
Frog FFT
Frog Filtered FFT
Owl FFT
Owl Filtered FFT
Frog Owl Hybrid FFT
Earth
Marble
Earth Marble Hybrid (Gray)
Earth Marble Hybrid (Color)
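The hybrid construction described in this section can be sketched as follows; hybrid and gaussian_kernel_2d are names introduced here, and the boundary handling is an assumption.

```python
import numpy as np
from scipy.signal import convolve2d

def gaussian_kernel_2d(size, sigma):
    # Separable Gaussian built as an outer product of 1-D Gaussians.
    ax = np.arange(size) - (size - 1) / 2.0
    g = np.exp(-ax ** 2 / (2 * sigma ** 2))
    g /= g.sum()
    return np.outer(g, g)

def hybrid(image_low, image_high, kernel):
    # Low frequencies of one image plus high frequencies of the other.
    low = convolve2d(image_low, kernel, mode="same", boundary="symm")
    high = image_high - convolve2d(image_high, kernel, mode="same", boundary="symm")
    return low + high
```

In practice the two images would use different cutoff frequencies (different sigmas); a single kernel is used here only to keep the sketch short.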
In this part, I explore the technique of multiresolution blending to combine two images. This technique works by creating a Laplacian stack for each image and blending the stacks together level by level. Specifically, I build a Laplacian stack for each of the two images, and a Gaussian stack of the black-and-white mask image by repeatedly applying a Gaussian low-pass filter to it. These stacks are then combined by the following formula:
\[ L = \sum_k \left( m_k\, l_k^A + (1 - m_k)\, l_k^B \right), \]
where \(l_k^A\) is layer \(k\) of the Laplacian stack of image \(A\), and \(m_k\) is the \(k\)th layer of the Gaussian mask stack. Finally, the base Gaussian split image, \(SG_k\), obtained by running the Gaussian low-pass filter \(k\) times on the image combined with the black-and-white mask, is added to the final Laplacian stack sum \(L\) to obtain the final image. The results on the Oraple and other images are shown below.
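The blending procedure can be sketched as below. blend, blur, and gaussian_kernel_2d are illustrative names; the base layer here mixes the deepest Gaussian levels with the blurred mask, a close analogue of the \(SG_k\) term described above rather than the exact project code.

```python
import numpy as np
from scipy.signal import convolve2d

def gaussian_kernel_2d(size, sigma):
    # Separable Gaussian built as an outer product of 1-D Gaussians.
    ax = np.arange(size) - (size - 1) / 2.0
    g = np.exp(-ax ** 2 / (2 * sigma ** 2))
    g /= g.sum()
    return np.outer(g, g)

def blur(image, kernel):
    return convolve2d(image, kernel, mode="same", boundary="symm")

def blend(A, B, mask, kernel, levels=5):
    # Gaussian stacks of both images and of the black-and-white mask.
    gA, gB, gM = [A], [B], [mask]
    for _ in range(levels):
        gA.append(blur(gA[-1], kernel))
        gB.append(blur(gB[-1], kernel))
        gM.append(blur(gM[-1], kernel))
    # Base layer: the deepest Gaussian levels, mixed by the blurred mask.
    out = gM[levels] * gA[levels] + (1 - gM[levels]) * gB[levels]
    # Add the blended Laplacian layers l_k = G_k - G_{k+1}.
    for k in range(levels):
        out += gM[k] * (gA[k] - gA[k + 1]) + (1 - gM[k]) * (gB[k] - gB[k + 1])
    return out
```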
Oraple Gaussian #1
Oraple Gaussian #2
Oraple Gaussian #3
Oraple Gaussian #4
Oraple Gaussian #5
Oraple Laplacian #1
Oraple Laplacian #2
Oraple Laplacian #3
Oraple Laplacian #4
Oraple Laplacian #5
Apple
Orange
Oraple
Bowl 1
Bowl 2
Bowl Mask
Bowl (Merged)
Car 1
Car 2
Car Mask
Car (Merged)
Sofa
Building with Eyes
Sofa Mask
Sofa (Merged)
Tower 1
Tower 2?
Tower Mask
Tower (Merged)