The first part of Project 5 is about using pretrained diffusion models to generate images from text prompts.
I set up the environment as described in the project instructions and used a fixed random seed of 3036155160
for reproducibility. Here are some sample images generated with the DeepFloyd IF diffusion model using different
numbers of inference steps:
A graphic of yellow whale (20 steps)
A graphic of blue starfish (20 steps)
A cheap backpack (20 steps)
A graphic of yellow whale (120 steps)
A graphic of blue starfish (120 steps)
A cheap backpack (120 steps)
In this part of the project, I wrote my own sampling loops that use the pretrained DeepFloyd denoisers to produce high-quality images like the ones generated above. I then modified these loops to solve other tasks, such as inpainting and producing optical illusions.
A key part of diffusion is the forward process, which takes a clean image and progressively adds noise to it. In this part, I wrote a function to implement it. The forward process is defined by
\[ x_t = \sqrt{\bar\alpha_t}\, x_0 + \sqrt{1-\bar\alpha_t}\,\epsilon, \qquad \epsilon \sim \mathcal{N}(0, I), \]
where \(x_0\) is the clean image and \(\bar\alpha_t\) comes from the model's noise schedule. Applying it to a test image of the Campanile:
Campanile
Noisy Campanile at \(t=250\)
Noisy Campanile at \(t=500\)
Noisy Campanile at \(t=750\)
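A minimal NumPy sketch of this forward step. The linear-beta schedule and the constant "image" below are stand-ins for illustration; the real \(\bar\alpha_t\) values come from DeepFloyd's scheduler.

```python
import numpy as np

# Toy DDPM-style schedule (stand-in; the real abar_t values come from the model)
betas = np.linspace(1e-4, 0.02, 1000)
alphas_cumprod = np.cumprod(1.0 - betas)

def forward(x0, t, rng):
    """Sample x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps, eps ~ N(0, I)."""
    abar = alphas_cumprod[t]
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(abar) * x0 + np.sqrt(1.0 - abar) * eps

rng = np.random.default_rng(0)
x0 = np.ones((8, 8))                      # stand-in for a clean image
x250, x500, x750 = (forward(x0, t, rng) for t in (250, 500, 750))
```

As \(t\) grows, \(\bar\alpha_t\) shrinks, so the signal fades and the noise dominates, matching the progression in the images above.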
One simple way to denoise an image is to use Gaussian blur filtering. Here are some results:
Gaussian Blur at \(t=250\)
Gaussian Blur at \(t=500\)
Gaussian Blur at \(t=750\)
Using a pretrained diffusion model, we can instead estimate the noise in an image and remove it in a single step. Here are the results:
One-Step Denoising at \(t=250\)
One-Step Denoising at \(t=500\)
One-Step Denoising at \(t=750\)
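The one-step estimate just inverts the forward equation using the UNet's noise prediction. A sketch, where a perfect "oracle" noise estimate stands in for the DeepFloyd UNet:

```python
import numpy as np

betas = np.linspace(1e-4, 0.02, 1000)        # toy schedule, stand-in for the model's
alphas_cumprod = np.cumprod(1.0 - betas)

def one_step_denoise(xt, eps_hat, t):
    """Invert the forward process: x0_hat = (x_t - sqrt(1-abar_t)*eps_hat) / sqrt(abar_t)."""
    abar = alphas_cumprod[t]
    return (xt - np.sqrt(1.0 - abar) * eps_hat) / np.sqrt(abar)

# With a perfect noise estimate, the clean image is recovered exactly.
rng = np.random.default_rng(0)
x0 = rng.uniform(size=(8, 8))
t = 500
eps = rng.standard_normal(x0.shape)
xt = np.sqrt(alphas_cumprod[t]) * x0 + np.sqrt(1 - alphas_cumprod[t]) * eps
x0_hat = one_step_denoise(xt, eps, t)
```

In practice the UNet's estimate is imperfect, and a single jump from high \(t\) must hallucinate a lot of detail at once, which is why the one-step results blur out at large \(t\).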
By iteratively denoising an image, we can achieve better results. Here are the results:
Iterative Denoising at \(t=660\)
Iterative Denoising at \(t=510\)
Iterative Denoising at \(t=360\)
Iterative Denoising at \(t=210\)
Iterative Denoising at \(t=60\)
Campanile
Iteratively Denoised Campanile
One-Step Denoised Campanile
Gaussian Blur at \(t=750\)
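A sketch of the iterative loop, using the DDPM posterior mean over strided timesteps. An oracle noise estimator stands in for the UNet here; the real loop calls DeepFloyd's stage-1 model at each step.

```python
import numpy as np

betas = np.linspace(1e-4, 0.02, 1000)        # toy schedule, stand-in for the model's
alphas_cumprod = np.cumprod(1.0 - betas)
t_steps = list(range(990, -1, -30))          # strided timesteps, high noise -> low

def iterative_denoise(xt, eps_model, rng):
    for t, t_prev in zip(t_steps[:-1], t_steps[1:]):
        abar, abar_prev = alphas_cumprod[t], alphas_cumprod[t_prev]
        alpha = abar / abar_prev             # product of the skipped per-step alphas
        beta = 1.0 - alpha
        eps_hat = eps_model(xt, t)
        x0_hat = (xt - np.sqrt(1 - abar) * eps_hat) / np.sqrt(abar)
        # DDPM posterior mean plus fresh noise (the last injection is unused,
        # since we return the final clean estimate x0_hat)
        xt = (np.sqrt(abar_prev) * beta / (1 - abar) * x0_hat
              + np.sqrt(alpha) * (1 - abar_prev) / (1 - abar) * xt
              + np.sqrt(beta) * rng.standard_normal(xt.shape))
    return x0_hat

# Sanity check with an oracle that knows the true noise:
rng = np.random.default_rng(0)
x0 = rng.uniform(size=(8, 8))
oracle = lambda x, t: (x - np.sqrt(alphas_cumprod[t]) * x0) / np.sqrt(1 - alphas_cumprod[t])
result = iterative_denoise(rng.standard_normal((8, 8)), oracle, rng)
```

Each step only needs to remove a small slice of noise, which is why the iterative result above preserves far more detail than the one-step jump from \(t=750\).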
By starting from pure noise and iteratively denoising, we can generate new images. Here are some samples:
Sample 1
Sample 2
Sample 3
Sample 4
Sample 5
Using Classifier-Free Guidance, we can improve the quality of generated images. Here are some samples:
CFG Sample 1
CFG Sample 2
CFG Sample 3
CFG Sample 4
CFG Sample 5
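CFG runs the UNet twice per step, once with the prompt embedding and once with the null prompt, then extrapolates past the conditional estimate. A sketch of the combination step (the default scale here is an arbitrary illustrative value):

```python
import numpy as np

def cfg_noise(eps_cond, eps_uncond, scale=7.0):
    """Classifier-free guidance: extrapolate from the unconditional estimate
    toward, and past, the conditional one."""
    return eps_uncond + scale * (eps_cond - eps_uncond)
```

A scale greater than 1 strengthens prompt adherence at the cost of diversity; a scale of exactly 1 recovers the plain conditional model.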
By adding noise to an image and then denoising it with a text prompt, we can create edits of varying strength: the smaller \(i_{start}\) is, the more noise is added and the further the result drifts from the original. Here are some examples:
Campanile with \(i_{start}=1\)
Campanile with \(i_{start}=3\)
Campanile with \(i_{start}=5\)
Campanile with \(i_{start}=7\)
Campanile with \(i_{start}=10\)
Campanile with \(i_{start}=20\)
Campanile
Backpack with \(i_{start}=1\)
Backpack with \(i_{start}=3\)
Backpack with \(i_{start}=5\)
Backpack with \(i_{start}=7\)
Backpack with \(i_{start}=10\)
Backpack with \(i_{start}=20\)
Backpack
Character with \(i_{start}=1\)
Character with \(i_{start}=3\)
Character with \(i_{start}=5\)
Character with \(i_{start}=7\)
Character with \(i_{start}=10\)
Character with \(i_{start}=20\)
Character
I also repeated the same procedure on images from the web and hand-drawn images.
Santa with \(i_{start}=1\)
Santa with \(i_{start}=3\)
Santa with \(i_{start}=5\)
Santa with \(i_{start}=7\)
Santa with \(i_{start}=10\)
Santa with \(i_{start}=20\)
Santa
Painting with \(i_{start}=1\)
Painting with \(i_{start}=3\)
Painting with \(i_{start}=5\)
Painting with \(i_{start}=7\)
Painting with \(i_{start}=10\)
Painting with \(i_{start}=20\)
Painting
Camera with \(i_{start}=1\)
Camera with \(i_{start}=3\)
Camera with \(i_{start}=5\)
Camera with \(i_{start}=7\)
Camera with \(i_{start}=10\)
Camera with \(i_{start}=20\)
Camera
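These edits follow the SDEdit idea: noise the input image to the timestep indexed by \(i_{start}\), then run the usual iterative denoising loop from there. Only the starting point differs from regular sampling; a sketch of it, with the same stand-in schedule as before:

```python
import numpy as np

betas = np.linspace(1e-4, 0.02, 1000)    # toy schedule, stand-in for the model's
alphas_cumprod = np.cumprod(1.0 - betas)
t_steps = list(range(990, -1, -30))      # high noise -> low

def sdedit_start(x0, i_start, rng):
    """Noise the input image to t_steps[i_start]; the denoising loop resumes there.
    Small i_start = more noise = larger, less faithful edits."""
    t = t_steps[i_start]
    abar = alphas_cumprod[t]
    return np.sqrt(abar) * x0 + np.sqrt(1 - abar) * rng.standard_normal(x0.shape)

rng = np.random.default_rng(0)
start = sdedit_start(np.ones((8, 8)), 10, rng)
```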
By using a mask to specify regions to edit, we can inpaint images. Here are some examples:
Campanile
Campanile Mask
Campanile Inpainted
Coffee
Coffee Mask
Coffee Inpainted
Emoji
Emoji Mask
Emoji Inpainted
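Inpainting only changes one line of the loop: after every denoising step, pixels outside the mask are reset to the original image noised to the current timestep, so only the masked region is actually generated. A sketch of that projection step (schedule is the same stand-in as before; `mask` is 1 where new content is generated):

```python
import numpy as np

betas = np.linspace(1e-4, 0.02, 1000)    # toy schedule, stand-in for the model's
alphas_cumprod = np.cumprod(1.0 - betas)

def inpaint_project(xt, x0_orig, mask, t, rng):
    """Keep generated pixels inside the mask; force everything outside back to
    the original image, noised to timestep t."""
    abar = alphas_cumprod[t]
    noised = np.sqrt(abar) * x0_orig + np.sqrt(1 - abar) * rng.standard_normal(x0_orig.shape)
    return mask * xt + (1 - mask) * noised
```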
By guiding the denoising process with text prompts, we can create edits that align with the desired description. In this part, I used the text embedding "a childish drawing" to guide the edits. Here are some examples:
Childish Campanile with \(i_{start}=1\)
Childish Campanile with \(i_{start}=3\)
Childish Campanile with \(i_{start}=5\)
Childish Campanile with \(i_{start}=7\)
Childish Campanile with \(i_{start}=10\)
Childish Campanile with \(i_{start}=20\)
Campanile
Childish Character with \(i_{start}=1\)
Childish Character with \(i_{start}=3\)
Childish Character with \(i_{start}=5\)
Childish Character with \(i_{start}=7\)
Childish Character with \(i_{start}=10\)
Childish Character with \(i_{start}=20\)
Character
Childish Emoji with \(i_{start}=1\)
Childish Emoji with \(i_{start}=3\)
Childish Emoji with \(i_{start}=5\)
Childish Emoji with \(i_{start}=7\)
Childish Emoji with \(i_{start}=10\)
Childish Emoji with \(i_{start}=20\)
Emoji
In this part, I created visual anagrams where the image appears as one thing when viewed normally, and another when flipped upside down. Here are some examples:
An oil painting of an old man
An oil painting of people around a campfire
A bowl of noodles
A stadium
An oil painting of a snowy mountain village
A photo of a hipster barista
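An anagram is produced by averaging two noise estimates at each step: one for prompt A on the image, and one for prompt B on the flipped image, flipped back so they align. A sketch, where a dummy `eps_model` stands in for the prompt-conditioned UNet:

```python
import numpy as np

def anagram_noise(x, eps_model, prompt_a, prompt_b):
    """Average prompt A's estimate on x with prompt B's estimate on the
    upside-down image, flipped back so the two agree pixelwise."""
    e1 = eps_model(x, prompt_a)
    e2 = np.flipud(eps_model(np.flipud(x), prompt_b))
    return (e1 + e2) / 2

# Dummy stand-in model: prompt "a" predicts x, prompt "b" predicts 2x
eps_model = lambda x, p: x if p == "a" else 2 * x
x = np.arange(16.0).reshape(4, 4)
out = anagram_noise(x, eps_model, "a", "b")
```

Because the averaged estimate must be a plausible denoising direction both ways up, the sampled image reads as one prompt upright and the other inverted.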
In this part, I created hybrid images that change appearance based on viewing distance. This uses a similar technique to visual anagrams, but instead of flipping, it blends the low-frequency content of one prompt's noise estimate with the high-frequency content of another's. Here are some examples:
Skull and waterfall
Stadium and noodles
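A sketch of the frequency-space combination, using an FFT low-pass filter (the cutoff here is an arbitrary choice, and a Gaussian blur could serve as the low-pass just as well):

```python
import numpy as np

def lowpass(img, cutoff=0.1):
    """Ideal low-pass: zero all spatial frequencies above `cutoff` (cycles/pixel)."""
    F = np.fft.fft2(img)
    fy = np.fft.fftfreq(img.shape[0])[:, None]
    fx = np.fft.fftfreq(img.shape[1])[None, :]
    keep = np.sqrt(fx**2 + fy**2) <= cutoff
    return np.real(np.fft.ifft2(F * keep))

def hybrid_noise(eps_a, eps_b, cutoff=0.1):
    """Low frequencies follow prompt A's estimate, high frequencies prompt B's."""
    return lowpass(eps_a, cutoff) + (eps_b - lowpass(eps_b, cutoff))
```

Up close the high frequencies dominate perception, so the image reads as prompt B; from far away only the low frequencies survive, and it reads as prompt A.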