Fun With Diffusion Models!

Part A0. Setup

The first part of Project 5 uses pretrained diffusion models to generate images from text prompts. I set up the environment as described in the project instructions and fixed the random seed to 3036155160 for reproducibility. Here are some sample images generated by the DeepFloyd IF diffusion model with different numbers of inference steps:

A graphic of yellow whale (20 steps)

A graphic of blue starfish (20 steps)

A cheap backpack (20 steps)

Sample images generated from text prompts using DeepFloyd IF diffusion model with 20 inference steps.

A graphic of yellow whale (120 steps)

A graphic of blue starfish (120 steps)

A cheap backpack (120 steps)

Sample images generated from text prompts using DeepFloyd IF diffusion model with 120 inference steps.

Part A1. Sampling Loops

In this part of the project, I wrote my own "sampling loops" that use the pretrained DeepFloyd denoisers. These loops produce high-quality images such as the ones generated above. I then modified them to solve other tasks, such as inpainting and producing optical illusions.

1.1. Implementing the Forward Process

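A key part of diffusion is the forward process, which takes a clean image and adds noise to it. In this part, I wrote a function to implement this. The forward process is defined by:

\[ q(x_t \mid x_0) = \mathcal{N}\left(x_t;\ \sqrt{\bar\alpha_t}\,x_0,\ (1-\bar\alpha_t)\mathbf{I}\right) \]

which is equivalent to computing

\[ x_t = \sqrt{\bar\alpha_t}\,x_0 + \sqrt{1-\bar\alpha_t}\,\epsilon, \quad \epsilon \sim \mathcal{N}(0, \mathbf{I}) \]

where \(x_0\) is the clean image, \(x_t\) is the noisy image at timestep \(t\), and \(\bar\alpha_t\) is the cumulative noise-schedule coefficient at timestep \(t\). Below are the clean image and its noisy versions at three timesteps: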

Campanile

Noisy Campanile at \(t=250\)

Noisy Campanile at \(t=500\)

Noisy Campanile at \(t=750\)
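The noising step behind the images above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the linear beta schedule in `make_alphas_cumprod` is hypothetical (the pretrained model ships its own schedule), and a random array stands in for the Campanile image.

```python
import numpy as np

def make_alphas_cumprod(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Hypothetical linear beta schedule; a real pipeline would read this from the model."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)

def forward(x0, t, alphas_cumprod, rng):
    """Sample x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps with eps ~ N(0, I)."""
    abar_t = alphas_cumprod[t]
    eps = rng.standard_normal(x0.shape)
    x_t = np.sqrt(abar_t) * x0 + np.sqrt(1.0 - abar_t) * eps
    return x_t, eps

rng = np.random.default_rng(0)
alphas_cumprod = make_alphas_cumprod()
x0 = rng.uniform(-1.0, 1.0, size=(3, 64, 64))  # stand-in for a clean image in [-1, 1]
noisy = {t: forward(x0, t, alphas_cumprod, rng)[0] for t in (250, 500, 750)}
```

Because \(\bar\alpha_t\) shrinks as \(t\) grows, the clean image contributes less and the noise contributes more at larger timesteps.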

1.2. Classical Denoising

One simple way to denoise an image is to use Gaussian blur filtering. Here are some results:

Gaussian Blur at \(t=250\)

Gaussian Blur at \(t=500\)

Gaussian Blur at \(t=750\)
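A Gaussian blur of this kind can be sketched with a separable kernel in plain NumPy. The `sigma` value and the reflect padding are illustrative choices, not the settings used for the results above.

```python
import numpy as np

def gaussian_kernel1d(sigma, radius):
    """Normalized 1-D Gaussian kernel with 2 * radius + 1 taps."""
    offsets = np.arange(-radius, radius + 1)
    kernel = np.exp(-0.5 * (offsets / sigma) ** 2)
    return kernel / kernel.sum()

def gaussian_blur(image, sigma=2.0):
    """Blur a (C, H, W) image with a separable Gaussian and reflect padding."""
    radius = int(3 * sigma)
    kernel = gaussian_kernel1d(sigma, radius)
    padded = np.pad(image, ((0, 0), (radius, radius), (radius, radius)), mode="reflect")
    blur_1d = lambda v: np.convolve(v, kernel, mode="valid")
    rows = np.apply_along_axis(blur_1d, 1, padded)   # filter along height
    return np.apply_along_axis(blur_1d, 2, rows)     # filter along width

rng = np.random.default_rng(0)
clean = np.full((3, 32, 32), 0.5)                    # flat stand-in image
noisy = clean + 0.3 * rng.standard_normal(clean.shape)
blurred = gaussian_blur(noisy, sigma=2.0)
```

Blurring averages out noise, but it removes high-frequency image detail just as readily, which is why it struggles at large \(t\).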

1.3. One-Step Denoising

Using a pretrained diffusion model, we can denoise images in one step. Here are the results:

One-Step Denoising at \(t=250\)

One-Step Denoising at \(t=500\)

One-Step Denoising at \(t=750\)
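One-step denoising amounts to inverting the forward-process relation \(x_t = \sqrt{\bar\alpha_t}\,x_0 + \sqrt{1-\bar\alpha_t}\,\epsilon\) once the model has estimated the noise. The sketch below assumes that relation; the true noise stands in for the network's prediction, and the value of `abar_t` is hypothetical.

```python
import numpy as np

def estimate_x0(x_t, eps_hat, abar_t):
    """Solve x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps for x0, given a noise estimate."""
    return (x_t - np.sqrt(1.0 - abar_t) * eps_hat) / np.sqrt(abar_t)

rng = np.random.default_rng(0)
abar_t = 0.3                                   # hypothetical cumulative alpha at some t
x0 = rng.uniform(-1.0, 1.0, size=(3, 8, 8))    # stand-in clean image
eps = rng.standard_normal(x0.shape)
x_t = np.sqrt(abar_t) * x0 + np.sqrt(1.0 - abar_t) * eps

# With the exact noise the recovery is exact; a real denoiser only approximates eps.
x0_hat = estimate_x0(x_t, eps, abar_t)
```

The division by \(\sqrt{\bar\alpha_t}\) amplifies any error in the noise estimate, and that amplification grows as \(\bar\alpha_t\) shrinks at larger \(t\).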

1.4. Iterative Denoising

By iteratively denoising an image, we can achieve better results. Here are the results:

Iterative Denoising at \(t=660\)

Iterative Denoising at \(t=510\)

Iterative Denoising at \(t=360\)

Iterative Denoising at \(t=210\)

Iterative Denoising at \(t=60\)

Campanile

Iteratively Denoised Campanile

One-Step Denoised Campanile

Gaussian Blur at \(t=750\)

The iteratively denoised Campanile resembles the original image much more closely than the one-step denoised version.
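An iterative denoising loop over strided timesteps might look like the following. Several pieces are assumptions: the stride of 30 starting from 990 (chosen because it is consistent with the timesteps shown above), the linear beta schedule, the DDPM posterior variance used for the added noise, and `oracle_noise`, a toy stand-in for the pretrained denoiser that cheats by knowing the clean image.

```python
import numpy as np

def denoise_step(x_t, x0_hat, abar_t, abar_prev):
    """Posterior mean of x_{t'} given x_t and an estimate of x_0, for t' < t."""
    alpha_t = abar_t / abar_prev          # effective alpha for the jump t -> t'
    beta_t = 1.0 - alpha_t
    coef_x0 = np.sqrt(abar_prev) * beta_t / (1.0 - abar_t)
    coef_xt = np.sqrt(alpha_t) * (1.0 - abar_prev) / (1.0 - abar_t)
    return coef_x0 * x0_hat + coef_xt * x_t

def iterative_denoise(x, timesteps, alphas_cumprod, predict_noise, rng, i_start=0):
    """Walk x from timesteps[i_start] down to timesteps[-1]."""
    steps = timesteps[i_start:]
    for t, t_prev in zip(steps[:-1], steps[1:]):
        abar_t, abar_prev = alphas_cumprod[t], alphas_cumprod[t_prev]
        eps_hat = predict_noise(x, t)
        x0_hat = (x - np.sqrt(1.0 - abar_t) * eps_hat) / np.sqrt(abar_t)
        x = denoise_step(x, x0_hat, abar_t, abar_prev)
        if t_prev > 0:
            # Assumption: DDPM posterior variance; a pretrained model may predict its own.
            beta_t = 1.0 - abar_t / abar_prev
            sigma = np.sqrt(beta_t * (1.0 - abar_prev) / (1.0 - abar_t))
            x = x + sigma * rng.standard_normal(x.shape)
    return x

alphas_cumprod = np.cumprod(1.0 - np.linspace(1e-4, 0.02, 1000))  # hypothetical schedule
timesteps = list(range(990, -1, -30))                             # 990, 960, ..., 0

rng = np.random.default_rng(0)
x0 = rng.uniform(-1.0, 1.0, size=(3, 8, 8))                       # stand-in clean image

def oracle_noise(x, t):
    """Toy denoiser: returns the exact noise because it knows x0."""
    abar_t = alphas_cumprod[t]
    return (x - np.sqrt(abar_t) * x0) / np.sqrt(1.0 - abar_t)

i_start = 10
abar_start = alphas_cumprod[timesteps[i_start]]
x_noisy = np.sqrt(abar_start) * x0 + np.sqrt(1.0 - abar_start) * rng.standard_normal(x0.shape)
restored = iterative_denoise(x_noisy, timesteps, alphas_cumprod, oracle_noise, rng, i_start=i_start)
```

Each step re-estimates the clean image and moves only part of the way toward it, so errors in any single noise prediction get corrected by later steps instead of being baked into the output.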

1.5. Diffusion Model Sampling

By starting from pure noise and iteratively denoising, we can generate new images. Here are some samples:

Sample 1

Sample 2

Sample 3

Sample 4

Sample 5

1.6. Classifier-Free Guidance (CFG)

Using Classifier-Free Guidance, we can improve the quality of generated images. Here are some samples:

CFG Sample 1

CFG Sample 2

CFG Sample 3

CFG Sample 4

CFG Sample 5
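Classifier-free guidance combines a conditional and an unconditional noise estimate by extrapolating from the unconditional one toward the conditional one. In the sketch below, random arrays stand in for the two network outputs, and the guidance scale of 7 is a hypothetical value rather than the one used for these samples.

```python
import numpy as np

def cfg_noise(eps_cond, eps_uncond, scale):
    """Extrapolate from the unconditional estimate toward the conditional one."""
    return eps_uncond + scale * (eps_cond - eps_uncond)

rng = np.random.default_rng(0)
eps_cond = rng.standard_normal((3, 8, 8))     # stand-in for the prompt-conditioned estimate
eps_uncond = rng.standard_normal((3, 8, 8))   # stand-in for the empty-prompt estimate
eps = cfg_noise(eps_cond, eps_uncond, scale=7.0)
```

A scale of 1 reproduces the conditional estimate and a scale of 0 the unconditional one; values above 1 push the sample harder toward the prompt, typically at some cost in diversity.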

1.7. Image-to-Image Translation

By adding noise to an image and then denoising it with a text prompt, we can create interesting edits. Here are some examples:

Campanile with \(i_{start}=1\)

Campanile with \(i_{start}=3\)

Campanile with \(i_{start}=5\)

Campanile with \(i_{start}=7\)

Campanile with \(i_{start}=10\)

Campanile with \(i_{start}=20\)

Campanile

Backpack with \(i_{start}=1\)

Backpack with \(i_{start}=3\)

Backpack with \(i_{start}=5\)

Backpack with \(i_{start}=7\)

Backpack with \(i_{start}=10\)

Backpack with \(i_{start}=20\)

Backpack

Character with \(i_{start}=1\)

Character with \(i_{start}=3\)

Character with \(i_{start}=5\)

Character with \(i_{start}=7\)

Character with \(i_{start}=10\)

Character with \(i_{start}=20\)

Character

I also repeated the same procedure on images from the web and hand-drawn images.

Santa with \(i_{start}=1\)

Santa with \(i_{start}=3\)

Santa with \(i_{start}=5\)

Santa with \(i_{start}=7\)

Santa with \(i_{start}=10\)

Santa with \(i_{start}=20\)

Santa

Painting with \(i_{start}=1\)

Painting with \(i_{start}=3\)

Painting with \(i_{start}=5\)

Painting with \(i_{start}=7\)

Painting with \(i_{start}=10\)

Painting with \(i_{start}=20\)

Painting

Camera with \(i_{start}=1\)

Camera with \(i_{start}=3\)

Camera with \(i_{start}=5\)

Camera with \(i_{start}=7\)

Camera with \(i_{start}=10\)

Camera with \(i_{start}=20\)

Camera
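To see why small values of \(i_{start}\) change the image more, it helps to map each index to its starting timestep. The sketch below assumes strided timesteps of 990, 960, ..., 0 and a hypothetical linear beta schedule; neither is confirmed by this write-up, so the printed signal scales are only indicative.

```python
import numpy as np

def noise_levels(i_starts, timesteps, alphas_cumprod):
    """Map each i_start to (starting timestep, scale sqrt(abar_t) applied to the clean image)."""
    return {
        i: (timesteps[i], float(np.sqrt(alphas_cumprod[timesteps[i]])))
        for i in i_starts
    }

alphas_cumprod = np.cumprod(1.0 - np.linspace(1e-4, 0.02, 1000))  # hypothetical schedule
timesteps = list(range(990, -1, -30))                             # assumed stride of 30

levels = noise_levels((1, 3, 5, 7, 10, 20), timesteps, alphas_cumprod)
for i_start, (t, signal) in levels.items():
    print(f"i_start={i_start:2d} -> t={t:3d}, clean-image scale={signal:.3f}")
```

Under these assumptions the six indices correspond to \(t = 960, 900, 840, 780, 690, 390\). A lower index starts from a noisier image, so less of the original survives and the result drifts further from the input.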

1.7.2. Inpainting

By using a mask to specify regions to edit, we can inpaint images. Here are some examples:

Campanile

Campanile Mask

Campanile Inpainted

Coffee

Coffee Mask

Coffee Inpainted

Emoji

Emoji Mask

Emoji Inpainted
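One common way to implement mask-based inpainting is to re-impose the original image, noised to the current timestep, everywhere outside the mask after each denoising step. The sketch below shows only that projection step. It assumes the convention that the mask is 1 where new content should be generated, and `inpaint_project` is an illustrative name.

```python
import numpy as np

def inpaint_project(x_t, x_orig, mask, abar_t, rng):
    """Keep x_t where mask == 1; elsewhere use the original image noised to level abar_t."""
    eps = rng.standard_normal(x_orig.shape)
    noised_orig = np.sqrt(abar_t) * x_orig + np.sqrt(1.0 - abar_t) * eps
    return mask * x_t + (1.0 - mask) * noised_orig

rng = np.random.default_rng(0)
x_orig = rng.uniform(-1.0, 1.0, size=(3, 16, 16))   # stand-in original image
x_t = rng.standard_normal(x_orig.shape)             # stand-in current sample
mask = np.zeros((1, 16, 16))
mask[:, 4:12, 4:12] = 1.0                           # region to edit (assumed convention)

x_t = inpaint_project(x_t, x_orig, mask, abar_t=0.5, rng=rng)
```

Calling this after every step of the sampling loop keeps the unmasked region tied to the original image while the masked region is free to be regenerated.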

1.7.3. Text-Conditional Image-to-Image Translation

By guiding the denoising process with text prompts, we can create edits that align with the desired description. In this part, I used the embedding of the prompt "a childish drawing" to guide the edits. Here are some examples:

Childish Campanile with \(i_{start}=1\)

Childish Campanile with \(i_{start}=3\)

Childish Campanile with \(i_{start}=5\)

Childish Campanile with \(i_{start}=7\)

Childish Campanile with \(i_{start}=10\)

Childish Campanile with \(i_{start}=20\)

Campanile

Childish Character with \(i_{start}=1\)

Childish Character with \(i_{start}=3\)

Childish Character with \(i_{start}=5\)

Childish Character with \(i_{start}=7\)

Childish Character with \(i_{start}=10\)

Childish Character with \(i_{start}=20\)

Character

Childish Emoji with \(i_{start}=1\)

Childish Emoji with \(i_{start}=3\)

Childish Emoji with \(i_{start}=5\)

Childish Emoji with \(i_{start}=7\)

Childish Emoji with \(i_{start}=10\)

Childish Emoji with \(i_{start}=20\)

Emoji

1.8. Visual Anagrams

In this part, I created visual anagrams where the image appears as one thing when viewed normally, and another when flipped upside down. Here are some examples:

An oil painting of an old man

An oil painting of people around a campfire

A bowl of noodles

A stadium

A bowl of noodles

An oil painting of a snowy mountain village

A stadium

A photo of a hipster barista
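A visual anagram of this kind can be produced by averaging two noise estimates at every denoising step: one for the upright image with the first prompt, and one for the flipped image with the second prompt, flipped back before averaging. The sketch below shows only that combination. The two `predict_noise_*` callables are hypothetical stand-ins for the prompt-conditioned denoiser, and images are assumed to be laid out as (C, H, W).

```python
import numpy as np

def anagram_noise(x_t, t, predict_noise_upright, predict_noise_flipped):
    """Average the upright estimate with the un-flipped estimate of the flipped image."""
    eps_upright = predict_noise_upright(x_t, t)
    flipped = np.flip(x_t, axis=-2)                        # turn the image upside down
    eps_flipped = np.flip(predict_noise_flipped(flipped, t), axis=-2)
    return 0.5 * (eps_upright + eps_flipped)

# Toy denoisers so the sketch runs; a real pipeline would call the model with two prompts.
toy_upright = lambda x, t: 0.1 * x
toy_flipped = lambda x, t: 0.3 * x

rng = np.random.default_rng(0)
x_t = rng.standard_normal((3, 8, 8))
eps = anagram_noise(x_t, 500, toy_upright, toy_flipped)
```

Flipping the second estimate back keeps both estimates in the same orientation, so their average describes a single image that satisfies one prompt upright and the other upside down.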

1.9. Hybrid Images

In this part, I created hybrid images that change appearance based on viewing distance. This uses a similar technique to visual anagrams, but blends high and low frequency components. Here are some examples:

Skull and waterfall

Stadium and noodles
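The frequency blending can be sketched as a low-pass filter applied to one noise estimate plus a high-pass filter applied to the other, with the high-pass taken as the signal minus its low-pass version. The Gaussian filter and `sigma` below are illustrative assumptions, and random arrays stand in for the two prompt-conditioned noise estimates.

```python
import numpy as np

def lowpass(image, sigma=2.0):
    """Separable Gaussian blur of a (C, H, W) array with reflect padding."""
    radius = int(3 * sigma)
    offsets = np.arange(-radius, radius + 1)
    kernel = np.exp(-0.5 * (offsets / sigma) ** 2)
    kernel /= kernel.sum()
    padded = np.pad(image, ((0, 0), (radius, radius), (radius, radius)), mode="reflect")
    blur_1d = lambda v: np.convolve(v, kernel, mode="valid")
    return np.apply_along_axis(blur_1d, 2, np.apply_along_axis(blur_1d, 1, padded))

def hybrid_noise(eps_far, eps_near, sigma=2.0):
    """Low frequencies from eps_far (seen from a distance), high frequencies from eps_near."""
    return lowpass(eps_far, sigma) + (eps_near - lowpass(eps_near, sigma))

rng = np.random.default_rng(0)
eps_far = rng.standard_normal((3, 16, 16))    # stand-in estimate for the far-view prompt
eps_near = rng.standard_normal((3, 16, 16))   # stand-in estimate for the close-view prompt
eps = hybrid_noise(eps_far, eps_near)
```

Using this combined estimate at every denoising step steers the coarse structure toward one prompt and the fine detail toward the other.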