* This blog post is a summary of this video.

Enhancing AI Artwork with Stable Diffusion Power-Ups

Author: Nerdy Rodent
Time: 2023-12-31 02:35:02


Using DALL-E 3 and Stable Diffusion 1.5 for AI Art

Making images with DALL-E 3 via Bing Image Creator is fun, and it's great at interpreting your prompts. But you're stuck with a square image of sometimes questionable quality.

I've already released the Stable Diffusion 1.5 power-up workflows. Here is one in action: the original DALL-E 3 image alongside the powered-up version from Stable Diffusion. The result has noticeably more detail, even though it has drifted a little into uncanny-valley territory.

DALL-E 3 Image Quality and Limitations

The key limitations of DALL-E 3 images currently are:

  • Low resolution (512x512 pixels)
  • A fixed square (1:1) aspect ratio
  • A lack of fine details and textures

So while the images can be very creative, they don't rival professional photography or art.

Leveraging Stable Diffusion 1.5 for Enhancements

Stable Diffusion 1.5 offers significantly more control over image generation, so we can take a DALL-E image and enhance it in several ways. In particular, we can upscale it to much higher resolutions with more realistic detail, adjust the aspect ratio, lighting, and colors, and shape the result extensively through the prompt.
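Adjusting the aspect ratio starts with choosing output dimensions. The sketch below is a minimal illustration, assuming two common SD 1.5 rules of thumb: keep the pixel count near the model's 512x512 training resolution, and keep width and height multiples of 8 (the VAE works on 8x downsampled latents). The helper name `fit_aspect_ratio` is hypothetical and not part of the power-up workflows.

```python
import math
from typing import Tuple


def fit_aspect_ratio(aspect_w: int, aspect_h: int,
                     pixel_budget: int = 512 * 512,
                     multiple: int = 8) -> Tuple[int, int]:
    """Pick (width, height) with roughly `pixel_budget` pixels and the given aspect ratio.

    Hypothetical helper: both sides are snapped to a multiple of `multiple`.
    """
    ratio = aspect_w / aspect_h
    # Solve width * height = budget together with width / height = ratio.
    height = math.sqrt(pixel_budget / ratio)
    width = height * ratio

    def snap(x: float) -> int:
        # Round to the nearest multiple, never going below one multiple.
        return max(multiple, int(round(x / multiple)) * multiple)

    return snap(width), snap(height)


print(fit_aspect_ratio(1, 1))    # (512, 512)
print(fit_aspect_ratio(16, 9))   # (680, 384)
```

Passing `pixel_budget=1024 * 1024` gives dimensions sized for SDXL instead, since SDXL's native resolution is 1024x1024.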

Workflow for Upscaling DALL-E 3 Images

Now let's walk through a workflow for leveraging Stable Diffusion to upscale and enhance AI art from DALL-E.

We'll be using one of the Instant DALL-E power-up workflows available on GitHub, which make the process very streamlined.

Power Up Process Overview

With the Instant DALL-E workflow, you simply drop in your 512x512 DALL-E image and let it rip. Behind the scenes, it handles multi-step upscaling for you, including the prompting and image transformations at each stage. You can customize the starting prompts and the aspect ratio. Then sit back, and in around 20 minutes you'll have a high-resolution, print-worthy 4K image ready to go.
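The multi-step idea can be made concrete with a little arithmetic. Going from a 512-pixel side to a 3840-pixel side (taking "4K" to mean 3840 pixels on the long side) is a 7.5x enlargement, which is usually split into several smaller passes. This sketch assumes a cap of 2x per pass; the cap, the helper name `upscale_schedule`, and the even geometric spacing are illustrative assumptions, not settings taken from the Instant DALL-E workflow.

```python
import math
from typing import List, Tuple


def upscale_schedule(start: Tuple[int, int], target_long_side: int,
                     max_factor: float = 2.0, multiple: int = 8) -> List[Tuple[int, int]]:
    """Split one large upscale into equal geometric passes of at most `max_factor` each."""
    width, height = start
    total = target_long_side / max(width, height)
    # Fewest passes such that the per-pass factor stays <= max_factor (at least one pass).
    passes = max(1, math.ceil(math.log(total) / math.log(max_factor)))
    per_pass = total ** (1 / passes)

    sizes = []
    for i in range(1, passes + 1):
        scale = per_pass ** i
        # Snap each intermediate size to a multiple of 8 for the VAE.
        sizes.append((int(round(width * scale / multiple)) * multiple,
                      int(round(height * scale / multiple)) * multiple))
    return sizes


print(upscale_schedule((512, 512), 3840))
# [(1000, 1000), (1960, 1960), (3840, 3840)]
```

Each pass enlarges by roughly 1.96x, so the model only ever has to fill in a modest amount of new detail at a time.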

Example Images and Results

Here's an example of the power-up process. We started with a basic DALL-E image of two rodents and some machinery: nice and creative, but lacking in resolution and realism. After running it through Instant DALL-E, we end up with a gorgeous 4K image with striking detail in the fur, the machinery, the environment, and more. The lighting and atmosphere improved too. Combining these AI models gives us the best of both worlds: the creative concept from DALL-E plus realistic detail from the Stable Diffusion power-up.

Exploring Upscaling with Stable Diffusion XL (SDXL)

Now let's go a step further by using Stable Diffusion XL (SDXL) for the upscaling workflow.

SDXL is even more capable than SD 1.5, but there are some key compatibility considerations to work through when mixing different models.

Compatibility Considerations

Initially, one challenge was that SDXL did not yet have a tile ControlNet, which is important for efficiently upscaling images to higher resolutions. Another issue was that some SDXL models still relied on SD 1.5 components under the hood, so mixing and matching required tweaking to avoid conflicts.
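To see why a tile ControlNet matters: at these resolutions, upscalers commonly process the image as a grid of overlapping tiles, and a tile ControlNet helps each tile stay faithful to the matching region of the source image. The sketch below only computes such a grid. The 1024-pixel tile (roughly SDXL's native size) and the 128-pixel overlap are assumptions, and `tile_grid` is a hypothetical helper.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom); right/bottom exclusive


def tile_starts(length: int, tile: int, stride: int) -> List[int]:
    """Start offsets along one axis; the last tile is pushed flush with the edge."""
    if length <= tile:
        return [0]
    starts = list(range(0, length - tile, stride))
    starts.append(length - tile)
    return starts


def tile_grid(width: int, height: int, tile: int = 1024, overlap: int = 128) -> List[Box]:
    """Cover a width x height image with overlapping square tiles."""
    if not 0 <= overlap < tile:
        raise ValueError("overlap must be in [0, tile)")
    stride = tile - overlap
    return [(x, y, min(x + tile, width), min(y + tile, height))
            for y in tile_starts(height, tile, stride)
            for x in tile_starts(width, tile, stride)]


tiles = tile_grid(3840, 3840)
print(len(tiles), tiles[0], tiles[-1])  # 25 (0, 0, 1024, 1024) (2816, 2816, 3840, 3840)
```

The overlap gives neighbouring tiles a shared margin that can be blended to hide seams; keeping each tile's content consistent with the source is the part a tile ControlNet helps with.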

Revisions Approach for Style and Subjects

Stability AI's SDXL "Revision" workflow provided inspiration. Revision uses an input image to guide style and subjects while still allowing variation in the final output. Taking a similar approach focused on SDXL and upscaling, I developed the "Instant DALL-E XL" workflow, which unlocked SDXL's capabilities while resolving the compatibility issues.

Instant DALL-E: SDXL Upscaling Workflow

Over multiple iterations of testing and refinement, the Instant DALL-E workflow settled into an efficient SDXL-based solution.

Iterative Upscaling Tests

The key was incorporating SDXL iterative upscalers, which cleanly grow images to higher and higher resolutions, so no tile ControlNet was needed. Numerous tests were run with different models, hyperparameters, and techniques, and analyzing the results at each stage shaped the improvements for the next version.
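Iterative upscaling can be sketched as a loop: enlarge a little, run a refinement pass, and repeat, lowering the denoise strength each time so later passes sharpen existing detail rather than invent new content. This is a minimal sketch under assumptions: the `upscale` and `refine` callables are placeholders for a resize step and an img2img pass, the 0.45 starting denoise and 0.7 decay are made-up values, and the stubs below only track image sizes so the loop runs without a model.

```python
from typing import Callable, List, Tuple

Size = Tuple[int, int]  # stand-in "image": only (width, height) is tracked


def iterative_upscale(image: Size, factors: List[float],
                      upscale: Callable[[Size, float], Size],
                      refine: Callable[[Size, float], Size],
                      start_denoise: float = 0.45, decay: float = 0.7) -> Size:
    """Grow `image` over several passes, refining after each resize with a shrinking denoise."""
    denoise = start_denoise
    for factor in factors:
        image = upscale(image, factor)  # placeholder for a pixel- or model-based resize
        image = refine(image, denoise)  # placeholder for an img2img pass at this strength
        denoise *= decay                # later passes change less
    return image


# Stubs so the sketch runs without any model: they only record sizes and strengths.
log: List[Tuple[Size, float]] = []


def fake_upscale(size: Size, factor: float) -> Size:
    w, h = size
    return (int(round(w * factor / 8)) * 8, int(round(h * factor / 8)) * 8)


def fake_refine(size: Size, denoise: float) -> Size:
    log.append((size, denoise))
    return size


final = iterative_upscale((512, 512), [1.5, 1.5, 2.0], fake_upscale, fake_refine)
print(final)  # (2304, 2304)
```

In a real pipeline the stubs would be replaced by an actual resize and a diffusion img2img call; the loop structure stays the same.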

Enhancing Control with Additional Nets

Supplementary ControlNets such as Depth and Canny were added for more precise control over elements like lighting and edges. Swapping model components until finding the best combination was crucial. For example, using the CLIP Vision model from SDXL Revision rather than the SD 1.5 one improved compatibility and accuracy.
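Stacking ControlNets comes down to deciding how strongly, and for which part of the sampling process, each one steers the image. ComfyUI, for example, exposes a strength plus start and end percentages for each applied ControlNet. The sketch below is a simplified model: it maps those fractions straight onto step indices, whereas real samplers may schedule them differently, and the `ControlNetUnit` structure and the Depth and Canny values are hypothetical rather than the workflow's actual settings.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ControlNetUnit:
    """One ControlNet in a stack (hypothetical structure for illustration)."""
    name: str
    strength: float
    start: float = 0.0  # fraction of sampling where it switches on
    end: float = 1.0    # fraction of sampling where it switches off

    def __post_init__(self) -> None:
        if not 0.0 <= self.start <= self.end <= 1.0:
            raise ValueError(f"{self.name}: need 0 <= start <= end <= 1")

    def is_active(self, step: int, total_steps: int) -> bool:
        # Simplification: map the fractions directly onto step indices.
        first = round(self.start * total_steps)
        last = round(self.end * total_steps)
        return first <= step < last


def strengths_at(step: int, total_steps: int, units: List[ControlNetUnit]) -> Dict[str, float]:
    """Which ControlNets steer this step, and how strongly."""
    return {u.name: u.strength for u in units if u.is_active(step, total_steps)}


stack = [
    ControlNetUnit("depth", strength=0.6),                      # depth-map guidance, all steps
    ControlNetUnit("canny", strength=0.4, start=0.0, end=0.5),  # edge guidance, first half only
]
print(strengths_at(5, 20, stack))   # {'depth': 0.6, 'canny': 0.4}
print(strengths_at(15, 20, stack))  # {'depth': 0.6}
```

Ending an edge ControlNet early is one way to lock in composition while leaving later steps free to add texture; whether that suits a given image is a matter of testing.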

Final SDXL Instant DALL-E Version

The end result combines DALL-E's creativity and SDXL's realism in an automated workflow that needs minimal prompting.

Flexible Controls for Customization

The interface provides advanced controls to customize your outputs. Dial in the influence from the input images, randomness levels, aspect ratio, and more. Flexible prompt shaping makes refinement easy.

Applicable to Various Image Sources

While focused on upscaling DALL-E outputs, this workflow can enhance any image source. So feel free to power up photos, concept art, your own AI generations, and more with Instant DALL-E!

FAQ

Q: Can this process work for my own DALL-E 3 images?
A: Yes, you can use the freely available workflows to upscale and enhance any DALL-E 3 images you have created.

Q: What if I don't have DALL-E 3 access?
A: The workflows can be applied to other image sources as well, including photographs. The process allows customizing the final enhanced output.