Enhancing AI Artwork with Stable Diffusion Power-Ups
Table of Contents
- Using DALL-E 3 and Stable Diffusion 1.5 for AI Art
- Workflow for Upscaling DALL-E 3 Images
- Exploring Upscaling with Stable Diffusion XL
- Instant DALL-E: SDXL Upscaling Workflow
- Final SDXL Instant DALL-E Version
Using DALL-E 3 and Stable Diffusion 1.5 for AI Art
Making images with DALL-E 3 via Bing Image Creator is fun, and it's great at interpreting your prompts. But you're stuck with a square image of somewhat questionable quality.
I've already released the Stable Diffusion 1.5 power-up workflows. Here you can see one in action - we've got the original DALL-E 3 image there and the powered-up one from Stable Diffusion. As you can see, there are quite a few more details in there, even though it's gone a little bit uncanny valley.
DALL-E 3 Image Quality and Limitations
The key limitations of DALL-E 3 images currently are:
- Relatively low resolution (1024x1024 pixels)
- Fixed square (1:1) aspect ratio
- Lack of fine details and textures
So while the images can be very creative, they don't rival professional photography or art.
Leveraging Stable Diffusion 1.5 for Enhancements
Stable Diffusion 1.5 offers significantly more control over image generation, allowing us to take a DALL-E image and enhance it in various ways. In particular, we can upscale to much higher resolutions with more realistic details. We can also adjust the aspect ratio, lighting, colors, etc. And we have extensive prompt shaping capabilities.
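As a rough illustration of the aspect ratio adjustment, the sketch below computes a canvas size in a new aspect ratio that still contains the whole original image. Sizes are rounded up to a multiple of 8, since Stable Diffusion works on latents that are downsampled by a factor of 8. The function names and the 1024x1024 example size are assumptions made for illustration; they are not taken from the power-up workflows themselves.

```python
def round_up_ratio(numerator: int, denominator: int, multiple: int) -> int:
    """Smallest multiple of `multiple` that is >= numerator / denominator."""
    step = denominator * multiple
    return -(-numerator // step) * multiple  # integer ceiling division


def target_canvas(width: int, height: int, aspect_w: int, aspect_h: int,
                  multiple: int = 8) -> tuple[int, int]:
    """Return a (width, height) canvas in aspect_w:aspect_h that contains the source."""
    if width * aspect_h < height * aspect_w:
        # Source is narrower than the target ratio: keep the height, widen the canvas.
        new_w = round_up_ratio(height * aspect_w, aspect_h, multiple)
        new_h = round_up_ratio(height, 1, multiple)
    else:
        # Source is wide enough: keep the width, extend the canvas height.
        new_w = round_up_ratio(width, 1, multiple)
        new_h = round_up_ratio(width * aspect_h, aspect_w, multiple)
    return new_w, new_h


if __name__ == "__main__":
    # Hypothetical square source image, widened to 16:9.
    print(target_canvas(1024, 1024, 16, 9))
```

The extra canvas area would then need to be filled by the model (for example by outpainting); this sketch only handles the size arithmetic.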
Workflow for Upscaling DALL-E 3 Images
Now let's walk through a workflow for leveraging Stable Diffusion to upscale and enhance AI art from DALL-E.
We'll be using one of the Instant DALL-E power up workflows available on GitHub. These make the process very streamlined.
Power Up Process Overview
With the Instant DALL-E workflow, you simply drop in your DALL-E image and let it rip. Behind the scenes, it handles multi-step upscaling with intelligent prompts and transformations. You can customize the starting prompts and aspect ratio. Then sit back, and in around 20 minutes you'll have a high-resolution, print-worthy 4K image ready to go.
Example Images and Results
Here's an example showcasing the power-up process. We started with a basic DALL-E image of two rodents and some machinery. It's nice and creative, but lacking in resolution and realism. After running it through Instant DALL-E, we end up with a gorgeous 4K image with striking details in the fur, machinery, environment, and more. The lighting and atmosphere are also improved. So as you can see, combining these AI models gives us the best of both worlds - the creative concept from DALL-E plus realistic detailing from Stable Diffusion.
Exploring Upscaling with Stable Diffusion XL
Now let's go a step further and use Stable Diffusion XL (SDXL) for the upscaling workflow.
SDXL has even greater capabilities than SD 1.5, but there are some key compatibility considerations to work through when mixing different models.
Compatibility Considerations
Initially, one challenge was that SDXL did not yet have a tile ControlNet. This matters for efficiently upscaling images to higher resolutions. Another issue was that some SDXL workflows still relied on components from SD 1.5 under the hood, so mixing and matching required tweaking to avoid conflicts.
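To show why tiling matters for large images, here is a minimal sketch of how an image can be split into overlapping tiles so that each piece stays small enough to process on its own. This is only the box arithmetic, under assumed values for the tile size and overlap; the `tile_boxes` name is hypothetical and nothing here reproduces how a tile ControlNet itself works.

```python
def tile_boxes(width: int, height: int, tile: int = 1024,
               overlap: int = 128) -> list[tuple[int, int, int, int]]:
    """Return (left, top, right, bottom) boxes that together cover the image."""
    if not 0 <= overlap < tile:
        raise ValueError("overlap must be non-negative and smaller than tile")
    stride = tile - overlap

    def starts(length: int) -> list[int]:
        # A single tile is enough when the image fits inside it.
        if length <= tile:
            return [0]
        positions = list(range(0, length - tile, stride))
        positions.append(length - tile)  # final tile sits flush with the edge
        return positions

    return [
        (x, y, min(x + tile, width), min(y + tile, height))
        for y in starts(height)
        for x in starts(width)
    ]


if __name__ == "__main__":
    # Hypothetical 2048x2048 image split into 1024-pixel tiles.
    for box in tile_boxes(2048, 2048):
        print(box)
```

The overlap gives neighbouring tiles shared pixels that can be blended, which helps hide seams between tiles that were processed independently.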
Revisions Approach for Style and Subjects
Examining Stable Diffusion's 'Revisions' workflow provided inspiration. Revisions uses an input image to guide style and subjects, while allowing variation in the final output. By taking a similar approach focused on SDXL and upscaling, I developed the 'Instant DALL-E XL' workflow. This opened up all the power of SDXL while resolving the compatibility issues.
Instant DALL-E: SDXL Upscaling Workflow
Over multiple iterations of testing and refinement, the Instant DALL-E workflow stabilized into an efficient SDXL-based solution.
Iterative Upscaling Tests
The key was incorporating SDXL iterative upscalers, which cleanly handle growing images to higher and higher resolutions. No tile ControlNet needed! Numerous tests were run with different models, hyperparameters, and techniques. Analyzing the results at each stage shaped improvements for the next version.
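The idea of growing an image in stages can be sketched as a simple size schedule: instead of one large jump, each pass enlarges the image by a modest factor until the target is reached. The 1.5x step factor, the 3840-pixel "4K" long edge, and the function names below are assumptions for illustration, not settings taken from the workflow.

```python
def snap(value: float, multiple: int) -> int:
    """Round to the nearest multiple, never going below one multiple."""
    return max(multiple, int(round(value / multiple)) * multiple)


def upscale_schedule(width: int, height: int, target_long_edge: int,
                     step_factor: float = 1.5,
                     multiple: int = 8) -> list[tuple[int, int]]:
    """Return the (width, height) of each upscaling pass, ending at the target."""
    if step_factor <= 1:
        raise ValueError("step_factor must be greater than 1")
    long_edge = max(width, height)
    current = float(long_edge)
    sizes = []
    while current < target_long_edge:
        # Grow by one step, but never overshoot the target long edge.
        current = min(current * step_factor, target_long_edge)
        scale = current / long_edge
        sizes.append((snap(width * scale, multiple), snap(height * scale, multiple)))
    return sizes


if __name__ == "__main__":
    # Hypothetical square source grown to a 3840-pixel long edge.
    for size in upscale_schedule(1024, 1024, 3840):
        print(size)
```

Each size in the schedule would correspond to one upscale-and-refine pass; smaller steps mean more passes but less new detail for the model to invent at once.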
Enhancing Control with Additional Nets
Supplementary ControlNets such as Depth and Canny were added for greater precision over elements like lighting and edges. It was crucial to keep swapping model components until the best combination emerged. For example, using the CLIP Vision model from SDXL Revisions rather than the SD 1.5 one improved compatibility and accuracy.
Final SDXL Instant DALL-E Version
The end result combines the strengths of DALL-E creativity and SDXL realism into an automated workflow requiring minimal prompts.
Flexible Controls for Customization
The interface provides advanced controls to customize your outputs. Dial in the influence from the input images, randomness levels, aspect ratio, and more. Flexible prompt shaping makes refinement easy.
Applicable to Various Image Sources
While focused on upscaling DALL-E outputs, this workflow can enhance any image source. So feel free to power up photos, concept art, your own AI generations, and more with Instant DALL-E!
FAQ
Q: Can this process work for my own DALL-E 3 images?
A: Yes, you can use the freely available workflows to upscale and enhance any DALL-E 3 images you have created.
Q: What if I don't have DALL-E 3 access?
A: The workflows can be applied to other image sources as well, including photographs. The process allows customizing the final enhanced output.