* This blog post is a summary of this video.

Is DALL-E 3 Overhyped? Comparing It to Midjourney and Other AI Image Generators

Author: Tao Prompts
Time: 2023-12-28 08:00:01

DALL-E 3's Strengths: Understanding Language and Generating Text

As demonstrated in the YouTube video, DALL-E 3 is strong at understanding natural-language prompts and generating images that match the description. It outperforms other AI image generators such as Midjourney at keeping the individuals named in a prompt distinct. For example, asking DALL-E 3 for an image of Kevin Hart with Dwayne Johnson produces clear depictions of both celebrities, whereas Midjourney tends to fuse their appearances together.

Another key strength is DALL-E 3's ability to generate images with text, like a photo of Pikachu holding a sign. The AI is generally accurate at rendering the prompted text, although it struggles with longer phrases. This text generation capability also enables graphic design applications, like logo designs containing text.

Capturing Identity and Context

When given prompts that reference known individuals, places, or contexts, DALL-E 3 excels at preserving identities and depicting the intended setting. For example, a prompt for Batman battling Spider-Man yields identifiable renderings of both superheroes in action poses. By comparison, Midjourney sometimes confuses contextual elements, fusing Batman with other characters such as Venom.

Generating Images with Text

As the Pikachu sign example shows, DALL-E 3 has an impressive capacity for incorporating written text into generated visuals. This makes it usable for designing graphics, logos, and images containing quotes or slogans. However, accuracy declines for longer phrases, and non-English text cannot currently be handled at all.

DALL-E 3's Weaknesses: Lack of Features and Aesthetics

Despite clear strengths in understanding language prompts, DALL-E 3 lacks more advanced creative features offered by other AI art tools like Midjourney. The current DALL-E 3 interface only allows entering text prompts to generate square-shaped images. Comparatively, Midjourney boasts adjustable aspect ratios, style replication capabilities, and better overall artistic aesthetics.

No Aspect Ratio Adjustments

Unlike Midjourney, which allows users to adjust aspect ratios, DALL-E 3 produces only square images. This limits creative options, as some concepts are better depicted in landscape, portrait, or panoramic orientations. Full-body character prompts in DALL-E 3 often result in awkward cropping, an issue that can be avoided in Midjourney by changing the aspect ratio.
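The cropping problem comes down to simple arithmetic, sketched below in Python. This is only an illustration under stated assumptions: the 1:3 width-to-height bounding box for a standing figure and the pixel sizes are hypothetical values chosen for the example, not measurements from the video or settings from either tool.

```python
def fit_scale(subject_w, subject_h, canvas_w, canvas_h):
    """Largest uniform scale at which the subject fits entirely inside the canvas."""
    return min(canvas_w / subject_w, canvas_h / subject_h)


def canvas_coverage(subject_w, subject_h, canvas_w, canvas_h):
    """Fraction of the canvas area covered by the subject's bounding box
    once the subject is scaled to fit without cropping."""
    s = fit_scale(subject_w, subject_h, canvas_w, canvas_h)
    return (subject_w * s) * (subject_h * s) / (canvas_w * canvas_h)


# Assumption: a standing figure occupies roughly a 1:3 (width:height) box.
FIGURE = (1, 3)

# Assumption: illustrative canvas sizes, a square and a 2:3 portrait.
square = canvas_coverage(*FIGURE, 1024, 1024)
portrait = canvas_coverage(*FIGURE, 832, 1248)

print(f"square canvas:   {square:.0%} of the frame")    # 33% of the frame
print(f"portrait canvas: {portrait:.0%} of the frame")  # 50% of the frame
```

Under these assumptions, a full-body figure that fits uncropped fills only about a third of a square frame, so a generator that wants the character to dominate the image has to crop. A portrait canvas lets the same figure fill half the frame with nothing cut off.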

Struggles with Complex Styles

When prompted to replicate the distinctive style of a specific creator, DALL-E 3 falls short of Midjourney. DALL-E 3 can sometimes loosely capture basic style elements such as color patterns or textures, but Midjourney appears to draw on a more comprehensive training dataset, which lets it mimic creators such as Studio Ghibli and Kazuki Takamatsu more accurately.

Midjourney Has Better Overall Aesthetics

The YouTube video showed several side-by-side comparisons in which Midjourney's images were the more aesthetically pleasing. Color variation, lighting, balance, and visual interest tend to be more impressive in Midjourney creations. That said, artistic quality is subjective and varies with the viewer and the style in question.

The Overhyped ChatGPT Integration

One highly touted capability of DALL-E 3 was its planned integration with ChatGPT, which would use ChatGPT's language processing to guide image generation. However, current testing suggests the integration cannot deliver on many of the expected use cases, because any generated faces are blurred out to address privacy concerns.

Inability to Generate Consistent Characters

One promising application of blending OpenAI's language and image AI systems was the possibility of creating consistent fictional characters that could be depicted in various settings and situations. Unfortunately, the blurring of faces by the ChatGPT integration appears to rule out reliable character generation for now. Other systems like Midjourney also struggle to maintain character consistency across images.

No Image History Access

A significant limitation researchers and creatives have noted is DALL-E 3's lack of any image search or browsing capabilities for accessing your full history of created images. While the interface shows some recently generated images, there is currently no way to search or scroll through your entire library of past creations. This makes organizing and building upon previous images extremely difficult compared to tools like Midjourney that offer image search.

Conclusion: DALL-E 3 Is Impressive But Overhyped

While DALL-E 3 represents an impressive advancement in AI image generation through its language understanding and text rendering skills, some of the hype seems premature given its limitations. Until more robust creative features, style replication abilities, adjustable formats, and image history access are added, Midjourney likely remains the most versatile and fully-featured AI art tool overall. But DALL-E 3's rapid evolution hints at exciting future potential if some current weaknesses can be addressed.


Q: How good is DALL-E 3 at understanding natural language?
A: DALL-E 3 has very strong natural language understanding that is superior to Midjourney. It accurately captures identity and context from prompts.

Q: What features does DALL-E 3 lack compared to Midjourney?
A: DALL-E 3 lacks basic features like aspect ratio adjustments and struggles with complex art styles. It also has no way to access your full image history.

Q: How was the ChatGPT integration overhyped?
A: The hope was that DALL-E 3 combined with ChatGPT could generate consistent fictional characters, but generated faces are currently blurred, and the integration cannot yet analyze images for that purpose.