* This blog post is a summary of this video.

Analyze Images and Generate AI Artwork with Microsoft Bing Chat

Author: John Moore
Time: 2024-02-06 10:00:01

Introduction to Bing Chat Enterprise Image Generation and Analysis Capabilities

Bing Chat Enterprise is Microsoft's enterprise-focused conversational AI assistant, built on top of the same core AI capabilities as the consumer Bing chatbot. It comes bundled with Microsoft 365 licenses, providing employees with an intelligent assistant to help boost productivity, analyze data, and more. Recently, Microsoft added powerful new generative AI features to Bing Chat Enterprise focused on image creation and analysis. These mirror capabilities also available in the consumer Bing chatbot.

In this blog post, we'll explore these new Bing Chat Enterprise features for generating custom AI images on demand and analyzing images uploaded by the user. We'll look at example use cases and the benefits these capabilities can provide in the workplace.

Bing Chat Enterprise Capabilities for Protecting Sensitive Data

Unlike the consumer version of Bing Chat, Bing Chat Enterprise provides additional privacy protections and compliance controls for handling sensitive business data. Users' queries and conversations with the bot are kept private and are not used to improve Microsoft's AI models. Data is not mined or used for advertising targeting. This gives users confidence that their company's confidential information remains protected when interacting with Bing Chat Enterprise. Employees can use the assistant for sensitive tasks without worrying about data leaks or compliance violations.

Leveraging Generative AI in Bing Chat Enterprise

The core of Bing Chat Enterprise is powered by advanced generative AI models developed by Microsoft. This includes foundational large language models (LLMs) trained on massive text datasets, allowing the bot to understand natural language queries and respond intelligently. Building on these models, Microsoft added capabilities specifically focused on image generation and image understanding. This allows Bing Chat Enterprise to not only describe images uploaded by the user, but also create custom images from text prompts provided by the user.

Generating Custom AI Images with Bing Chat Enterprise

One of the most exciting new capabilities in Bing Chat Enterprise is the ability to generate custom AI images through conversational prompts. This allows users to essentially "imagine" any scene or creative work of art, describe it in words to the bot, and have AI generate a quality image visualizing their description.

To generate images, users simply prompt the bot with a text description of the desired image, for example "Make a picture of a flower". Bing Chat then uses a text-to-image model (Bing Image Creator, which is powered by OpenAI's DALL-E) to output four different image options based on the user's text prompt.

Users can review the images and choose any of the options to expand and inspect further. Or, they can take an iterative approach and continue providing additional prompts to augment the image, adding new elements and modifying based on the context of the previously generated pictures.

Customizing and Augmenting AI Generated Images

One of the key benefits of Bing Chat's image generation capabilities is the ability to iteratively customize the output images through an ongoing conversation. After the bot generates initial image options from a base prompt, users can provide follow-up prompts that modify colors and backgrounds, add or remove elements, and more. Bing uses the context from previous iterations to adjust the images based on the new guidance.

This makes the experience feel more collaborative and creative, with the human and the AI assistant working together to bring imagined scenes to life. Below we'll explore some examples of modifying and augmenting generated images through iterative conversational exchanges.

Modifying Colors and Backgrounds of Generated Images

Once initial images are generated from a text prompt, colors and backgrounds can be tweaked through additional prompts. For example, after asking to generate a picture of a flower, a user could follow up saying "Make the petals blue" or "Put the flower against a sunset background". Bing Chat's AI will then modify the previously generated images to match the new guidance, leveraging the context of the images that came before to make intelligent and relevant changes.

Adding and Removing Elements from Generated Images

In addition to colors and backgrounds, users can prompt the addition or removal of entire objects and elements. Continuing the flower example, after viewing initial flower images, follow-up prompts could include:

  • "Add some butterflies to the picture"
  • "Put the flower and butterflies against a futuristic cyberpunk backdrop"
  • "Remove the petals and just show the stem"

This allows users to tap into the AI's creative capabilities to compose custom scenes tailored to their interests and use cases. The generative AI handles the heavy lifting of rendering quality images, while the human director provides the creative guidance.
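The iterative flow described above can be pictured as a conversation that accumulates prompts. The following is a minimal conceptual sketch, not a description of how Bing Chat works internally: the `ImageConversation` class and its methods are hypothetical names invented for illustration, and no Bing service is called.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ImageConversation:
    """Hypothetical model of an iterative image-prompt session."""

    prompts: List[str] = field(default_factory=list)

    def add_prompt(self, text: str) -> None:
        """Record a base prompt or a follow-up prompt, rejecting empty input."""
        cleaned = text.strip()
        if not cleaned:
            raise ValueError("prompt must not be empty")
        self.prompts.append(cleaned)

    def combined_description(self) -> str:
        """Join the base prompt and every follow-up, in the order given."""
        return "; then ".join(self.prompts)


# Replays the flower example from the text as three conversational turns.
session = ImageConversation()
session.add_prompt("Make a picture of a flower")
session.add_prompt("Make the petals blue")
session.add_prompt("Add some butterflies to the picture")
print(session.combined_description())
```

The point of the sketch is that each follow-up is interpreted alongside everything that came before it, which is why short prompts such as "Make the petals blue" are enough once the base image exists.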

Analyzing Images Uploaded to Bing Chat Enterprise

In addition to image generation capabilities, Bing Chat Enterprise also enables analysis and understanding of images provided by the user. This computer vision functionality allows employees to get AI-assisted insights from pictures related to their work.

Users can upload or take photos from within the chat interface and ask questions based on the contents of the image. For example, after uploading a photo of a crowded conference room, you could ask Bing Chat:

  • "How many people are in this picture?"
  • "What objects are on the table?"
  • "Is anything missing from the room?"

The AI will scan and process the image to provide detailed responses to these queries, augmenting the conversational exchange with dynamic computer vision intelligence.

Counting Objects and People in Uploaded Images

One straightforward yet powerful application of Bing Chat's image analysis is counting discrete objects and people within a photo. This can provide useful numeric insights for tasks like inventory management and occupancy monitoring. In the conference room example from earlier, Bing Chat would count the visible people in the uploaded image and report that figure. The same process could be applied to count items on warehouse shelves, cars in a parking lot, or other objects of interest.

Identifying Objects, Text, and Context in Images

In addition to counting, Bing Chat can identify and describe various elements of an uploaded image through natural language responses. Using computer vision techniques like object detection and optical character recognition, the AI can detect both visual objects and text found in images. In the conference room example, the bot may report the tables, chairs, whiteboards, and screens it detected. If there was text visible on the whiteboard or on signs around the room, the bot could also transcribe that text back to the user. This context helps the bot better understand the purpose and contents of the image, so it can hold a more informed discussion and provide recommendations if needed. For example, if certain expected objects were missing from the conference room, the bot could flag that and suggest adding them.
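The missing-object check in the conference room example amounts to comparing an expected list against what was detected. Here is a minimal sketch of that comparison, assuming the detected labels have been copied by hand from the bot's reply. The function name `missing_items` and the sample lists (including the "projector" entry) are hypothetical and chosen only for illustration.

```python
from typing import Iterable, List


def missing_items(expected: Iterable[str], detected: Iterable[str]) -> List[str]:
    """Return the expected items that do not appear among the detected labels.

    The comparison ignores case and surrounding whitespace.
    """
    seen = {label.strip().lower() for label in detected}
    return [item for item in expected if item.strip().lower() not in seen]


# Hypothetical checklist for a conference room.
expected_equipment = ["table", "chairs", "whiteboard", "screen", "projector"]
# Labels as they might be read out of the assistant's description of a photo.
detected_labels = ["Table", "Chairs", "Whiteboard", "Screen"]

print(missing_items(expected_equipment, detected_labels))  # prints ['projector']
```

An exact-match comparison like this is deliberately simple. Real descriptions vary in wording ("display" versus "screen"), so a practical checklist would need some tolerance for synonyms.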

Use Cases for Image Analysis with Bing Chat Enterprise

Now that we've explored the technical capabilities of Bing Chat Enterprise for image generation and image analysis, let's discuss some potential use cases where these features could provide real value in the workplace:

Productivity and Efficiency Gains

Bing Chat's combination of computer vision and conversational AI can help workers be more efficient and productive in a range of visual tasks. For example:

  • Analyzing images from warehouses or store shelves to take quick inventory counts
  • Reviewing images of product defects flagged by quality control processes and determining root causes
  • Having the bot scan conference room or office images to ensure proper set up and equipment

In these types of productivity use cases, the bot augments human capabilities and attention by leveraging AI to process visual inputs at scale.

Assisting Visually Impaired Employees

The conversational nature of Bing Chat also makes it well suited to providing visual assistance for blind or visually impaired employees. Those employees could snap photos with their phone and upload them to Bing Chat to understand the contents through the bot's descriptions. This enables the AI assistant to serve as a secondary, intelligent "eye" for the visually impaired. It opens up new levels of accessibility and independence in dealing with visual material such as printed signs, forms to fill out, and bus schedules.


Conclusion

Bing Chat Enterprise's powerful new image generation and image analysis capabilities provide businesses with AI-driven ways to boost creativity, productivity, and accessibility. While similar features are available in consumer chatbot platforms, Bing Chat Enterprise's security, compliance, and enterprise integration make it the ideal solution for business use cases.

As Microsoft continues enhancing Bing Chat Enterprise with new generative AI functions, it will unlock even more ways for employees to save time, gain insights, and enhance workflows with the help of an AI assistant.


FAQ

Q: How do I access Bing Chat Enterprise?
A: Bing Chat Enterprise is included with E3 and E5 Microsoft 365 licenses. Just log in with your work credentials.

Q: What image formats can I analyze?
A: Bing Chat supports common image formats like JPG, PNG and GIF for image analysis.

Q: Can I download the AI generated images?
A: Yes, Bing Chat allows you to download 1024x1024 JPG images created by the AI.

Q: Is there a limit to how many images I can analyze?
A: Bing Chat currently allows analysis of 1 uploaded image per conversation turn, up to 30 turns.

Q: Does Bing read text in images out loud?
A: No, Bing Chat does not currently have text-to-speech capabilities for reading text detected in images.

Q: Can I get higher resolution AI generated images?
A: Currently, the maximum image resolution is 1024x1024 pixels. Higher resolutions may be supported in the future.

Q: What objects can Bing identify in images?
A: Bing Chat has an extensive vision AI that can identify common objects, text, and scenery. It may have difficulty with highly complex or obscure images.

Q: Can I correct Bing if it misidentifies something?
A: Yes, you can clarify objects or text that Bing misunderstands and it will update its understanding.

Q: Is there an API to integrate Bing image features into apps?
A: Not currently, but Microsoft may release APIs in the future to integrate Bing Chat capabilities into third-party apps.

Q: What permissions do I need to analyze images?
A: No special permissions are needed. All Bing Chat Enterprise users can utilize image analysis features.
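
The format and turn limits mentioned in the answers above can be expressed as a small pre-upload check. This is only a sketch under stated assumptions: the function name `can_upload` is hypothetical, the `.jpeg` spelling is assumed to be accepted alongside `.jpg`, and the check runs locally without contacting Bing Chat.

```python
from pathlib import Path

# Formats named in the FAQ (JPG, PNG, GIF); ".jpeg" is an assumed alias of ".jpg".
SUPPORTED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif"}
# The FAQ states one uploaded image per turn, up to 30 turns per conversation.
MAX_TURNS = 30


def can_upload(filename: str, turns_used: int) -> bool:
    """Return True if the file type is supported and a turn is still available."""
    extension = Path(filename).suffix.lower()
    return extension in SUPPORTED_EXTENSIONS and turns_used < MAX_TURNS


print(can_upload("conference_room.PNG", turns_used=0))
print(can_upload("floor_plan.tiff", turns_used=0))
print(can_upload("conference_room.jpg", turns_used=30))
```

The first call passes both checks, the second fails on file type, and the third fails because all 30 turns have been used.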