Magik Chance

Apr 4 · 4 min read

Image Editing through Chat: Testing Dall-E 3 Inpainting

OpenAI rolled out the ability to edit Dall-E 3 images yesterday. Today we take a look at its abilities and limitations, specifically through ChatGPT. This inpainting feature also appears to be available in Copilot, but I don't personally see it in Bing yet.

Limitations

First, let's get a big limitation out of the way:

Inpainting is not currently available on images created through a custom GPT.

I have no idea why this is, and for me, it's a huge limitation since 95% of my images are generated using style GPTs that I created.

That said, it does appear to work on all images created through base ChatGPT, both new and old.

Test 1: Peacock Leopard

Here's a version of a peacock leopard I created a couple weeks ago:

I really love this image, but the tail is off. Let's try to correct that.

First we'll select just the tail and then ask for "Leopard's tail with peacock feathers."

Note: you can use the "regeneration" button on these! It's really helpful for getting a different seed and, if you're leaving things mostly to ChatGPT, possibly a slightly different prompt.

Here's the best of four regenerations:

None of the generations had peacock feathers, but at least we got a better looking tail.

But what if we selected more area to give the generator room to work? This time I selected most of the bottom right corner and picked the best out of four:

This was the only one that got peacock feathers. Not exactly as elaborate as I'd hoped, but it is something.

Observations

Dall-E 3 is doing a good job of keeping the style of the image in all generations.
Generations are roughly 50% accurate to the prompt: they got one element (the leopard tail) every time but got the other (the peacock feathers) in only one of eight generations, or 12.5% of the time.
There is a slight loss in detail in the regenerated areas. This makes them look blurry compared to the rest of the image.
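
For anyone who wants to check the accuracy math above, here is a minimal Python sketch. It assumes the tallies from this test: eight generations in total (two rounds of four), with the leopard tail showing up in all of them and the peacock feathers in just one. The function name and the dictionary layout are my own, purely for illustration.

```python
# Tallies from Test 1: two rounds of four regenerations each.
TOTAL_GENERATIONS = 8

# How many generations contained each requested element.
hits = {
    "leopard tail": 8,      # present every time
    "peacock feathers": 1,  # present only once
}


def hit_rates(hits, total):
    """Return the fraction of generations that contained each element."""
    return {element: count / total for element, count in hits.items()}


rates = hit_rates(hits, TOTAL_GENERATIONS)

# Simple unweighted average across the two requested elements.
overall = sum(rates.values()) / len(rates)

for element, rate in rates.items():
    print(f"{element}: {rate:.1%}")
print(f"average across elements: {overall:.2%}")
```

Averaging the two elements gives a little over half, which is where the rough 50% figure comes from. With only eight samples this is a ballpark number, not a benchmark.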

Test 2: Tarot Card

Now let's test how well we can correct and add more particular things, mainly text. Here's a version of the Fool card I generated today:

You might notice some errors. First, "The Foooll" is a little off from "The Fool," and the card number should be 0, not II.

Let's try conquering the number first since that's simpler. I selected the number and added the simple instruction of "The number 0."

It nailed it first try! It's a little big perhaps, but I am more than happy to take the win.

Now let's try to correct "The Foooll."

This was the first result and err...what exactly happened here?

Naturally I tried to edit the first corrected image (the one with the correct 0). Here is the actual prompt Dall-E 3 used in that image:

"An artistic rendition of the Fool tarot card, now including the number 0 prominently at the top of the card, symbolizing the Fool's position in the tarot deck as the beginning of the Major Arcana. The young and adventurous figure stands at the edge of a cliff, looking upwards with optimism. Dressed in medieval traveler's clothes, with a small satchel at their belt and a white rose in one hand, they symbolize innocence and new beginnings. A loyal dog is at their feet, looking up with affection. The background features a bright and sunny sky, with distant mountains representing the unknown journey ahead. The card, bordered with an ornate design, is in a 9:16 aspect ratio, perfect for mobile wallpapers or social media stories."

Now here is the new prompt:

"An artistic rendition of the Fool tarot card, now including the number 0 prominently at the top of the card and the words "The Fool" at the bottom, symbolizing the card's name and its position in the tarot deck as the beginning of the Major Arcana. The young and adventurous figure stands at the edge of a cliff, looking upwards with optimism. Dressed in medieval traveler's clothes, with a small satchel at their belt and a white rose in one hand, they symbolize innocence and new beginnings. A loyal dog is at their feet, looking up with affection. The sunny sky and distant mountains represent the unknown journey ahead. The card is bordered with an ornate design and is in a 9:16 aspect ratio, perfect for mobile wallpapers or social media stories."

As you can see, it slightly changes the prompt each time. Having two very specific labels likely confused the generator. I attempted to change just that part of the prompt to clarify but did not get better results.
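
If you want to see exactly what changed between two prompts without squinting, a word-level diff helps. Below is a rough sketch using Python's standard difflib module. It only compares the opening sentence of each prompt quoted above (the later sentences changed slightly too), and the helper name word_changes is just something I made up for this example.

```python
import difflib

# Opening sentence of the prompt behind the image with the corrected 0.
old_prompt = (
    "An artistic rendition of the Fool tarot card, now including the number 0 "
    "prominently at the top of the card, symbolizing the Fool's position in the "
    "tarot deck as the beginning of the Major Arcana."
)

# Opening sentence of the prompt after asking for "The Fool" at the bottom.
new_prompt = (
    "An artistic rendition of the Fool tarot card, now including the number 0 "
    "prominently at the top of the card and the words \"The Fool\" at the bottom, "
    "symbolizing the card's name and its position in the tarot deck as the "
    "beginning of the Major Arcana."
)


def word_changes(old, new):
    """Return (tag, removed_words, added_words) for every non-matching span."""
    old_words, new_words = old.split(), new.split()
    matcher = difflib.SequenceMatcher(a=old_words, b=new_words, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            removed = " ".join(old_words[i1:i2])
            added = " ".join(new_words[j1:j2])
            changes.append((tag, removed, added))
    return changes


for tag, removed, added in word_changes(old_prompt, new_prompt):
    print(f"{tag}: {removed!r} -> {added!r}")
```

Pasting the full prompts into old_prompt and new_prompt works the same way and makes it easier to spot when ChatGPT has quietly reworded parts you never asked it to touch.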

This was the best result and, ironically, it was a regen from the first try (the one that got the double 0s).

Observations

Simpler text changes are much easier for Dall-E 3.
Subsequent edits may produce diminishing results.
Changes are made directly to the original prompt.

Conclusions

I'm very impressed by Dall-E 3's ability to keep the style consistent but a little disappointed by the lack of accuracy in these changes. While it might be hard to get exact details, this is still going to be a very useful feature for rerolling mistakes and artifacts.

What have your experiences with Dall-E 3 inpainting been like so far? Do you like the new feature or do you think it needs some work?

Creative AI Connections
