Multimodal AI Rearranged My Living Room

ChatGPT gave me good advice for redesigning my living room. (Photo by Spacejoy at Unsplash.com)

Yet again I’m astonished by new AI offerings. ChatGPT Plus can now apply its seemingly endless intelligence to images.

When I first downloaded the latest version to my iPad, eager to see what it could do, I uploaded a photo of a bird taken by my trail camera. ChatGPT correctly identified the bird as a song sparrow.

Then I used the camera feature within the ChatGPT app to take a photo of my aloe plant and asked why my plant looked unhealthy. It identified the plant as an aloe vera and gave me detailed instructions for proper care.

Then I took a photo of my living room and asked it, “Do you have any suggestions for rearranging my living room?” In its typically friendly manner, it replied, “Of course! Based on the image you provided, here are some suggestions to consider when rearranging your living room.”

It quickly proceeded to give me a list of great ideas, such as adjusting the location of the couch and chair, adding decorative items within my bookshelf, hanging the wall art at eye level, changing the location of the rug, adding a coffee table, and more.

It’s one thing to correctly identify a bird, and quite another to identify the objects in my living room and then help rearrange it.

This latest development is called multimodal AI. It combines text generation, image recognition and image generation, voice recognition, and speech synthesis. Or, in the words of OpenAI’s promotion, “ChatGPT can now see, hear, and speak.”

If you have ChatGPT Plus on your Android device, iPhone, or iPad, you can carry on a spoken conversation with it. I can speak my queries, and it talks back to me.

Also, ChatGPT Plus can now generate images via the new DALL•E 3, and it’s impressive. You may remember that when I wrote about image generators in the past, each failed to create the image I requested: an elephant on horseback. I have no idea why that image came to mind, but it certainly foiled them.

So of course when I tried the new image-generation capability in ChatGPT, I asked it for an elephant on horseback. Success!

I write about ChatGPT Plus because it’s clearly leading the way at this point with these amazing new features. But the disadvantage is that it costs $20 a month.

However, I was amazed to find that all these features are available for free in Microsoft’s Bing Chat, which incorporates GPT-4, the model behind ChatGPT Plus. I gave it the same query about rearranging my living room. It took much longer and listed five suggestions, but only the first dealt specifically with what was in the photo, and even that one wasn’t really apt. The other suggestions were generic. Still, it recognized that there was a couch, chair, and coffee table in the photo.

I also asked it to generate a photorealistic image of an elephant riding on horseback, using the same prompt as with ChatGPT. It failed. Instead, it gave me really nice images of a person riding on an elephant with a horse also in the image.

Given that Bing Chat integrates GPT-4 and DALL•E 3, it’s not clear why its performance doesn’t seem as good. But I’m certainly impressed with its range of features. As I write this, the desktop version of Bing Chat lets you speak your prompts and can talk back to you, a feature not yet available in the desktop version of ChatGPT Plus.

To use Microsoft’s Bing Chat, you need to create an account on Microsoft’s website and go to Bing.com using their free Edge web browser.

Google’s free Bard (Bard.Google.com) is close behind ChatGPT and Bing Chat, and I keep reading that Google has a forthcoming AI model that will be even more powerful than ChatGPT. As I write this, Bard’s multimodal features are limited to responding to photos you upload. So of course I had to ask it to help rearrange my living room, giving it the same photo and prompt as the previous instances. It was a bit more specific than Bing Chat, but not nearly as good as ChatGPT Plus. Bard also lets you speak your prompts.

All three, then, can interact with images. (Note, though, that to safeguard privacy, these image recognition features won’t identify faces.)

I initially enjoyed ChatGPT’s suggestions for rearranging my living room, but, frankly, I didn’t take them seriously. Then, darn it, over the following days as I’d walk through my living room, I was increasingly aware that my plush armchair did indeed feel out of place.

Finally, I couldn’t stand it and did just as ChatGPT suggested: I moved the couch and chair out farther from the adjoining walls, moved the chair closer to the couch, and placed the rug where it recommended. It really did change the character of my living room, making it feel more balanced.

Thank you, ChatGPT. 

Find column archives at JimKarpen.com.