
AI unhyped: Visual input & output

This post is from our e-mail newsletter. Sign up here to receive regular updates on AI from Applai.

AI unhyped

Hi 👋, it's Mathias from Applai. Here's what you need to know for 16 October 2023, in 3 minutes and 52 seconds.

In this edition, we'll cover:

  • New visual capabilities of ChatGPT
  • How to get ChatGPT to generate images for you
  • News from Applai

So, when you have read this newsletter, you will know how to use images both as input and as output in ChatGPT, and on top of that you'll know what we are up to at Applai at the moment... But first:

Why are we now making a newsletter?

If you follow AI news on LinkedIn or X (formerly known as Twitter), you are probably being bombarded with stories about how some arbitrary small new tool will “revolutionize” the world as we know it. That is seldom actually the case.

See for example posts like this one… “It’s changed everything”... Really? 🤔

So why do we need another AI news source, you might ask. This sensationalism on social media is exactly why we think we have something to contribute with this newsletter, “AI unhyped”. Here, we peel away the noise and write about the news and new features that we think are cool. And if there is a week where nothing major happens, we simply won't send a newsletter.

Both Victor and I are thrilled to have you along for the journey. This week, we are focusing solely on the new visual capabilities of ChatGPT from OpenAI. New features are being rolled out that enable ChatGPT to both understand visual inputs and generate visual outputs. And we think that is pretty cool. Ready to dive into this week's insights? Here we go! 👇

This week's Stories:

Visual input in ChatGPT: GPT-4V

Images speak louder than words - maybe also for ChatGPT…

What’s going on here?

GPT-4V is the latest kid on the block, and guess what? The 'V' stands for vision. ChatGPT users can now use GPT-4V, which has the ability to understand and process visual inputs. This means you can attach an image to your prompt and let ChatGPT understand both the text and the image 📸 You can read more about how it works and how it is safeguarded here: OpenAI's GPT-4V Deep Dive

What does this mean?

It's no longer just about textual analysis. Whether it's an image of a handwritten note or a poor drawing, ChatGPT can decipher it and generate responses based on it. How cool is that? 🤯

Not that long ago, for example, we needed to come up with a name for this newsletter. As old school as it sounds, we brainstormed some names (OK, admittedly we got some help on that part from ChatGPT), wrote the names we liked most on Post-it notes, and then asked some good friends of Applai to vote for the name they favored. We then simply snapped a photo of the notes, uploaded it to ChatGPT, and asked it to document our little session. See how it did here:

Cool, right? And of course it wouldn't have been a big task to document the votes between our three name suggestions by hand, but imagine the capabilities with much larger sessions or workshops. You can see other good examples of use cases for ChatGPT with GPT-4V from the X user Nomad here.

Why should I care?

For ChatGPT Plus users, this is a super cool new feature. But even if you're not a Plus user, here's a sneaky tip: you can give it a spin for free on Bing Chat.

At the moment, GPT-4V is not available through an open API, as is the case for, e.g., the text version of GPT-4. But OpenAI has announced that a GPT-4V API will be available later in the fall. We at Applai of course can't wait to get our hands on GPT-4V through the API so we can start building it into Applai Chat 🛠️ But for now, we are also quite excited to just play around with it in ChatGPT…
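For the developers among you, here is a rough sketch of what such a call might look like, assuming OpenAI extends the chat completions endpoint of its existing Python SDK to accept images. To be clear, this is our guess, not documented behavior: the model name, the image-bearing message format, and the example URL are all assumptions on our part.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical GPT-4V call: the model name and the mixed text/image
# message format are assumptions, since the vision API has not been
# released at the time of writing.
response = client.chat.completions.create(
    model="gpt-4-vision-preview",  # assumed model name
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "Count the votes on each Post-it note in this photo."},
                {"type": "image_url",
                 "image_url": {"url": "https://example.com/postits.jpg"}},  # placeholder URL
            ],
        }
    ],
    max_tokens=300,
)
print(response.choices[0].message.content)
```

If the real API ends up looking different, the core idea should carry over: one request, mixed text and image content.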

Keep on reading:

OpenAI's GPT-4V Deep Dive

Thread on X with use cases by Nomad.

OK, that was visual inputs in ChatGPT. But ChatGPT can now also generate image outputs:

Visual output in ChatGPT: Dall-E 3

Artificial Intelligence meets Picasso.

What’s going on here?

DALL-E 3 is the newest version of the image model from OpenAI, and it brings with it the capability to generate visual outputs. It provides a unique avenue for users to get creative with art generation, images, and graphics. And this is also being rolled out to users in ChatGPT right now.

What does this mean?

Say you need a visual representation for a project. You might sketch an initial idea and, with DALL-E 3, refine that concept into a more polished graphic. This could be used for logos, illustrations, or any visual content for that matter. Now you can do all of that inside ChatGPT and get image outputs out of ChatGPT.
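DALL-E 3 lives only inside ChatGPT for now, but OpenAI already offers an images endpoint for DALL-E 2 in its Python SDK, so if DALL-E 3 reaches the API, we'd expect something along these lines. Again, a hedged sketch: the "dall-e-3" model value is our assumption, and the prompt and size are just examples.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Sketch of image generation via OpenAI's images endpoint.
# The "dall-e-3" model value is an assumption; only DALL-E 2 is
# available through this endpoint at the time of writing.
result = client.images.generate(
    model="dall-e-3",  # assumed model name
    prompt="A minimalist logo for an AI newsletter called 'AI unhyped'",
    size="1024x1024",  # square output
    n=1,               # one image
)
print(result.data[0].url)  # URL of the generated image
```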

Recently, we tried converting some of our brainstormed sketches into detailed images using DALL-E 3. Check out the results here:

Why should I care?

For those who have access to ChatGPT Plus, DALL-E 3 is an added feature you can explore. If you're not a Plus user, a tip: you can experiment with similar functionality on Microsoft Bing Image Creator at no cost.

Keep on reading:

OpenAI's Dall-E 3 Explained

Can't wait to start generating images in ChatGPT? Maybe you should wait just a moment, to read the last story of this week, where we connect the dots... Visual input, image output 🤯

GPT-4V & DALL-E 3 in ChatGPT

ChatGPT: From Vision to Visuals

What’s going on here?

By integrating the capabilities of GPT-4V and DALL-E 3, ChatGPT can now not only understand visual inputs but also generate and refine visuals - at the same time!

What does this mean?

Have a rough sketch for a project? You can potentially refine and transform that concept into a detailed graphic. The combination of GPT-4V and DALL-E 3 in ChatGPT certainly expands the boundaries of what we can achieve visually.

For instance, we recently used this combo to turn a simple doodle from a team brainstorming session into a detailed illustration. The result? A visually appealing representation of our initial idea, and something much closer to being ready to share with customers or stakeholders, for example.
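You can think of the combo as a two-step pipeline: the vision model turns your sketch into a detailed brief, and the image model renders that brief. Sketched below under the same assumptions as the earlier snippets (neither the GPT-4V API nor DALL-E 3 in the images API exists yet, so model names, message format, and the example URL are guesses):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Step 1 (assumed GPT-4V API): describe the rough sketch as an illustration brief.
brief = client.chat.completions.create(
    model="gpt-4-vision-preview",  # assumed model name
    messages=[{
        "role": "user",
        "content": [
            {"type": "text",
             "text": "Describe this sketch as a detailed brief for an illustrator."},
            {"type": "image_url",
             "image_url": {"url": "https://example.com/doodle.jpg"}},  # placeholder URL
        ],
    }],
    max_tokens=300,
).choices[0].message.content

# Step 2 (assumed DALL-E 3 API): render the brief as a polished illustration.
image = client.images.generate(
    model="dall-e-3",  # assumed model name
    prompt=brief,
    size="1024x1024",
    n=1,
)
print(image.data[0].url)
```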

Why should I care?

GPT-4V and DALL-E 3 are really cool features in ChatGPT by themselves, but together they enable a whole range of new use cases. We still don't think any of this will replace good graphic designers, but the early phases and ideation of visual projects are now accessible to anyone inside ChatGPT - and then you can call a designer for the challenging tasks.

That was all the external news for the past week. But since we have your attention, we also want to share some of the things we are up to at Applai.

Latest news from Applai…

What has happened at Applai recently? We have been busy with all sorts of things, but among the highlights is of course the launch of this very newsletter you are reading. We are really curious to hear what you think about it, and whether you have any suggestions for things we should do differently. Reach out to mathias@applai.io if you have any comments - we are happy to hear from you.

Besides our newsletter, Victor has been busy with presentations and courses in the past week. Among other things, he had a course day in Køge this week, with a group of super engaged professionals from local businesses. Check out the LinkedIn post about that experience here.

Sharing and Feedback

That was all from this first edition of AI unhyped. How did we do? Reach out to mathias@applai.io if you have any feedback. And if there's any news you believe deserved a spot in this newsletter but didn't get one, we'd love to hear about it!

And we are of course also very happy if you help us spread the word, so please share AI unhyped with your colleagues and friends. Share this sign-up link with them 💌

Until our next edition,

Mathias Villads ✌️
