Welcome to Portkey Forum

Updated last month

Using Vision Effectively in Chat Conversations

I have a question about using vision properly in a chat context.

In ChatGPT, once an image is in your chat conversation, you can keep chatting about it. This seems to require sending the image to the LLM on every request (just as the entire messages array is resent on each turn of a chat convo). First of all, this will use a lot of tokens. Second, from a practical perspective, how would this be done?

Do you just keep passing the image data (Base64, or a URL as here) in the messages array on every request, like below?

messages: [
  {
    role: "user",
    content: [
      { type: "text", text: "What’s in this image?" },
      {
        type: "image_url",
        // The image reference is wrapped in an object with a `url` field
        image_url: {
          url: "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg",
        },
      },
    ],
  },
],
1 comment
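Yes, that's mostly it: you keep the image in the messages array and resend it with each request. Alternatively, you could try one of the newer vision embedding models to give the conversation an extended memory.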
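To make the "resend everything" approach concrete, here is a minimal sketch of how the history could be kept on the client side. It assumes an OpenAI-style chat format like the one in the question; the `ChatMessage` type and the helpers `startImageChat`, `addTurn`, and `toDataUrl` are hypothetical names made up for illustration, not part of any SDK, and the assistant reply is a placeholder rather than real model output.

```typescript
// Assumed message shape, modeled on the snippet in the question (not an official SDK type).
type TextPart = { type: "text"; text: string };
type ImagePart = { type: "image_url"; image_url: { url: string } };
type ChatMessage = {
  role: "user" | "assistant";
  content: string | Array<TextPart | ImagePart>;
};

// Hypothetical helper: wrap raw Base64 bytes in a data URL so they can be
// used in the same `url` field as a normal https link.
function toDataUrl(base64: string, mimeType: string = "image/jpeg"): string {
  return `data:${mimeType};base64,${base64}`;
}

// Hypothetical helper: open a conversation with the image attached to the first user turn.
function startImageChat(question: string, imageUrl: string): ChatMessage[] {
  return [
    {
      role: "user",
      content: [
        { type: "text", text: question },
        { type: "image_url", image_url: { url: imageUrl } },
      ],
    },
  ];
}

// Hypothetical helper: append a plain-text turn without mutating the earlier history.
function addTurn(
  chat: ChatMessage[],
  role: ChatMessage["role"],
  text: string
): ChatMessage[] {
  return [...chat, { role, content: text }];
}

// Build up a conversation. The image stays in the first message, so it is
// included again every time the whole array is sent.
let chat = startImageChat("What's in this image?", "https://example.com/boardwalk.jpg");
chat = addTurn(chat, "assistant", "(placeholder for the model's reply)");
chat = addTurn(chat, "user", "What season does it look like?");

// `chat` is what you would pass as `messages` on the next request.
console.log(JSON.stringify(chat, null, 2));
```

Passing a URL rather than Base64 keeps each request body small, but as far as I know the image is still processed (and counted toward tokens) on every request that includes it, so trimming or summarizing old turns is the usual way to keep costs down in long conversations.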