Using Vision Effectively in Chat Conversations

I have a question about how to use vision properly in a chat context.

In ChatGPT, once an image is in the conversation, you can keep chatting about it. It seems like this requires sending the image to the LLM on every request (similar to resending the entire messages array in a chat conversation). That would use a lot of tokens, but beyond the cost, how would this be done in practice?

Do you just keep passing all the Base64 data (or the image URL) in the messages array on every turn, like below?

messages: [
  {
    role: "user",
    content: [
      { type: "text", text: "What’s in this image?" },
      {
        type: "image_url",
        // The image part wraps the URL (or a Base64 data URL) in an object
        image_url: {
          url: "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg",
        },
      },
    ],
  },
],
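
To make the question concrete, here is a minimal sketch of what "keep passing the image" could look like across turns. It assumes the endpoint is stateless, as chat completion APIs typically are, so the image part stays in the first user message and the whole array is sent again with each follow-up. The helper names (toDataUrl, imageTurn, textTurn), the model name, the "AAAA" Base64 string, and the assistant reply are all placeholders for illustration, not part of any library or real output.

```javascript
// Hypothetical helper: wrap raw Base64 image bytes in a data URL.
function toDataUrl(base64, mimeType = "image/jpeg") {
  return `data:${mimeType};base64,${base64}`;
}

// Hypothetical helper: a user turn that carries both text and an image.
function imageTurn(text, imageUrl) {
  return {
    role: "user",
    content: [
      { type: "text", text },
      { type: "image_url", image_url: { url: imageUrl } },
    ],
  };
}

// Hypothetical helper: a plain text turn for either role.
function textTurn(role, text) {
  return { role, content: text };
}

// Turn 1: the image is attached once, in the first user message.
// "AAAA" is a placeholder for real Base64 image data.
const messages = [imageTurn("What's in this image?", toDataUrl("AAAA"))];

// After each reply, append it, then append the follow-up question.
// The reply text here is a placeholder, not real model output.
messages.push(textTurn("assistant", "A boardwalk through a grassy field."));
messages.push(textTurn("user", "What time of day does it look like?"));

// The next request body carries the full history, image included.
// "your-vision-model" is a placeholder model name.
const requestBody = { model: "your-vision-model", messages };
```

With this shape the follow-up turns stay text-only, but the image in the first message is still part of every request, which is where the repeated token cost in the question comes from. Using a hosted URL instead of a data URL keeps the request body small, though whether that also reduces billed tokens depends on the provider.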
