I have a question about using vision properly in a chat context
In ChatGPT, once you have an image in your chat conversation, you can keep chatting about it. It seems like this requires sending the image to the LLM on every turn (similar to sending the entire messages array in a chat conversation). First of all, this will use a lot of tokens, but also, from a practical perspective, how would this be done?
Do you just keep passing all the Base64 data in the messages array, like below?
messages: [
  {
    role: "user",
    content: [
      { type: "text", text: "What’s in this image?" },
      {
        type: "image_url",
        // The URL must be a single unbroken string; a Base64 data URL would go in the same place.
        image_url: {
          url: "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg",
        },
      },
    ],
  },
],
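To make the question concrete, here is a minimal sketch of what I mean by resending the image on every turn. It only builds the messages array and makes no API call. The helper names (`toDataUrl`, `createConversation`) and the placeholder Base64 string are hypothetical, and I am assuming the image is passed as a `data:` URL inside `image_url.url`.

```javascript
// Hypothetical helper: wrap raw Base64 image data in a data URL.
function toDataUrl(base64, mimeType = "image/jpeg") {
  return `data:${mimeType};base64,${base64}`;
}

// Hypothetical helper: keeps the running history so the whole array,
// image included, can be passed as `messages` on each request.
function createConversation() {
  const messages = [];
  return {
    messages,
    addUserImage(text, imageUrl) {
      messages.push({
        role: "user",
        content: [
          { type: "text", text },
          { type: "image_url", image_url: { url: imageUrl } },
        ],
      });
    },
    addUserText(text) {
      messages.push({ role: "user", content: text });
    },
    addAssistantText(text) {
      messages.push({ role: "assistant", content: text });
    },
  };
}

const convo = createConversation();
// "aGVsbG8=" is placeholder Base64, not real image data.
convo.addUserImage("What's in this image?", toDataUrl("aGVsbG8="));
convo.addAssistantText("(model reply goes here)");
convo.addUserText("What colour is the sky?");
// At this point convo.messages still contains the image part from turn one,
// so a follow-up request would carry the Base64 payload again.
```

If this is the right approach, the image stays in the first user message and every later request repeats it, which is exactly the token cost I am asking about.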