Welcome to Portkey Forum


The Challenges of Accommodating Image-Based Responses in AI Assistants

He has a fair point though: Gemini, Anthropic, and all the others support tool responses being images and so on. How do we accommodate that at the gateway level?
Yeah this is something we're thinking about on a daily basis
Maybe the time to move away from OpenAI as the standard is coming close
The reason is quite simple, right: you could always say we support everything that OpenAI does, and more
The problem is - when we support multiple ways of doing the same thing, we lose out on being able to do fallbacks & loadbalancing
That's the thing, you don't need to lose out on that
You can handle it conditionally, but it will ultimately have to raise an error if the format is incorrect for the target LLM. Or, some things you can branch on conditionally, so there is a clear difference in how a response is handled when it contains an image and so on
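A minimal sketch of what that conditional handling could look like at the gateway, assuming a hypothetical capability table and helper names (`PROVIDER_CAPABILITIES`, `pick_fallback_target`) that are illustrative, not a real Portkey API:

```python
# Hypothetical capability table: which providers accept image content
# inside tool-response messages. Values here are illustrative assumptions.
PROVIDER_CAPABILITIES = {
    "openai":    {"image_tool_responses": False},
    "gemini":    {"image_tool_responses": True},
    "anthropic": {"image_tool_responses": True},
}


def uses_image_tool_response(request: dict) -> bool:
    """Detect whether any tool message in the request carries image content."""
    for msg in request.get("messages", []):
        if msg.get("role") == "tool" and isinstance(msg.get("content"), list):
            if any(part.get("type") == "image" for part in msg["content"]):
                return True
    return False


def pick_fallback_target(request: dict, candidates: list[str]) -> str:
    """Return the first fallback candidate that can actually handle the
    request, raising instead of silently degrading the payload."""
    needs_images = uses_image_tool_response(request)
    for provider in candidates:
        caps = PROVIDER_CAPABILITIES[provider]
        if not needs_images or caps["image_tool_responses"]:
            return provider
    raise ValueError("No fallback target supports image tool responses")
```

This way fallbacks and load balancing keep working for plain OpenAI-shaped requests, and a request that uses an extended format fails loudly only when no candidate in the chain can serve it.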
You are supporting OpenAI and more
With no code mods
If the user wants only OpenAI, they get only OpenAI; if they want more, they can opt in through your lib
I agree with this approach; it's something we could clarify through docs as well.
Consider this: with Gemini enabling multimodal output (voice and text) and multimodal input (video, audio, image, and text), there will have to be some medium to access all of this. That could be through additional passable params to `chat.completions.create`. There's also Gemini's real-time chat feature, based on WebSockets, which allows text/video/audio to be streamed
OpenAI does not support all of that, so additional optional passable params, with specific conditionals to ensure they are only used for that specific model/family of models,
and lib-level errors if they are used wrongly, should suffice
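A rough sketch of those lib-level guards, assuming made-up param names (`response_modalities`, `realtime_input`) purely for illustration; the real param surface would come from the API contract being discussed:

```python
# Hypothetical set of params that only make sense for Gemini-family models.
# The names are assumptions for illustration, not an actual SDK surface.
GEMINI_ONLY_PARAMS = {"response_modalities", "realtime_input"}


def validate_extra_params(model: str, extra_params: dict) -> dict:
    """Raise a clear library-level error when a Gemini-only param is
    passed for a non-Gemini model; otherwise pass the params through."""
    if not model.startswith("gemini"):
        misused = GEMINI_ONLY_PARAMS & set(extra_params)
        if misused:
            raise TypeError(
                f"Params {sorted(misused)} are only supported by Gemini "
                f"models, got model={model!r}"
            )
    return extra_params
```

The OpenAI-only user never sees these params, and anyone who passes them against the wrong model family gets an explicit error at the library boundary rather than a confusing provider-side failure.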
I agree - care to chat to help us figure out the API contracts sometime?
lemme know when @rohit