Welcome to Portkey Forum


The Challenges of Accommodating Image-Based Responses in AI Assistants

He has a fair point though: Gemini, Anthropic, and all the others support tool responses being images and so on. How do we accommodate that at the gateway level?
Yeah this is something we're thinking about on a daily basis
Maybe the time to move away from OpenAI as the standard is coming close
The reason is quite simple, right: you could always say we support everything that OpenAI does, and more
The problem is - when we support multiple ways of doing the same thing, we lose out on being able to do fallbacks & loadbalancing
That's the thing, you don't need to lose out on that
You can handle it conditionally, but it will ultimately have to raise an error if the format is incorrect for the target LLM. Or, some things you can branch on conditionally, so there is a clear difference in how a response is handled when it contains an image and so on
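A minimal sketch of what that conditional handling could look like at the gateway, assuming a hypothetical capability table and helper names (`PROVIDER_CAPABILITIES`, `pick_fallback_target`) that are illustrative, not a real Portkey API:

```python
# Hypothetical capability table: which providers accept image content
# inside tool-response messages. Values here are illustrative assumptions.
PROVIDER_CAPABILITIES = {
    "openai":    {"image_tool_responses": False},
    "gemini":    {"image_tool_responses": True},
    "anthropic": {"image_tool_responses": True},
}


def uses_image_tool_response(request: dict) -> bool:
    """Detect whether any tool message in the request carries image content."""
    for msg in request.get("messages", []):
        if msg.get("role") == "tool" and isinstance(msg.get("content"), list):
            if any(part.get("type") == "image" for part in msg["content"]):
                return True
    return False


def pick_fallback_target(request: dict, candidates: list[str]) -> str:
    """Return the first fallback candidate that can actually handle the
    request, raising instead of silently degrading the payload."""
    needs_images = uses_image_tool_response(request)
    for provider in candidates:
        caps = PROVIDER_CAPABILITIES[provider]
        if not needs_images or caps["image_tool_responses"]:
            return provider
    raise ValueError("No fallback target supports image tool responses")
```

This way fallbacks and load balancing keep working for plain OpenAI-shaped requests, and a request that uses an extended format fails loudly only when no candidate in the chain can serve it.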
You are supporting OpenAI and more
With no code mods
If the user wants only OpenAI, they get only OpenAI; if they want more, they can opt in through your lib
I agree with this approach; it's something we could clarify through docs as well.
Consider this: with Gemini enabling multimodal output (voice and text) and multimodal input (video, audio, image, and text), there will have to be some medium to access all of this. That could be through additional passable params to `chat.completions.create`. There's also Gemini's real-time chat feature, based on WebSockets, which allows text/video/audio to be streamed
OpenAI does not support all of that, so additional optional passable params, with specific conditionals to ensure they are only used for that specific model/family of models,
and lib-level errors if they are used wrongly, should suffice
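A rough sketch of those lib-level guards, assuming made-up param names (`response_modalities`, `realtime_input`) purely for illustration; the real param surface would come from the API contract being discussed:

```python
# Hypothetical set of params that only make sense for Gemini-family models.
# The names are assumptions for illustration, not an actual SDK surface.
GEMINI_ONLY_PARAMS = {"response_modalities", "realtime_input"}


def validate_extra_params(model: str, extra_params: dict) -> dict:
    """Raise a clear library-level error when a Gemini-only param is
    passed for a non-Gemini model; otherwise pass the params through."""
    if not model.startswith("gemini"):
        misused = GEMINI_ONLY_PARAMS & set(extra_params)
        if misused:
            raise TypeError(
                f"Params {sorted(misused)} are only supported by Gemini "
                f"models, got model={model!r}"
            )
    return extra_params
```

The OpenAI-only user never sees these params, and anyone who passes them against the wrong model family gets an explicit error at the library boundary rather than a confusing provider-side failure.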
I agree - care to chat to help us figure out the API contracts sometime?
lemme know when @rohit