He has a fair point though: Gemini, Anthropic, and all the others support tool responses being images and so on. How do we accommodate that at the gateway level?
You can handle that conditionally, but it will ultimately have to raise an error if the format is incorrect for a different LLM. Or, some things you can handle differently per provider, so there is a clear difference in handling when the response has an image and so on.
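A rough sketch of that conditional check at the gateway, in case it helps. Everything here is hypothetical: the `TOOL_RESULT_IMAGE_SUPPORT` table, the `validate_tool_result` helper, and the `{"type": "image"}` part shape are made up for illustration, and the support flags are placeholder assumptions, not verified provider behaviour.

```python
# Hypothetical capability table: which provider families accept image parts
# inside a tool result. The values are illustrative assumptions only.
TOOL_RESULT_IMAGE_SUPPORT = {
    "anthropic": True,
    "gemini": True,
    "openai": False,
}


class UnsupportedToolContentError(ValueError):
    """Raised when a tool result contains a part the target provider cannot take."""


def validate_tool_result(provider: str, parts: list[dict]) -> list[dict]:
    """Return the parts unchanged if the provider can accept them, else raise.

    `parts` is an assumed gateway-internal shape: a list of dicts with a
    "type" key such as "text" or "image".
    """
    has_image = any(part.get("type") == "image" for part in parts)
    # Unknown providers are treated as not supporting images (fail closed).
    if has_image and not TOOL_RESULT_IMAGE_SUPPORT.get(provider, False):
        raise UnsupportedToolContentError(
            f"provider {provider!r} does not accept image parts in tool results"
        )
    return parts
```

Text-only results pass through for every provider; only the image case branches, so the difference in handling stays explicit in one place.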
Consider this: with Gemini enabling multimodal output (voice and text) and multimodal input (video, audio, image, and text), there will have to be some medium to access all this. That could be through added passable params to chatcompletions.create. There is also the Gemini real-time chat feature, based on WebSockets, that allows text/video/audio to be streamed.
OAI does not support all of that, so these would be additional optional passable params, with specific conditionals to ensure they are only used for that specific model or family of models.
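Something like this is what I mean by gating the optional params, as a minimal sketch. The param names (`output_modalities`, `video_input`), the prefix-to-family mapping, and the `check_extra_params` helper are all hypothetical, just to show the shape of the conditional.

```python
# Hypothetical mapping from model-name prefix to provider family.
MODEL_PREFIX_TO_FAMILY = {
    "gemini": "gemini",
    "claude": "anthropic",
    "gpt": "openai",
}

# Hypothetical optional params and the only families allowed to receive them.
FAMILY_ONLY_PARAMS = {
    "output_modalities": {"gemini"},
    "video_input": {"gemini"},
}


def family_of(model: str) -> str:
    """Resolve a model name to its family using the assumed prefix table."""
    name = model.lower()
    for prefix, family in MODEL_PREFIX_TO_FAMILY.items():
        if name.startswith(prefix):
            return family
    raise ValueError(f"unknown model family for {model!r}")


def check_extra_params(model: str, extra: dict) -> dict:
    """Return `extra` unchanged if every gated param is allowed for this model.

    Params that are not listed in FAMILY_ONLY_PARAMS are treated as common
    and always pass through.
    """
    family = family_of(model)
    for name in extra:
        allowed = FAMILY_ONLY_PARAMS.get(name)
        if allowed is not None and family not in allowed:
            raise ValueError(
                f"param {name!r} is not supported for model family {family!r}"
            )
    return extra
```

The gateway would call `check_extra_params` before forwarding the request, so a Gemini-only param sent to an OAI model fails loudly instead of being silently dropped.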