Hi, is there a fallback strategy I can use when I hit the max length of the first LLM? Basically, I want to call Llama by default and switch to Claude if the Llama call hits max tokens.
We don't have fallbacks tied specifically to this check, but since the request will fail automatically anyway when it exceeds the max token length, a simple fallback that triggers on ANY Llama error should work.
If Llama returns a consistent error code when it reaches max length, you can also restrict the fallback to just that code using the on_status_codes array.
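For illustration, here is a minimal Python sketch of the decision such a fallback makes. It is not the platform's own implementation: ProviderError, call_with_fallback, call_llama and call_claude are hypothetical names, the model calls are stubs, and the 400 status code is only a placeholder for whatever code your Llama provider actually returns on max length. Passing no on_status_codes mimics "fall back on ANY Llama error"; passing a list mimics restricting the fallback to specific codes.

```python
from typing import Callable, Iterable, Optional


class ProviderError(Exception):
    """Hypothetical error raised by a model call, carrying an HTTP-style status code."""

    def __init__(self, status_code: int, message: str = "") -> None:
        super().__init__(message or f"provider error {status_code}")
        self.status_code = status_code


def call_with_fallback(
    prompt: str,
    primary: Callable[[str], str],
    fallback: Callable[[str], str],
    on_status_codes: Optional[Iterable[int]] = None,
) -> str:
    """Try the primary model; on failure, send the same prompt to the fallback.

    If on_status_codes is None, ANY ProviderError triggers the fallback.
    Otherwise only errors whose status_code is listed do; other errors
    are re-raised unchanged.
    """
    codes = None if on_status_codes is None else set(on_status_codes)
    try:
        return primary(prompt)
    except ProviderError as err:
        if codes is not None and err.status_code not in codes:
            raise  # not a code we fall back on, so surface the original error
        return fallback(prompt)


# Hypothetical stand-ins for the real Llama and Claude calls.
def call_llama(prompt: str) -> str:
    if len(prompt) > 20:  # pretend this exceeds Llama's max length
        raise ProviderError(400, "max tokens exceeded")  # 400 is a placeholder code
    return "llama: " + prompt


def call_claude(prompt: str) -> str:
    return "claude: " + prompt


if __name__ == "__main__":
    print(call_with_fallback("short prompt", call_llama, call_claude))
    print(call_with_fallback("a much longer prompt than Llama allows",
                             call_llama, call_claude, on_status_codes=[400]))
```

The short prompt is answered by the Llama stub; the long one fails on the primary and is retried on the Claude stub. With on_status_codes=[429] instead, the 400 error would be re-raised rather than falling back.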
Got it! We don't currently route based on that, but we're launching a new feature soon (before the end of this month) that will let you do something along these lines. (It's very exciting!)