Hey folks, I had to set up load balancing for Perplexity (good problem to have). It seems to work: the dashboard shows Loadbalancer active. But can I please trouble you guys to double-check my config?
I have a hunch that I should be able to simplify it so I don't need to input virt_key_1 twice, but I'm not sure how.
I wanted to build a super-ultra-fast mode, so I made an API route directly to Cerebras, bypassing Portkey (I'm sorry for this betrayal lol).
But my assistant models can make a tool call to use Perplexity (via Portkey) for search. The UX is pretty cool.
But with Cerebras, these tool calls sometimes get cut off. I thought it could be because of too many requests to Perplexity, but load balancing didn't seem to solve the issue.
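In case it helps narrow this down, here is a rough sketch of how a cut-off tool call could be detected on the client side. It assumes the Cerebras response has been parsed into an OpenAI-style chat completion dict (`choices[].finish_reason` and `choices[].message.tool_calls[].function.arguments` as a JSON string); that shape is an assumption, and `find_truncated_tool_calls` is a made-up helper name, not part of any SDK.

```python
import json


def find_truncated_tool_calls(response: dict) -> list[str]:
    """Return reasons why tool calls in an OpenAI-style response look cut off.

    Hypothetical helper: assumes `response` is an already-parsed chat
    completion dict in the OpenAI-compatible format.
    """
    problems = []
    for choice in response.get("choices", []):
        # "length" means generation stopped because the token limit was hit.
        if choice.get("finish_reason") == "length":
            problems.append("finish_reason is 'length' (token limit hit)")

        message = choice.get("message") or {}
        for call in message.get("tool_calls") or []:
            function = call.get("function") or {}
            name = function.get("name", "<unnamed>")
            try:
                # Arguments arrive as a JSON string; a truncated one won't parse.
                json.loads(function.get("arguments") or "")
            except json.JSONDecodeError:
                problems.append(f"tool call '{name}' has incomplete JSON arguments")
    return problems
```

If this reports `finish_reason is 'length'`, the cut-off is more likely a max-token limit on the Cerebras side than rate limiting on Perplexity, but that is only a guess until it is checked against real responses.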