Welcome to Portkey Forum

Updated 3 days ago

Configuring Load Balancing for Perplexity

Hey folks, I had to set up load balancing for Perplexity (good problem to have). It seems to work — I get "Loadbalancer active" in the dashboard. But can I please trouble you guys to double-check my config?

I have a hunch that I should be able to simplify it and wouldn't need to enter virt_key_1 twice, but I'm not sure how.

{
    "virtual_key": "virt_key_1",
    "cache": {
        "mode": "semantic",
        "max_age": 10000
    },
    "retry": {
        "attempts": 5,
        "on_status_codes": [
            429
        ]
    },
    "strategy": {
        "mode": "loadbalance"
    },
    "targets": [
        {
            "virtual_key": "virt_key_1"
        },
        {
            "virtual_key": "virt_key_2"
        }
    ]
}
12 comments
Haha, congrats! Very curious that you've taken Perplexity to production at this scale!
{
    "cache": {
        "mode": "semantic",
        "max_age": 10000
    },
    "retry": {
        "attempts": 5,
        "on_status_codes": [
            429
        ]
    },
    "strategy": {
        "mode": "loadbalance"
    },
    "targets": [
        {
            "virtual_key": "virt_key_1"
        },
        {
            "virtual_key": "virt_key_2"
        }
    ]
}
This should have the same effect as your original config. When you have a loadbalance strategy, any virtual_key at the root level is ignored.
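And in case you ever want an uneven split, I believe the loadbalance targets also accept a weight field — something like this (sketching from memory, so double-check the field name against the docs):

```json
{
    "strategy": {
        "mode": "loadbalance"
    },
    "targets": [
        {
            "virtual_key": "virt_key_1",
            "weight": 0.7
        },
        {
            "virtual_key": "virt_key_2",
            "weight": 0.3
        }
    ]
}
```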
Thanks a lot! Actually, it didn't solve my problem — I guess there's something else going on. But it's good to have this anyway.
What's the problem you're facing? Requests are only going to virtual key 1?
No, it's a bit more complicated than that.

I wanted to build a super-ultra-fast mode, so I made an API route directly to Cerebras, bypassing Portkey (I'm sorry for this betrayal lol).

But my assistant models can make a tool call to use Perplexity (via Portkey) for search. The UX is pretty cool.

But with Cerebras, these tool calls sometimes get cut off. I thought it could be because of too many requests to Perplexity, but that didn't seem to be the issue.
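To illustrate the kind of cut-off I mean: the tool call's arguments arrive as streamed fragments, and a truncated stream leaves JSON you can't parse. A minimal sketch of catching that (the fragment values here are made up, not my actual traffic):

```python
import json

def assemble_tool_args(fragments):
    """Concatenate streamed tool-call argument fragments and
    report whether the result is complete, parseable JSON."""
    raw = "".join(fragments)
    try:
        return json.loads(raw), True
    except json.JSONDecodeError:
        # Stream was cut off mid-arguments: surface it instead of
        # passing broken JSON to the tool.
        return raw, False

# A complete stream reassembles and parses fine...
args, ok = assemble_tool_args(['{"query": "late', 'st LLM news"}'])
# ...while a truncated one is flagged rather than silently breaking the call.
partial, ok_truncated = assemble_tool_args(['{"query": "late'])
```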
Ah, interesting! Yeah, it may also be the inherent nature of Cerebras' Llama deployment.
Does the same thing happen on Groq or SambaNova?
Those are the other two very fast inference providers I can think of.
Haven't tried...
The annoying thing is that it's intermittent, so I added some validation to the chunks.

I can see the value of Portkey, and the debugging and inconsistencies you guys must be dealing with on a daily basis πŸ˜„
haha, tell me about it!