Welcome to Portkey Forum

Updated 3 days ago

Configuring Load Balancing for Perplexity

Hey folks, I had to set up load balancing for Perplexity (good problem to have). It seems to work — I get "Loadbalancer active" in the dashboard. But can I please trouble you guys to double-check my config?

I have a hunch that I should be able to simplify it and wouldn't need to enter virt_key_1 twice, but I'm not sure how.

{
    "virtual_key": "virt_key_1",
    "cache": {
        "mode": "semantic",
        "max_age": 10000
    },
    "retry": {
        "attempts": 5,
        "on_status_codes": [
            429
        ]
    },
    "strategy": {
        "mode": "loadbalance"
    },
    "targets": [
        {
            "virtual_key": "virt_key_1"
        },
        {
            "virtual_key": "virt_key_2"
        }
    ]
}
12 comments
Haha, congrats! Very curious that you've taken Perplexity to production at this scale!
{
    "cache": {
        "mode": "semantic",
        "max_age": 10000
    },
    "retry": {
        "attempts": 5,
        "on_status_codes": [
            429
        ]
    },
    "strategy": {
        "mode": "loadbalance"
    },
    "targets": [
        {
            "virtual_key": "virt_key_1"
        },
        {
            "virtual_key": "virt_key_2"
        }
    ]
}
This should have the same effect as your original config. When you have a loadbalance strategy, any virtual_key at the root level is ignored.
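And in case you ever want an uneven split, I believe the loadbalance targets also accept a weight field — something like this (sketching from memory, so double-check the field name against the docs):

```json
{
    "strategy": {
        "mode": "loadbalance"
    },
    "targets": [
        {
            "virtual_key": "virt_key_1",
            "weight": 0.7
        },
        {
            "virtual_key": "virt_key_2",
            "weight": 0.3
        }
    ]
}
```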
Thanks a lot! Actually, it didn't solve my problem — I guess there's something else going on. But it's good to have this anyway.
What's the problem you're facing? Requests are only going to virtual key 1?
No, it's a bit more complicated than that.

I wanted to build a super-ultra-fast mode, so I made an API route directly to Cerebras, bypassing Portkey (I'm sorry for this betrayal lol).

But my assistant models can make a tool call to use Perplexity (via Portkey) for search. The UX is pretty cool.

But with Cerebras, these tool calls sometimes get cut off. I thought it could be because of too many requests to Perplexity, but that didn't seem to be the issue.
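To illustrate the kind of cut-off I mean: the tool call's arguments arrive as streamed fragments, and a truncated stream leaves JSON you can't parse. A minimal sketch of catching that (the fragment values here are made up, not my actual traffic):

```python
import json

def assemble_tool_args(fragments):
    """Concatenate streamed tool-call argument fragments and
    report whether the result is complete, parseable JSON."""
    raw = "".join(fragments)
    try:
        return json.loads(raw), True
    except json.JSONDecodeError:
        # Stream was cut off mid-arguments: surface it instead of
        # passing broken JSON to the tool.
        return raw, False

# A complete stream reassembles and parses fine...
args, ok = assemble_tool_args(['{"query": "late', 'st LLM news"}'])
# ...while a truncated one is flagged rather than silently breaking the call.
partial, ok_truncated = assemble_tool_args(['{"query": "late'])
```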
Ah, interesting! Yeah, it may also be the inherent nature of Cerebras' Llama deployment.
Does the same thing happen on Groq or SambaNova?
Those are the other two very fast inference providers I can think of.
Haven't tried...
The annoying thing is that it's intermittent, so I added some validation to the chunks.

I can see the value of Portkey, and the debugging and inconsistencies you guys must be dealing with on a daily basis πŸ˜„
haha, tell me about it!