
Serving Self-hosted Llama 3.1 405B on Multiple VMs with H100s

Hey guys, let's say I want to serve a self-hosted Llama 3.1 405B on 2 different VMs with H100s. I use vLLM plus an ngrok alternative, btw.

I have tested the load balancing mode and it works fine with this config:

{ "strategy": { "mode": "loadbalance" }, "targets": [ { "provider": "openai", "custom_host": "https://llama-1.tunnels-dev.io.systems/v1", "api_key": "Bearer dummy-api-key-vm1k", "weight": 0.4, "override_params": { "model": "meta-llama/Llama-3.1-405B-FP8" } }, { "provider": "openai", "custom_host": "https://llama-2.tunnels-dev.io.systems/v1", "api_key": "Bearer dummy-api-key-vm2", "weight": 0.6, "override_params": { "model": "meta-llama/Llama-3.1-405B-FP8" } } ], "cache": { "mode": "simple", "max_age": 60000 }, "retry": { "attempts": 3, "on_status_codes": [ 404, 429, 500, 520 ] } }

But I want to achieve one more goal here: in case all retries to llama-1 (target 0) fail, I need to fall back to target 1.

Basically I want something like a loadbalance-fallback mode. From the docs it looks like there is a hack for this, but are there any better ways?
7 comments
The example in the docs is really the clearest way to configure it, you can simply nest targets
But I get that it's verbose to write the same target twice
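For reference, a nested version of the config above might look roughly like this. It is a sketch, assuming each top-level loadbalance target carries the weight and wraps a fallback strategy over both hosts (primary first, the other VM second):

```json
{
  "strategy": { "mode": "loadbalance" },
  "targets": [
    {
      "weight": 0.4,
      "strategy": { "mode": "fallback" },
      "targets": [
        {
          "provider": "openai",
          "custom_host": "https://llama-1.tunnels-dev.io.systems/v1",
          "api_key": "Bearer dummy-api-key-vm1k",
          "override_params": { "model": "meta-llama/Llama-3.1-405B-FP8" },
          "retry": { "attempts": 3, "on_status_codes": [404, 429, 500, 520] }
        },
        {
          "provider": "openai",
          "custom_host": "https://llama-2.tunnels-dev.io.systems/v1",
          "api_key": "Bearer dummy-api-key-vm2",
          "override_params": { "model": "meta-llama/Llama-3.1-405B-FP8" }
        }
      ]
    },
    {
      "weight": 0.6,
      "strategy": { "mode": "fallback" },
      "targets": [
        {
          "provider": "openai",
          "custom_host": "https://llama-2.tunnels-dev.io.systems/v1",
          "api_key": "Bearer dummy-api-key-vm2",
          "override_params": { "model": "meta-llama/Llama-3.1-405B-FP8" },
          "retry": { "attempts": 3, "on_status_codes": [404, 429, 500, 520] }
        },
        {
          "provider": "openai",
          "custom_host": "https://llama-1.tunnels-dev.io.systems/v1",
          "api_key": "Bearer dummy-api-key-vm1k",
          "override_params": { "model": "meta-llama/Llama-3.1-405B-FP8" }
        }
      ]
    }
  ],
  "cache": { "mode": "simple", "max_age": 60000 }
}
```

Retries stay on the primary host, then the request fails over to the other VM once retries are exhausted, which is why each host ends up written out twice.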
But let's say I have 10 URLs that I want to load balance across, and fall back to the others in case one fails.

In that case I would need to define 10*10 targets in the config, which doesn't look like the cleanest way to do it :)
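Until something like target substitution lands, one workaround is to generate that 10*10 nested config from a host list instead of writing it by hand. A rough Python sketch follows; the host names, dummy key, and equal weights are made-up placeholders:

```python
import json

# Hypothetical list of self-hosted vLLM endpoints to balance across (illustrative values).
HOSTS = [f"https://llama-{i}.tunnels-dev.io.systems/v1" for i in range(1, 11)]
MODEL = "meta-llama/Llama-3.1-405B-FP8"


def make_target(host: str) -> dict:
    """A single provider target pointing at one self-hosted vLLM instance."""
    return {
        "provider": "openai",
        "custom_host": host,
        "api_key": "Bearer dummy-api-key",
        "override_params": {"model": MODEL},
        "retry": {"attempts": 3, "on_status_codes": [404, 429, 500, 520]},
    }


def build_config(hosts: list[str]) -> dict:
    """Loadbalance across hosts; each host falls back to all the others in order."""
    groups = []
    for i, primary in enumerate(hosts):
        # Primary first, then every other host as a fallback, producing the
        # N x N expansion mentioned above without writing it by hand.
        ordered = [primary] + hosts[:i] + hosts[i + 1:]
        groups.append({
            "strategy": {"mode": "fallback"},
            "weight": round(1 / len(hosts), 3),
            "targets": [make_target(h) for h in ordered],
        })
    return {
        "strategy": {"mode": "loadbalance"},
        "targets": groups,
        "cache": {"mode": "simple", "max_age": 60000},
    }


if __name__ == "__main__":
    print(json.dumps(build_config(HOSTS), indent=2))
```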
Makes sense, but this is not possible currently
We'll add target substitution to the product roadmap, thanks for reporting this use case @ilkhom | io.net
@ilkhom | io.net Are you folks using/trying out Portkey at io.net for load balancing? That would be so cool!
Yes, there is a new product coming soon at IO, and we are trying out the Portkey AI Gateway. A couple of days ago we had a demo call with Rohit too.

We have created a joint Slack channel with your support team, so I guess I'll ask questions there.