Hey guys, let's say I want to serve a self-hosted Llama 3.1 405B on 2 different VMs with H100s. I use vLLM + an ngrok alternative, btw.
I have tested load-balancing mode and it works fine with this config:
{
  "strategy": {
    "mode": "loadbalance"
  },
  "targets": [
    {
      "provider": "openai",
      "custom_host": "https://llama-1.tunnels-dev.io.systems/v1",
      "api_key": "Bearer dummy-api-key-vm1k",
      "weight": 0.4,
      "override_params": {
        "model": "meta-llama/Llama-3.1-405B-FP8"
      }
    },
    {
      "provider": "openai",
      "custom_host": "https://llama-2.tunnels-dev.io.systems/v1",
      "api_key": "Bearer dummy-api-key-vm2",
      "weight": 0.6,
      "override_params": {
        "model": "meta-llama/Llama-3.1-405B-FP8"
      }
    }
  ],
  "cache": {
    "mode": "simple",
    "max_age": 60000
  },
  "retry": {
    "attempts": 3,
    "on_status_codes": [
      404,
      429,
      500,
      520
    ]
  }
}
But I want to achieve one more goal here: in case all retries to llama-1 (target 0) fail, I need to fall back to target 1. Basically I want something like a loadbalance-fallback mode. From the docs it looks like there is a hack for this (see my sketch below), but are there any better ways?
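To make the question concrete, here is roughly the hack I had in mind, assuming the gateway allows a target to carry its own nested strategy and targets (I haven't verified this against my gateway version, so treat it as a sketch): target 0 becomes a nested fallback group that tries llama-1 first and llama-2 second, while target 1 stays a plain llama-2 target. The hosts, keys, and model name are just the ones from my config above; I dropped the cache block to keep the example short.

{
  "strategy": {
    "mode": "loadbalance"
  },
  "targets": [
    {
      "strategy": {
        "mode": "fallback"
      },
      "weight": 0.4,
      "targets": [
        {
          "provider": "openai",
          "custom_host": "https://llama-1.tunnels-dev.io.systems/v1",
          "api_key": "Bearer dummy-api-key-vm1k",
          "override_params": {
            "model": "meta-llama/Llama-3.1-405B-FP8"
          }
        },
        {
          "provider": "openai",
          "custom_host": "https://llama-2.tunnels-dev.io.systems/v1",
          "api_key": "Bearer dummy-api-key-vm2",
          "override_params": {
            "model": "meta-llama/Llama-3.1-405B-FP8"
          }
        }
      ]
    },
    {
      "provider": "openai",
      "custom_host": "https://llama-2.tunnels-dev.io.systems/v1",
      "api_key": "Bearer dummy-api-key-vm2",
      "weight": 0.6,
      "override_params": {
        "model": "meta-llama/Llama-3.1-405B-FP8"
      }
    }
  ],
  "retry": {
    "attempts": 3,
    "on_status_codes": [
      404,
      429,
      500,
      520
    ]
  }
}

If nested targets work the way I assume, the weights still split traffic 40/60 at the top level, and the fallback only kicks in for requests that were routed to the llama-1 group and exhausted their retries. The obvious downside is that llama-2 appears twice in the config, which is why I'm asking whether there's a cleaner way.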