ilkhom | io.net
Hey guys, let's say I want to serve self-hosted Llama 3.1 405B on 2 different VMs with H100s. I use vLLM + an ngrok alternative, btw.

I have tested loadbalancing mode and it works fine with this config:

{ "strategy": { "mode": "loadbalance" }, "targets": [ { "provider": "openai", "custom_host": "https://llama-1.tunnels-dev.io.systems/v1", "api_key": "Bearer dummy-api-key-vm1k", "weight": 0.4, "override_params": { "model": "meta-llama/Llama-3.1-405B-FP8" } }, { "provider": "openai", "custom_host": "https://llama-2.tunnels-dev.io.systems/v1", "api_key": "Bearer dummy-api-key-vm2", "weight": 0.6, "override_params": { "model": "meta-llama/Llama-3.1-405B-FP8" } } ], "cache": { "mode": "simple", "max_age": 60000 }, "retry": { "attempts": 3, "on_status_codes": [ 404, 429, 500, 520 ] } }

But I want to achieve one more goal here: in case all retries to llama-1 (target-0) fail, I need to fall back to target-1.

Basically I want a mode like "loadbalance-fallback". From the docs it looks like there is a hack for this, but are there any better ways?
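The hack I had in mind is nesting strategies: loadbalance across two groups, where each group is a fallback pair with one VM as primary and the other as backup. This is only a sketch based on my reading of the nested-configs docs (hosts, keys, and weights copied from my config above), so I'm not sure it's the intended way:

```json
{
  "strategy": { "mode": "loadbalance" },
  "targets": [
    {
      "weight": 0.4,
      "strategy": { "mode": "fallback" },
      "targets": [
        {
          "provider": "openai",
          "custom_host": "https://llama-1.tunnels-dev.io.systems/v1",
          "api_key": "Bearer dummy-api-key-vm1k",
          "override_params": { "model": "meta-llama/Llama-3.1-405B-FP8" },
          "retry": { "attempts": 3, "on_status_codes": [404, 429, 500, 520] }
        },
        {
          "provider": "openai",
          "custom_host": "https://llama-2.tunnels-dev.io.systems/v1",
          "api_key": "Bearer dummy-api-key-vm2",
          "override_params": { "model": "meta-llama/Llama-3.1-405B-FP8" }
        }
      ]
    },
    {
      "weight": 0.6,
      "strategy": { "mode": "fallback" },
      "targets": [
        {
          "provider": "openai",
          "custom_host": "https://llama-2.tunnels-dev.io.systems/v1",
          "api_key": "Bearer dummy-api-key-vm2",
          "override_params": { "model": "meta-llama/Llama-3.1-405B-FP8" },
          "retry": { "attempts": 3, "on_status_codes": [404, 429, 500, 520] }
        },
        {
          "provider": "openai",
          "custom_host": "https://llama-1.tunnels-dev.io.systems/v1",
          "api_key": "Bearer dummy-api-key-vm1k",
          "override_params": { "model": "meta-llama/Llama-3.1-405B-FP8" }
        }
      ]
    }
  ],
  "cache": { "mode": "simple", "max_age": 60000 }
}
```

The idea is that retries are exhausted on the primary of each pair first, and only then does the request fall back to the other VM, while the top-level weights still split traffic 40/60. Is this how nesting is supposed to work, or is there a cleaner single-level option?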