Welcome to Portkey Forum

Arya
Extremely odd issue: my service got hung up for 10 minutes calling OpenAI via the gateway, where the model-specific timeout is set to 2 minutes.
The Portkey logs have a gap of 10 minutes:
trace_id: c1d2e8a6-8e0c-4962-b9c2-322bcb6c1c38 (original)
trace_id: 5c3d907b-17d0-4631-bdfd-720684f4f61c (cached)

The second request that the Portkey logs show after 10 minutes hits the cache, as it is exactly the same request; I would assume this is the result of a retry.

The metadata on the logs shows the first request at 5130 ms, though.

However, when I look into my tracing application, it confirms what I encountered, and the gap ends exactly when the second request shows up in the Portkey logs and hits the cache.

I would assume the call should have timed out.
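For reference, the 2-minute timeout in question is the per-target request_timeout (in milliseconds) in the gateway config, along these lines (a minimal sketch; the placeholder virtual key mirrors the fuller config later in this thread):

Plain Text
{
    "targets": [
        {
            "virtual_key": "open-ai-virtual-xxxx",
            "override_params": { "model": "gpt-4o-2024-08-06" },
            "request_timeout": 120000
        }
    ]
}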
2 comments
If I am sending a new seed every time, would it cause my request to miss the cache? I am trying to figure out why a cache miss is happening for the same request.
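For intuition, here is why a fresh seed would defeat a simple exact-match cache, assuming the cache key is derived from the full request body (the hash below is illustrative, not Portkey's actual implementation):

Python
import hashlib
import json

# If the cache key hashes the entire request body, any changed field --
# including seed -- produces a different key, so the second request can
# never hit the first request's cache entry.
def cache_key(body: dict) -> str:
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()

base = {
    "model": "gpt-4o-2024-08-06",
    "messages": [{"role": "user", "content": "What is a Portkey?"}],
}

# Same prompt, different seed -> different key -> cache MISS.
print(cache_key({**base, "seed": 1}) == cache_key({**base, "seed": 2}))  # False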
6 comments
I want to understand the fallback config. I simulated a request timeout, where I get:

Plain Text
{
  "status": 408,
  "headers": {
    "content-type": "application/json",
    "x-portkey-cache-status": "MISS",
    "x-portkey-last-used-option-index": "config.targets[0]",
    "x-portkey-provider": "openai",
    "x-portkey-retry-attempt-count": "0",
    "x-portkey-trace-id": "9ac0fc87-562c-4b42-92e6-ad3cdb100880"
  },
  "body": {
    "error": {
      "message": "Request exceeded the timeout sent in the request: 12ms",
      "type": "timeout_error",
      "param": null,
      "code": null
    }
  },
  "responseTime": 1851,
  "lastUsedOptionJsonPath": "config.targets[0]"
}

My config does not include 408 in its on_status_codes list, yet the gateway falls back and uses the second model. What am I missing?

Also, if I have on_status_codes under strategy and also nested within the target, which one takes preference?
4 comments
Also, when I added retry to the nested object as follows:

Plain Text
{
    "strategy": {
        "mode": "fallback",
        "on_status_codes": [
            401,
            500,
            503,
            520,
            524
        ]
    },
    "request_timeout": 360000,
    "targets": [
        {
            "virtual_key": "open-ai-virtual-xxxx",
            "override_params": {
                "model": "gpt-4o-2024-08-06"
            },
            "request_timeout": 12,
            "retry": {
                "attempts": 1,
                "on_status_codes": [
                    429,
                    408
                ]
            }
        },
        {
            "virtual_key": "anthropic-api-k-xxxx",
            "override_params": {
                "model": "claude-3-7-sonnet-20250219"
            },
            "request_timeout": 120000
        },
        {
            "virtual_key": "anthropic-api-k-xxxx",
            "override_params": {
                "model": "claude-3-5-sonnet-20241022"
            },
            "request_timeout": 120000
        }
    ],
    "cache": {
        "mode": "simple",
        "max_age": 6
    }
}
....

Also, when I play with the retry number, I definitely observe that setting it to 3 takes longer to fall back to Anthropic, but the Portkey UI only shows one log for the GPT-4o call and one for Claude; there is no information available on the retries.
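A back-of-envelope for the extra delay (a sketch; the backoff schedule is a hypothetical placeholder, not Portkey's documented one):

Python
# Rough model of why attempts=3 delays the fallback: the first target is
# tried 1 + attempts times before target[1] is consulted. The backoff
# values below are hypothetical placeholders.
timeout_ms = 12                      # target[0].request_timeout above
attempts = 3                         # retry.attempts
backoff_ms = [0, 1000, 2000, 4000]   # hypothetical wait before each try

tries = 1 + attempts
delay = sum(backoff_ms[:tries]) + tries * timeout_ms
print(f"~{delay} ms before the Anthropic target is attempted")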
2 comments
@Team Portkey anything on the connection issue I flagged above? It's a blocker for us to adopt Portkey and deploy to production. If nothing, the alternative unfortunately would be to use some other gateway.
6 comments
Here is the full traceback:
Plain Text
2025-02-27 13:09:14.737 | ERROR | ai_core.llms.open_ai_wrapper:generate_text_response_async:436 | [Portkey Gateway] Unexpected error while calling Portkey Gateway with config pc-opeai-5393d5: 'ConnectTimeout' object has no attribute 'response'

  File "/app/.venv/lib/python3.12/site-packages/portkey_ai/_vendor/openai/_base_client.py", line 1860, in post
    return await self.request(cast_to, opts, stream=stream, stream_cls=stream_cls)
  File "/app/.venv/lib/python3.12/site-packages/portkey_ai/_vendor/openai/_base_client.py", line 1554, in request
    return await self._request(
  File "/app/.venv/lib/python3.12/site-packages/portkey_ai/_vendor/openai/_base_client.py", line 1601, in _request
    if remaining_retries > 0 and self._should_retry(err.response):
AttributeError: 'ConnectTimeout' object has no attribute 'response'
15 comments
Is there a plan to support Hugging Face Inference Endpoints? Ideally I would like to have all AI configs in a single gateway.
Problem: dedicated Inference Endpoints, especially ones on NVIDIA GPU instances that scale to zero, often go down in prod and warrant a fallback option; a sketch of the desired config shape follows.
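For concreteness, the shape I'd want (a sketch; pointing a target at an OpenAI-compatible endpoint via custom_host is an assumption, not confirmed Hugging Face support, and the URL/key are placeholders):

Plain Text
{
    "strategy": { "mode": "fallback" },
    "targets": [
        {
            "provider": "openai",
            "custom_host": "https://xxxx.endpoints.huggingface.cloud/v1",
            "api_key": "hf_xxxx"
        },
        {
            "virtual_key": "open-ai-virtual-xxxx",
            "override_params": { "model": "gpt-4o-2024-08-06" }
        }
    ]
}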
6 comments
Arya

Mock Fallback

Any idea how I can mock a fallback? (Without messing around by providing a wrong API key.)
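One approach that follows from the earlier experiments in this thread: give the first target an impossibly small request_timeout so it reliably times out and the gateway moves on to the next target (a sketch reusing the placeholder virtual keys from the config above; whether a timeout should trigger fallback without 408 in on_status_codes is exactly the open question earlier in this thread):

Plain Text
{
    "strategy": { "mode": "fallback" },
    "targets": [
        {
            "virtual_key": "open-ai-virtual-xxxx",
            "request_timeout": 1
        },
        {
            "virtual_key": "anthropic-api-k-xxxx",
            "override_params": { "model": "claude-3-7-sonnet-20250219" }
        }
    ]
}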
2 comments
I was able to get it to work via the following schema:
Plain Text
"virtual_key": "xxxxxxxxxxx",
"override_params": {
    "model": "anthropic.claude-3-7-sonnet-20250219-v1:0"
}

But when the response is returned from the server, the model field is empty.

e.g.
Plain Text
{
    "id": "1740523845732",
    "choices": [
        {
            "finish_reason": "max_tokens",
            "index": 0,
            "logprobs": null,
            "message": {
                "content": "{\n    \"description\": \"A Portkey is a magical object in the Harry Potter universe that has been enchanted to instantly transport anyone who touches it to a specific predetermined destination. It can be any ordinary object (like an old boot, newspaper, or bottle) that has been spelled to transport wizards an",
                "role": "assistant",
                "function_call": null,
                "tool_calls": null,
                "refusal": null,
                "audio": null
            }
        }
    ],
    "created": 1740523845,
    "model": "",
    "object": "chat.completion",
    "system_fingerprint": null,
    "usage": {
        "prompt_tokens": 25,
        "completion_tokens": 64,
        "total_tokens": 89,
        "completion_tokens_details": null,
        "prompt_tokens_details": null
    },
    "service_tier": null,
    "provider": "bedrock"
}


So I am unsure how I will know which model, among the multiple fallback options, was actually called.
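One workaround until the model field is populated: the gateway already returns x-portkey-last-used-option-index and x-portkey-provider response headers (visible in the 408 response earlier in this thread), which identify the target that served the request. A sketch with httpx (the x-portkey-api-key and x-portkey-config request headers are assumptions about the REST API, and the values are placeholders):

Python
import httpx

# Identify which fallback target answered by reading Portkey's response
# headers instead of the (empty) model field in the body.
resp = httpx.post(
    "https://api.portkey.ai/v1/chat/completions",
    headers={
        "x-portkey-api-key": "PORTKEY_API_KEY",  # placeholder
        "x-portkey-config": "pc-xxxx",           # placeholder config slug
    },
    json={
        "messages": [{"role": "user", "content": "What is a Portkey?"}],
        "max_tokens": 64,
    },
    timeout=120,
)
print(resp.headers.get("x-portkey-last-used-option-index"))  # e.g. config.targets[0]
print(resp.headers.get("x-portkey-provider"))                # e.g. bedrock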
1 comment