Tagging @visarg who can confirm if this might be happening on Cloudflare's end. Checking our own debugger now 🧐

My hunch is it shouldn't, because we have previously seen multiple requests taking 100+ seconds go through. But confirming now.
Attachment
image.png
Hey! Any updates?
@Vrushank | Portkey
@Yorick we are still investigating but are unable to pinpoint the reason yet.
Let me get back to you on this tomorrow
@Vrushank | Portkey

Plain Text
from portkey_ai import Portkey, PORTKEY_GATEWAY_URL

client = Portkey(
    base_url=PORTKEY_GATEWAY_URL,
    api_key="YOUR_API_KEY",
    config="ANY_ANTHROPIC_CONFIG",
)

# Non-streaming call: a long completion like this can run well past
# 100 seconds, which is where the 524 errors show up.
chat_complete = client.chat.completions.create(
    model="claude-3-opus-20240229",
    messages=[
        {"role": "system", "content": "You are a chat assistant helping a user count from 1 to 10_000."},
        {"role": "user", "content": "Count from 1 to 10_000. Do not omit for brevity or use any shortcuts. It is imperative that all numbers are present."}
    ],
    max_tokens=4096,
)

print(chat_complete)
print(chat_complete.choices[0].message.content)


You can reproduce the behaviour with this prompt.
Hey. Thanks for the prompt. I will reproduce this on our end and get back with an answer.
@visarg Sorry to press you on this repeatedly, but I just have so many of them failing (and Anthropic does charge me)

Do you have any updates?
Attachment
image.png
This is happening on Cloudflare's end, as the request is taking more than 100 seconds. Have you considered using streaming mode?
But I still have some doubts: I can successfully make an OpenAI request that takes more than 2 minutes, yet it always fails for Anthropic.
I was able to make an OpenAI request that took 257823 ms. That was a plain OpenAI call without any additions, and it was not a streaming call.
@visarg maybe we can verify if we have ever seen any Anthropic requests on Portkey that were > 100s?
Yes, I am going to use streaming mode as a temporary solution, but preferably it would work without it
Do consider that it works when using the Anthropic SDK directly rather than through Portkey. Because of the 100-second mark, I also suspect Cloudflare rather than Anthropic
Additionally, I can hardly find any Google results indicating other Anthropic users regularly facing 524s
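(For comparison, a direct Anthropic SDK call with the same prompt might look like the sketch below; the API key placeholder and output handling are assumptions, not part of the original thread:)

Plain Text
import anthropic

# Same prompt, sent straight to Anthropic and bypassing the Portkey gateway,
# to check whether the >100s non-streaming call succeeds without Cloudflare in front.
client = anthropic.Anthropic(api_key="YOUR_ANTHROPIC_API_KEY")  # placeholder key

message = client.messages.create(
    model="claude-3-opus-20240229",
    max_tokens=4096,
    system="You are a chat assistant helping a user count from 1 to 10_000.",
    messages=[
        {"role": "user", "content": "Count from 1 to 10_000. Do not omit for brevity or use any shortcuts. It is imperative that all numbers are present."}
    ],
)

print(message.content[0].text)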
@visarg Streaming sometimes(?) does not log the request in Portkey 😦
This should not happen ideally. Are you able to reproduce it consistently?
Hmm, not entirely, but it happened in about 50% of cases just now. All of these were executed via Portkey; not all of them got logged:

Plain Text
import time

from portkey_ai import Portkey, PORTKEY_GATEWAY_URL

client = Portkey(
    base_url=PORTKEY_GATEWAY_URL,
    api_key="YOUR_PORTKEY_API_KEY",  # key redacted
    config="pc-anthro-f99722",
)

start = time.time()

chat_complete = client.chat.completions.create(
    model="claude-3-opus-20240229",
    messages=[
        {"role": "system", "content": "You are a chat assistant helping a user count from 1 to 10_000."},
        {"role": "user", "content": "Count from 1 to 10_000. Do not omit for brevity or use any shortcuts. It is imperative that all numbers are present."}
    ],
    max_tokens=4096,
    stream=True,
)

# The final chunk's delta may carry no content, so guard against None.
for chunk in chat_complete:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)

print(time.time() - start)
Attachments
image.png
image.png
@visarg @Vrushank | Portkey Any updates regarding these issues?
By any chance, is it possible that you are disconnecting the client before the whole stream ends?
It is a simple for-loop, I do not see any errors with it?

Plain Text
    client = get_client(model=model, prompt_name=Path(prompt_file).stem)
    completions = await client.chat.completions.create(**request.input.compile_params())
    if model is OpenAIModel.CLAUDE_3_OPUS_20240229:
        # Collect the stream, then rebuild a non-streaming-style response.
        chunks = []
        async for chunk in completions:
            if chunk.choices[0].delta.content:
                print(chunk.choices[0].delta.content, end="", flush=True)
            chunks.append(chunk)
        message_ = ChatCompletionMessage(
            content="".join(
                chunk.choices[0].delta.content
                for chunk in chunks
                if chunk.choices[0].delta.content  # the last delta can be None
            ),
            role="assistant",
        )

        completions = ChatCompletions(
            id="None",
            choices=[
                Choice(
                    index=0,
                    finish_reason=chunks[-1].choices[0].finish_reason,
                    message=message_.dict(),
                )
            ],
            created=1,
            model="None",
            object="None",
        )
@Vrushank | Portkey @visarg Any updates? We are still facing issues
Hey. Are you still facing the streaming issue or the other issue?
We tried making a bunch of streaming calls in a loop for anthropic and all of them were getting logged properly. It has been hard to reproduce this even once.

One final thing that we can try is to run the exact script that you are running. Will it be possible to send the exact script that you are using to test streaming calls in a loop? The snippets that you have sent till now are not complete and they only contain the completions call part.
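(A self-contained version of the earlier streaming snippet run in a loop — a minimal sketch with placeholder credentials — is roughly what such a complete repro script would look like:)

Plain Text
import time

from portkey_ai import Portkey, PORTKEY_GATEWAY_URL

client = Portkey(
    base_url=PORTKEY_GATEWAY_URL,
    api_key="YOUR_PORTKEY_API_KEY",  # placeholder
    config="YOUR_ANTHROPIC_CONFIG",  # placeholder config slug
)

# Fire several streaming calls back to back, time each one, and then
# compare the count against what shows up in the Portkey logs.
for i in range(5):
    start = time.time()
    stream = client.chat.completions.create(
        model="claude-3-opus-20240229",
        messages=[{"role": "user", "content": "Count from 1 to 100."}],
        max_tokens=1024,
        stream=True,
    )
    for chunk in stream:
        if chunk.choices[0].delta.content:
            print(chunk.choices[0].delta.content, end="", flush=True)
    print(f"\ncall {i} took {time.time() - start:.1f}s")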
@Yorick let me know and we can set up a private channel to discuss this further, share more code!
Okay, I will try to provide a complete sample!
This is happening on Cloudflare's end, as the request is taking more than 100 seconds. Have you considered using streaming mode?

I am using streaming, but preferably I would not have to. Is the other issue fixed?
@Vrushank | Portkey @visarg I found the bug in my code:

Plain Text
return AsyncPortkey(
    api_key=get_portkey_api_key(),
    config=get_portkey_config(model),
    trace_id=correlation_id_var.get(),
    # timeout=get_timeout(),
).with_options(metadata=metadata)

I had a timeout option in my code (legacy). If I remove it, Portkey logs streaming calls consistently
Did you guys manage to fix Anthropic non-streaming calls that are longer than 1 min?
Oh man, that's something. If you still want a timeout, you can configure it through Portkey

Wonder why they deprecated it 🤔 Do you happen to have any idea why? Also, would love to understand what your use case was for the timeout feature?
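(If a timeout is still wanted, it can live in the Portkey config rather than on the client — a sketch, assuming the config can be passed inline as a dict and that the `request_timeout` key takes milliseconds; verify both against the current Portkey config schema:)

Plain Text
from portkey_ai import Portkey

client = Portkey(
    api_key="YOUR_PORTKEY_API_KEY",           # placeholder
    config={
        "request_timeout": 120_000,           # assumed key: abort after 120s
        "provider": "anthropic",
        "api_key": "YOUR_ANTHROPIC_API_KEY",  # placeholder
    },
)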
@Yorick the Anthropic SDK mentions the timeout feature prominently... where did you see that it's legacy now?
@Vrushank | Portkey I used to have my own internal 'Portkey', which did have a timeout. GPT-4 could hang indefinitely, which is why I wanted to be able to stop the response. In the transition, the parameter stayed
Ah interesting. Is Portkey timeout working for you as expected now?
I am no longer using timeouts, just trusting that OpenAI delivers!