
Updated 3 months ago

Langchain ChatOpenAI errors out with cached responses while streaming

At a glance
openai stream does not send token count
27 comments
@sega does that mean I can't cache with streaming?
cache is based on request
but this code gives the following error:

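import os

from langchain_openai import ChatOpenAI
from portkey_ai import PORTKEY_GATEWAY_URL, createHeaders

# Gateway config: semantic cache plus up to 3 retry attempts
config = {
    "cache": {
        "mode": "semantic",
    },
    "retry": {
        "attempts": 3,
    },
}

# Headers that route the OpenAI request through the Portkey gateway
portkey_headers = createHeaders(
    api_key=os.getenv("PORTKEY_API_KEY"),
    provider="openai",
    metadata={"_user": "m"},
    config=config,
)


def init_openai_chat(temperature):
    OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
    return ChatOpenAI(
        openai_api_key=OPENAI_API_KEY,
        streaming=True,
        temperature=temperature,
        model='gpt-4o',
        base_url=PORTKEY_GATEWAY_URL,
        default_headers=portkey_headers,
    )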


Error:

    total_tokens = oai_token_usage.get("total_tokens", input_tokens + output_tokens)
                                                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: unsupported operand type(s) for +: 'NoneType' and 'int'
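
For anyone reading the traceback above: a minimal sketch of how this kind of TypeError can arise, assuming the usage dict of a cached streamed response carries a null token count. The function names and the `cached_usage` dict are made up for illustration and are not LangChain's actual code. The point is that `dict.get(key, default)` only falls back to the default when the key is missing, not when its value is `None`, and that the default expression is evaluated eagerly.

```python
def total_tokens_naive(usage: dict) -> int:
    # .get() returns None (not 0) when the key exists with a null value
    input_tokens = usage.get("prompt_tokens", 0)
    output_tokens = usage.get("completion_tokens", 0)
    # The default argument is computed before .get() runs, so the addition
    # happens even when "total_tokens" is present
    return usage.get("total_tokens", input_tokens + output_tokens)


def total_tokens_safe(usage: dict) -> int:
    # Treat a missing key and a null value the same way
    input_tokens = usage.get("prompt_tokens") or 0
    output_tokens = usage.get("completion_tokens") or 0
    total = usage.get("total_tokens")
    return total if total is not None else input_tokens + output_tokens


# Hypothetical usage payload with a null prompt token count
cached_usage = {"prompt_tokens": None, "completion_tokens": 0}

try:
    total_tokens_naive(cached_usage)
except TypeError as exc:
    print(f"naive version failed: {exc}")

print(total_tokens_safe(cached_usage))
```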

@sega any solutions for this?
I'm able to replicate this
Yes, this is a bit urgent. @sega let me know once it's fixed. When do you think you can roll out the update?
I'm able to replicate the issue, I'm still trying to understand what's causing it
Hey @deepanshu_11, just add the option stream_usage=True when initializing the client. We'll make this the default behaviour in the open-source gateway.
...
    return ChatOpenAI(
        openai_api_key=OPENAI_API_KEY,
        streaming=True,
        temperature=temperature,
        model='gpt-3.5-turbo',
        default_headers=portkey_headers,
        base_url=PORTKEY_GATEWAY_URL,
        max_tokens=10,
        stream_usage=True,
    )
...
Let me know if this doesn't fix the issue.
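
Some background on why the option helps, as a rough sketch rather than the gateway's or LangChain's real code: as far as I understand, stream_usage=True makes the client ask OpenAI to include usage in the stream, in which case one extra final chunk carries the token counts and every other chunk has a null usage field. The chunk dicts and the helper `usage_from_stream` below are simulated and hypothetical; no network call is made.

```python
from typing import Iterable, Optional


def usage_from_stream(chunks: Iterable[dict]) -> Optional[dict]:
    """Return the usage dict from a stream of chunks, or None if none was sent."""
    usage = None
    for chunk in chunks:
        # Content chunks carry usage=None; only the final chunk has real counts
        if chunk.get("usage") is not None:
            usage = chunk["usage"]
    return usage


# Simulated stream: two content chunks, then a usage-only chunk
simulated_stream = [
    {"choices": [{"delta": {"content": "Hel"}}], "usage": None},
    {"choices": [{"delta": {"content": "lo"}}], "usage": None},
    {
        "choices": [],
        "usage": {"prompt_tokens": 5, "completion_tokens": 2, "total_tokens": 7},
    },
]

print(usage_from_stream(simulated_stream))
# Without the final usage chunk there is nothing to report
print(usage_from_stream(simulated_stream[:2]))
```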
Also @sega, since the documents are very long, it's resulting in "openai error: This model's maximum context length is 128000 tokens. "
Is there any solution we can use for this?
you can chunk your input or use a model with longer context
Okay, so the fallback mechanism wouldn't be helpful here, right?
No, it wouldn't handle chunking for you; it would retry the request if it fails.
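
On the chunking suggestion above, a minimal sketch of splitting a long document into overlapping pieces before sending each one to the model. It counts whitespace-separated words as a crude stand-in for tokens, which is an assumption; real token counts depend on the model's tokenizer. The helper `chunk_words` and its defaults are hypothetical.

```python
def chunk_words(text: str, max_words: int = 2000, overlap: int = 200) -> list[str]:
    """Split text into chunks of at most max_words words, sharing `overlap` words."""
    if max_words <= 0:
        raise ValueError("max_words must be positive")
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must be in [0, max_words)")

    words = text.split()
    step = max_words - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        # Stop once the current window already reaches the end of the text
        if start + max_words >= len(words):
            break
    return chunks


sample = " ".join(str(i) for i in range(10))
for piece in chunk_words(sample, max_words=4, overlap=1):
    print(piece)
```

Each chunk can then go out as its own request, with the per-chunk results combined afterwards; the overlap keeps some context across chunk boundaries.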
@sega it looks like with stream_usage=True, it's not caching at all
I can see it caching, are you on a paid trial or a free use account?
Would it work for both cache modes, simple and semantic?
Yes I'm on a pro account
@deepanshu_11 just checking if you were able to get stream + caching to work
Yes it worked after the fix
Thanks team 🙂