Caching Dynamic User Inputs While Preserving System Prompt Cache

Hi everyone! 👋

I’m working on a use case where I’m using the portkey.prompts.completions.create method to integrate AI model calls. Here’s a snippet of my code:
Python
from portkey_ai import Portkey

# Portkey client: the virtual key points at the provider, the config enables caching
portkey = Portkey(
    api_key=get_portkey_apikey(),
    virtual_key="openai-key-...",
    config='pc-cache-...'
)

# Render the saved prompt template with a dynamic user_query and a static system prompt
ai_balancer_v2_response = portkey.prompts.completions.create(
    prompt_id="pp-balancer-...",
    variables={
        "user_query": f"{msg_input}",
        "system_prompt": {
            "type": "text",
            "text": balancer_prompt,
            "cache_control": {"type": "cacheable"}
        }
    },
    tools=tool_definitions,
    tool_choice="required",
    parallel_tool_calls=False,
    max_tokens=1024
)

The issue I’m encountering is related to the caching mechanism. It seems that the cache is invalidated whenever I change the user_query (dynamic input). However, I need the cache to remain valid for the system prompt (balancer_prompt), even when the user input changes.

In my use case, the system prompt is consistent across requests, and I want to benefit from caching for it, but the user_query will always be different and should not affect the caching.
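
For context, the behaviour I'm trying to get is the provider-side prompt caching that Anthropic exposes on its native API. This is only a rough sketch of that (it is not the Portkey prompt call above): it reuses balancer_prompt and msg_input from my snippet, and Anthropic's own cache_control type is "ephemeral".

Python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    # Only the static system prompt carries cache_control, so only it is cached
    system=[
        {
            "type": "text",
            "text": balancer_prompt,
            "cache_control": {"type": "ephemeral"},
        }
    ],
    # The dynamic user input sits outside the cached prefix
    messages=[{"role": "user", "content": msg_input}],
)

# cache_read_input_tokens > 0 on a later call indicates the system prompt was served from cache
print(response.usage.cache_creation_input_tokens, response.usage.cache_read_input_tokens)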
3 comments
OpenAI automatically caches your prompt when the input is longer than 1,024 tokens; you don't need to send cache_control for it.
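A quick way to verify this is to check the usage block on the response, which reports how many prompt tokens were served from OpenAI's cache. A minimal sketch (the model name is a placeholder, and balancer_prompt is assumed to be the long system prompt from the snippet above):

Python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Same long (>1,024-token) system prompt, different user inputs
for user_input in ["first question", "second question"]:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": balancer_prompt},  # static prefix
            {"role": "user", "content": user_input},         # dynamic suffix
        ],
    )
    # cached_tokens > 0 on later calls means OpenAI reused the cached prefix
    print(resp.usage.prompt_tokens_details.cached_tokens)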
We are using both OpenAI and Anthropic. During our tests we got the same result whether we send cache_control or not.

The result was that the cache is a MISS unless the combination of system prompt and user input is exactly the same across requests.

It's as if the cache is only used for requests where system prompt + user input are exactly the same.

The moment we change the user input, the cache is a miss.

We tried both simple and semantic cache modes and got the same results.

Our system prompt is ~6k tokens.

Any insights into this? @sega
Hey @shockdav, I think you're confusing Portkey's cache with OpenAI's cache. If your system prompt is longer than 1,024 tokens, OpenAI automatically caches it, and you can see the reduced cost for it.
The cache symbol in the UI refers to Portkey's cache, which, when enabled, caches the full inference request; when the same request is sent again, the cache is hit and you get the response at no cost (0 cents).
Let me know if this clarifies it for you.
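To make the distinction concrete, here is a rough sketch of where Portkey's own cache is configured. The cache block mirrors what a saved config like pc-cache-... would typically contain (exact fields may differ, so treat it as an illustration): because this cache keys on the entire request, any change to user_query produces a new cache entry, while OpenAI's provider-side caching only needs the long system-prompt prefix to match.

Python
from portkey_ai import Portkey

# Portkey's own cache lives in the (saved or inline) config object;
# it caches the whole inference request, not just the system prompt.
portkey = Portkey(
    api_key=get_portkey_apikey(),      # helper from the original snippet
    virtual_key="openai-key-...",
    config={
        "cache": {
            "mode": "simple",   # or "semantic"
            "max_age": 3600,    # seconds to keep the cached response
        }
    },
)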