Gijs

·

Troubleshooting claude-3-7-sonnet-latest configuration

Hey friends, I'm trying to get Claude 3.7 Sonnet to think, but it's not working. Is there a mistake in this, or is it a Portkey limitation?

"body": {
  "model": "claude-3-7-sonnet-latest",
  "max_tokens": 8192,
  "thinking": { "type": "enabled", "budget_tokens": 16000 },
  "stream": true,
  "stream_options": { "include_usage": true },
  "tools": [ {tool definitions} ],
  "tool_choice": "auto",
  "messages": [
    { "role": "system", "content": "You are.....
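One thing that may be worth checking in the body above: Anthropic documents that `budget_tokens` must be at least 1024 and less than `max_tokens`, and here the budget (16000) is larger than `max_tokens` (8192). Below is a minimal sketch of a pre-flight check for that rule. The helper name `checkThinkingBudget` and the interfaces are hypothetical, not part of Portkey or any SDK, and whether this is the actual cause of the problem here is an assumption.

```typescript
// Hypothetical helper: not part of Portkey or the Anthropic SDK.
interface ThinkingConfig {
  type: "enabled";
  budget_tokens: number;
}

interface RequestBody {
  model: string;
  max_tokens: number;
  thinking?: ThinkingConfig;
}

// Returns a list of problems found; an empty list means both checks passed.
function checkThinkingBudget(body: RequestBody): string[] {
  const problems: string[] = [];
  const thinking = body.thinking;
  if (!thinking) return problems;

  if (thinking.budget_tokens < 1024) {
    problems.push("budget_tokens is below the documented minimum of 1024");
  }
  if (thinking.budget_tokens >= body.max_tokens) {
    problems.push(
      `budget_tokens (${thinking.budget_tokens}) must be less than max_tokens (${body.max_tokens})`
    );
  }
  return problems;
}

// The values from the request body in the post.
const issues = checkThinkingBudget({
  model: "claude-3-7-sonnet-latest",
  max_tokens: 8192,
  thinking: { type: "enabled", budget_tokens: 16000 },
});
console.log(issues);
```

If the check flags the request, either raising `max_tokens` above the budget or lowering `budget_tokens` below 8192 would satisfy the rule.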

20 comments

Gijs

·

Constraints Of API Returns For Streaming

Hey folks, it looks like DeepSeek via Together doesn't report cost (this is OK), but it also doesn't return the tokens used in the API response while streaming (just like pplx).

This is pretty annoying, especially since I had to find it out myself. Is there any (better) way to know the constraints of API returns for streaming? Also, since it seems I have to tokenize this myself (just like pplx), any tips on a tokenizer for DeepSeek?

Do you guys handle this somehow on your end?
"model": "",
"usage": {
  "completion_tokens": 1875
}
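As a rough sketch of how one might detect this per provider instead of finding out by surprise: merge whatever `usage` fields appear across the streamed chunks and report which ones never showed up, so the caller knows when to fall back to its own token counting. The chunk shape follows the OpenAI-style `usage` object seen in the fragment above; the function name `collectUsage` is hypothetical, and no particular DeepSeek tokenizer is assumed.

```typescript
// OpenAI-style usage object; every field is optional because providers differ.
interface Usage {
  prompt_tokens?: number;
  completion_tokens?: number;
  total_tokens?: number;
}

// Only the part of a streamed chunk that matters for this sketch.
interface StreamChunk {
  usage?: Usage | null;
}

// Merges usage fields seen across all chunks (later chunks win) and
// lists the fields that were never reported.
function collectUsage(chunks: StreamChunk[]): { usage: Usage; missing: string[] } {
  const usage: Usage = {};
  for (const chunk of chunks) {
    if (chunk.usage) Object.assign(usage, chunk.usage);
  }
  const fields: (keyof Usage)[] = ["prompt_tokens", "completion_tokens", "total_tokens"];
  const missing = fields.filter((field) => typeof usage[field] !== "number");
  return { usage, missing };
}

// Example mirroring the response fragment in the post.
const result = collectUsage([{}, { usage: { completion_tokens: 1875 } }]);
console.log(result.usage, result.missing);
```

When `missing` is non-empty, the application can estimate those counts itself and mark them as estimates in its own logs.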

6 comments

Gijs

·

Configuring Load Balancing for Perplexity

Hey folks, I had to set up load balancing for Perplexity (good problem to have). It seems to work: I get "Loadbalancer active" in the dashboard. But can I please trouble you guys to double-check my config?

I have a hunch that I should be able to simplify it and wouldn't need to input virt_key_1 twice, but I'm not sure how.

{
  "virtual_key": "virt_key_1",
  "cache": {
    "mode": "semantic",
    "max_age": 10000
  },
  "retry": {
    "attempts": 5,
    "on_status_codes": [
      429
    ]
  },
  "strategy": {
    "mode": "loadbalance"
  },
  "targets": [
    {
      "virtual_key": "virt_key_1"
    },
    {
      "virtual_key": "virt_key_2"
    }
  ]
}
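A possible simplification, sketched as a TypeScript object: drop the top-level `virtual_key` and let each entry in `targets` carry its own key. This assumes the gateway only needs the keys listed under `targets` when a load-balancing strategy is set, and the `weight` values are an illustrative assumption for an even split. Neither has been confirmed against this account's setup.

```typescript
// Sketch only: same fields as the config in the post, minus the top-level key.
// The "weight" entries are an assumption added for illustration.
const loadBalanceConfig = {
  cache: { mode: "semantic", max_age: 10000 },
  retry: { attempts: 5, on_status_codes: [429] },
  strategy: { mode: "loadbalance" },
  targets: [
    { virtual_key: "virt_key_1", weight: 0.5 },
    { virtual_key: "virt_key_2", weight: 0.5 },
  ],
};

// Serialize to the JSON form that would be pasted into a config editor.
console.log(JSON.stringify(loadBalanceConfig, null, 2));
```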

12 comments

Gijs

·

Reducing Latency in Portkey Route Measurements

Ayo legends, happy new year (almost). I'm trying to reduce latency because I think it's quite high.

I put some measurements in my Portkey route, and it comes down to this:

Final timing: {
  auth: 455.3472920060158,
  balanceCheck: 143.57929101586342,
  messageProcessing: 0.00041601061820983887,
  portkeyInit: 1.2397089898586273,
  messagePrep: 0.03700000047683716,
  toolsSetup: 0.0010839998722076416,
  apiCallSetup: 4235.661958009005,
  portkeyCall: 4235.673999994993,
  timeToFirstToken: 4904.3970829844475,
  totalTime: 14430.662541985512
}

It looks like the main bottleneck actually happens after calling Portkey. Any tips? Am I messing something up in my config?

5 comments

Gijs

·

Chat Completion Trick Involves Passing Image Metadata

Is there a known issue, or a different trick, to get the user to show up for image generation? For chat completions I just pass it in metadata. I tried to do the same.

9 comments

Gijs

·

Implementing a Stop Button for Streaming Chat Responses

Hey folks, I'm trying to implement a "stop" button for streaming chat responses. The Portkey docs don't mention this, but the general approach I've read about is to pass a stop signal from an AbortController.

So I just tried it, but it seems Portkey doesn't handle it.

Any advice on how to handle this?
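For reference, the general pattern can be sketched like this: the stop button calls `abort()` on an `AbortController`, and the loop that consumes the stream checks the signal and exits early. The stream here is a plain `AsyncIterable<string>` stand-in (`fakeStream` and `readUntilStopped` are hypothetical names); whether the Portkey SDK forwards an abort signal to the underlying HTTP request is not something this sketch assumes.

```typescript
// Collects streamed text chunks until the stream ends or the signal is aborted.
async function readUntilStopped(
  stream: AsyncIterable<string>,
  signal: AbortSignal
): Promise<string> {
  let text = "";
  for await (const chunk of stream) {
    // Breaking out of for-await calls the iterator's return(),
    // which gives the source a chance to clean up.
    if (signal.aborted) break;
    text += chunk;
  }
  return text;
}

// Stand-in for a model response stream.
async function* fakeStream(): AsyncGenerator<string> {
  yield "Hello";
  yield ", ";
  yield "world";
}

const controller = new AbortController();
// In a UI, the stop button's click handler would call controller.abort().
readUntilStopped(fakeStream(), controller.signal).then((text) => console.log(text));
```

With plain `fetch`, the same `controller.signal` can also be passed as the `signal` option so the HTTP connection itself is closed, which stops the client-side loop and the request together.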

19 comments

Gijs

·

Avoiding Semantic Cache Issues with Vision Messages

I'm running into an issue with vision messages: every image message hits the semantic cache. Obviously that's really bad, because it returns text that is based on a different image.

How do I get around this?
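One workaround that could be tried on the application side is to send a config without the `cache` block whenever the request contains an image. The sketch below shows that selection logic only. It assumes that leaving out `cache` disables caching for that request, and the names `hasImage` and `configFor` are hypothetical.

```typescript
// OpenAI-style message content parts.
type ContentPart =
  | { type: "text"; text: string }
  | { type: "image_url"; image_url: { url: string } };

interface ChatMessage {
  role: string;
  content: string | ContentPart[];
}

// True when any message carries an image part.
function hasImage(messages: ChatMessage[]): boolean {
  return messages.some(
    (message) =>
      Array.isArray(message.content) &&
      message.content.some((part) => part.type === "image_url")
  );
}

// Returns the config unchanged for text-only requests, and a copy
// without its "cache" block when the request contains an image.
function configFor(
  config: Record<string, unknown>,
  messages: ChatMessage[]
): Record<string, unknown> {
  if (!hasImage(messages)) return config;
  const copy: Record<string, unknown> = { ...config };
  delete copy.cache;
  return copy;
}

const baseConfig = { cache: { mode: "semantic", max_age: 10000 } };
const visionMessages: ChatMessage[] = [
  {
    role: "user",
    content: [
      { type: "text", text: "Describe this" },
      { type: "image_url", image_url: { url: "https://example.com/cat.jpg" } },
    ],
  },
];
console.log(configFor(baseConfig, visionMessages));
```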

2 comments

Gijs

·

Multimodal Capabilities of the AI Gateway

Hey folks, the Portkey docs are a bit ambiguous on this: https://portkey.ai/docs/product/ai-gateway/multimodal-capabilities/vision

But I should be able to send images to GPT-4o and 4o-mini using the regular completions route, no?

8 comments

Gijs

·

Using Vision Effectively in Chat Conversations

I have a question about using vision properly in a chat context.

In ChatGPT, once you have an image in your chat conversation, you can continue to chat about it. It seems like this requires sending the image to the LLM every time (similar to sending the entire messages array in a chat convo). First of all, this will use a lot of tokens, but also, from a practical perspective, how would this be done?

Do you just keep passing all the Base64 data in a messages array like below?

messages: [
  {
    role: "user",
    content: [
      { type: "text", text: "What’s in this image?" },
      {
        type: "image_url",
        // The OpenAI-style chat format expects an object with a `url` field here.
        image_url: {
          url: "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg",
        },
      },
    ],
  },
],
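Since chat completion requests are stateless, the usual pattern is indeed to keep the image message in the history and resend the whole array on every turn. A minimal sketch of that bookkeeping follows, using a URL reference rather than Base64 so the request body stays small (the image still has to be processed by the model on each request). The names `startWithImage` and `addTurn` are hypothetical, and the example URL is a placeholder.

```typescript
type Part =
  | { type: "text"; text: string }
  | { type: "image_url"; image_url: { url: string } };

interface Message {
  role: "user" | "assistant" | "system";
  content: string | Part[];
}

// First turn: the question plus the image, referenced by URL.
function startWithImage(question: string, imageUrl: string): Message[] {
  return [
    {
      role: "user",
      content: [
        { type: "text", text: question },
        { type: "image_url", image_url: { url: imageUrl } },
      ],
    },
  ];
}

// Later turns are plain text; the image message stays at the start of the
// history, so it is resent with every request built from this array.
function addTurn(history: Message[], role: Message["role"], text: string): Message[] {
  return [...history, { role, content: text }];
}

let history = startWithImage("What's in this image?", "https://example.com/boardwalk.jpg");
history = addTurn(history, "assistant", "A wooden boardwalk through a grassy field.");
history = addTurn(history, "user", "What time of day does it look like?");
console.log(history.length);
```

Each request then sends `history` as the `messages` array, and the model can answer follow-up questions because the image is still part of the context.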

1 comment

Welcome to Portkey Forum
