Welcome to Portkey Forum

Vrushank | Portkey
Joined November 4, 2024
Absolutely! You can just set openai.base_url = "https://api.portkey.ai/v1" and Portkey should start logging all of your Assistants/Threads/Files requests
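For reference, a minimal sketch of that setup (assuming the OpenAI Python SDK v1.x; the x-portkey-* header names for passing Portkey credentials are assumptions, not confirmed here):

```python
# Minimal sketch: route OpenAI traffic through Portkey by changing the base URL.
# The header names below are assumptions for passing Portkey credentials.
from openai import OpenAI

client = OpenAI(
    api_key="OPENAI_API_KEY",
    base_url="https://api.portkey.ai/v1",        # send requests via Portkey
    default_headers={
        "x-portkey-api-key": "PORTKEY_API_KEY",  # assumed Portkey auth header
        "x-portkey-provider": "openai",          # assumed provider header
    },
)

# Assistants/Threads/Files calls made with this client should now show up in Portkey logs
assistant = client.beta.assistants.create(model="gpt-4-turbo", name="demo-assistant")
```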
4 comments
Hi! Checking this and getting back to you asap
5 comments
This was shared by @thismlguy - is anyone else seeing some faulty multi_tool_use.parallel calls in their OpenAI tool calling requests?

There's an ongoing thread from the past year where multiple users have reported the same bug
5 comments
Hey @hebertrfreitas, welcome!

This is interesting. We'll investigate and get back. Currently Portkey only supports the standard Azure OpenAI URLs, but I wonder if we can make this work using the custom_host param.
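If it does work, the shape would be roughly this (a sketch only; whether custom_host accepts non-standard Azure OpenAI URLs is exactly the open question above, and the endpoint is hypothetical):

```python
# Sketch only: passing a non-standard Azure OpenAI endpoint via custom_host.
# Whether this is supported is the open question; the URL is hypothetical.
from portkey_ai import Portkey

portkey = Portkey(
    api_key="PORTKEY_API_KEY",
    provider="azure-openai",
    custom_host="https://my-private-azure-endpoint.example.com",  # hypothetical endpoint
)
```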
3 comments
Would love to follow up on the error rate for Sonnet 3.5 - is it looking better than gpt-4-turbo?
2 comments
Hi @dodgery, thanks for reporting, checking this. In the meantime, can you please DM me your Portkey email? I'll see if it's failing specifically for you for some reason
1 comment
I'd suggest making REST calls directly from Go. Not sure how good the unofficial Go OpenAI SDK is.
8 comments
Possible to share the difference in outputs? That way we can judge if something went wrong in between. Also, if you can DM me your prompt template ID, that'll help as well
5 comments
Will update you shortly! cc @Sabbyasachi
1 comment
Hey @shubham, welcome! Checking whether you are on the latest Portkey package; confirming that the prompts.render() method should work
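For reference, a minimal sketch of how prompts.render() is typically called (assuming the Portkey Python SDK; the prompt ID and variables below are placeholders):

```python
# Minimal sketch: render a saved prompt template with variables.
# The prompt ID and variable names are placeholders.
from portkey_ai import Portkey

portkey = Portkey(api_key="PORTKEY_API_KEY")

rendered = portkey.prompts.render(
    prompt_id="pp-example-123",               # placeholder prompt template ID
    variables={"user_input": "Hello there"},
)
print(rendered)
```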
1 comment
Hey Harsh, changing the baseURL to http://localhost:8787 will work
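A quick sketch of what that looks like with an OpenAI-compatible client (assuming a Portkey Gateway running locally on port 8787; whether a /v1 suffix is needed may depend on the client and gateway version):

```python
# Minimal sketch: point the client at a locally running Portkey Gateway.
# A /v1 suffix may be needed depending on the client/gateway version.
from openai import OpenAI

client = OpenAI(
    api_key="OPENAI_API_KEY",
    base_url="http://localhost:8787",   # local gateway instead of api.portkey.ai
)
```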
7 comments
Hey, confirming that on Python client.files.create() is working with Configs as well as Virtual Keys. Can you share the code snippet you're using? I can quickly see if something needs to be fixed there
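For reference, a minimal sketch of the virtual-key variant (assuming the Portkey Python SDK; the virtual key value and file path are placeholders):

```python
# Minimal sketch: upload a file through Portkey using a virtual key.
# The virtual key value and file path are placeholders.
from portkey_ai import Portkey

portkey = Portkey(
    api_key="PORTKEY_API_KEY",
    virtual_key="openai-virtual-key-xxx",   # placeholder virtual key
)

uploaded = portkey.files.create(
    file=open("training_data.jsonl", "rb"),
    purpose="fine-tune",
)
print(uploaded.id)
```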
7 comments
Tagging @visarg who can confirm if this might be happening on Cloudflare's end. Checking our own debugger now 🧐

My hunch is it shouldn't, because we have seen multiple requests taking 100+ seconds go through previously. But confirming now.
34 comments
Not All Llama 2s Are Created Equal

Llama 2 models from different providers like Together, Anyscale, Perplexity, etc. may often seem identical on paper, but the same query on the same Llama 2 model can yield different responses depending on the inference provider.

Why? It often comes down to how the provider handles the model. For instance, some might use quantization to make the model run faster and consume fewer resources, but this can subtly alter the quality of the output.

I recently read this blog post by Together AI about their inference engine, and importantly, their chart comparing their inference performance to the vanilla HuggingFace implementation.

Here's a snippet from their blog:
"The improvements to performance with the Together Inference Engine come without any compromise to quality. These changes do not involve techniques like quantization which can change the behavior of the model, even if in a modest way."

This got me thinking —
  • How do these subtle differences impact our work?
  • How does this affect your choice of provider for Llama 2 models?
Would love to hear your thoughts on this!
2 comments
What combo of embeddings + LLMs are you looking at?
2 comments
It shows the REST option in the dropdown, right? :think:
4 comments
Awesome! Look forward to your thoughts on the gateway!
39 comments
Thanks for sharing! If it is messages you're missing, it should ideally say: Error code: 400 - 'messages' is a required property :think:

Are you also using a Config with this?
5 comments
Try putting "n": "1", i.e. passing 1 as a string; maybe that's the issue
10 comments
Totally hear you. cc @visarg

If possible, can you please share how you imagine such a workflow would look? And for what use case would fallback + load balance be beneficial for you? I'll take that to the team! 😄
3 comments
You can do this using the 'n' param - it's used the same way as top_p, temperature, etc.
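For reference, a quick sketch (assuming the OpenAI Python SDK; the model and prompt are just illustrative):

```python
# Minimal sketch: request multiple completions in one call with the n param.
# Model name and prompt are illustrative.
from openai import OpenAI

client = OpenAI(api_key="OPENAI_API_KEY")

resp = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Give me a tagline for a coffee shop"}],
    n=3,   # ask for 3 alternative completions, passed alongside top_p/temperature
)

for choice in resp.choices:
    print(choice.message.content)
```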
3 comments
Notice GPT-4 getting faster? We did too. 🐇

Over the last 3 months, GPT-4 and GPT-3.5 latencies have more than halved - for both your regular requests and computationally complex requests with high token counts.

Check out the findings: https://blog.portkey.ai/blog/gpt-4-is-getting-faster/
3 comments
Yes, you can register multiple feedback values against the same trace ID, and all of them will be visible on the logs and analytics pages
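For reference, a minimal sketch of what that could look like (assuming the Portkey Python SDK exposes a feedback method roughly like this; the trace ID and values are placeholders):

```python
# Sketch only: attach multiple feedback values to one trace ID.
# The exact method signature is an assumption; trace ID and values are placeholders.
from portkey_ai import Portkey

portkey = Portkey(api_key="PORTKEY_API_KEY")

portkey.feedback.create(trace_id="my-trace-123", value=1)   # first feedback value
portkey.feedback.create(trace_id="my-trace-123", value=5)   # second value on the same trace
```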
4 comments
Hey @deepanshu_11!

With the SDK it's pretty simple. Check out this notebook tailor-made for your use case: https://github.com/Portkey-AI/portkey-python-sdk/blob/readme/examples/loadbalance_two_api_keys.ipynb
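In case it helps, the core idea looks roughly like this (a sketch assuming the Portkey Python SDK with a loadbalance-style config; the virtual keys and weights are placeholders):

```python
# Sketch only: split traffic across two keys with a loadbalance config.
# Virtual key names and weights are placeholders.
from portkey_ai import Portkey

config = {
    "strategy": {"mode": "loadbalance"},
    "targets": [
        {"virtual_key": "openai-key-1", "weight": 0.5},
        {"virtual_key": "openai-key-2", "weight": 0.5},
    ],
}

portkey = Portkey(api_key="PORTKEY_API_KEY", config=config)

resp = portkey.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(resp.choices[0].message.content)
```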
39 comments