Hey folks, it looks like Deepseek via Together doesn't return cost (this is ok), but it also doesn't return the tokens used in the API response while streaming (just like pplx).
This is pretty annoying, especially since I had to find this out myself. Is there a (better) way to know what the API does and doesn't return when streaming? Also, since it seems I have to tokenize this myself (just like pplx), any tips on a tokenizer for Deepseek?
Do you guys handle this somehow on your end?

```json
"model": "",
"usage": { "completion_tokens": 1875 }
```
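In case it helps others: here's a minimal sketch of counting completion tokens client-side when the streamed response omits usage. The `count_stream_tokens` helper and the whitespace fallback are mine, not part of any vendor SDK; for Deepseek you'd presumably swap in the model's actual tokenizer (e.g. loaded via Hugging Face `transformers`, assuming the tokenizer is published there).

```python
# Sketch: count completion tokens client-side when a streaming API
# response omits the usage block. The tokenizer is pluggable.
from typing import Callable, Iterable, List


def count_stream_tokens(chunks: Iterable[str],
                        tokenize: Callable[[str], List[str]]) -> int:
    # Join first, then tokenize once: tokenizing chunk-by-chunk can
    # miscount tokens that span a chunk boundary.
    text = "".join(chunks)
    return len(tokenize(text))


# Naive whitespace split, for illustration only. For real counts you'd
# use the model's tokenizer, e.g. (assumed, not verified):
#   tok = AutoTokenizer.from_pretrained("deepseek-ai/...")
#   count_stream_tokens(chunks, tok.tokenize)
naive = lambda s: s.split()

print(count_stream_tokens(["Hello wor", "ld, streaming!"], naive))  # → 3
```

The joined string tokenizes to 3 whitespace "tokens" here; a real tokenizer will of course give a different (subword-level) count.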
Hey @Gijs - yeah, in streaming use cases some vendors still aren't calculating usage, and we don't append it ourselves because doing so increases latency. Would you be ok if we did something about it but it increased latency by ~100ms or so?