Hey folks, it looks like Deepseek via Together doesn't return cost (this is ok), but it also doesn't return the tokens used in the API response while streaming (just like pplx).
This is pretty annoying, especially since I had to find this out myself. Is there a (better) way to know what the API does and doesn't return when streaming? Also, since it seems I have to tokenize this myself (just like pplx), any tips on a tokenizer for Deepseek?
Do you guys handle this somehow on your end?

```json
"model": "",
"usage": { "completion_tokens": 1875 }
```
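In case it helps others: here's a minimal sketch of counting completion tokens client-side when the streamed response omits usage. The `count_stream_tokens` helper and the whitespace fallback are mine, not part of any vendor SDK; for Deepseek you'd presumably swap in the model's actual tokenizer (e.g. loaded via Hugging Face `transformers`, assuming the tokenizer is published there).

```python
# Sketch: count completion tokens client-side when a streaming API
# response omits the usage block. The tokenizer is pluggable.
from typing import Callable, Iterable, List


def count_stream_tokens(chunks: Iterable[str],
                        tokenize: Callable[[str], List[str]]) -> int:
    # Join first, then tokenize once: tokenizing chunk-by-chunk can
    # miscount tokens that span a chunk boundary.
    text = "".join(chunks)
    return len(tokenize(text))


# Naive whitespace split, for illustration only. For real counts you'd
# use the model's tokenizer, e.g. (assumed, not verified):
#   tok = AutoTokenizer.from_pretrained("deepseek-ai/...")
#   count_stream_tokens(chunks, tok.tokenize)
naive = lambda s: s.split()

print(count_stream_tokens(["Hello wor", "ld, streaming!"], naive))  # → 3
```

The joined string tokenizes to 3 whitespace "tokens" here; a real tokenizer will of course give a different (subword-level) count.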
Hey @Gijs - yeah, in streaming use cases some vendors still aren't calculating usage, and we don't append it ourselves because doing so increases latency. Would you be ok if we did something about it but it increased latency by ~100ms or so?