Welcome to Portkey Forum

Updated 8 months ago

Practices that helped reduce latency

Hey folks! What are some best practices you know of that have actually worked to reduce latency, especially with OpenAI?
3 comments
Have you explored this page from OpenAI? - https://platform.openai.com/docs/guides/production-best-practices/managing-rate-limits

Otherwise, I only know a few simple ones: reduce max tokens and prompt size when not needed, and the turbo models are generally optimised for latency too.
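To make the max-tokens and prompt-size tips concrete, here's a minimal sketch. The `trim_prompt` helper, the word-based token estimate, and the request body are illustrative assumptions, not an official OpenAI or Portkey utility; in production you'd use a real tokenizer such as tiktoken.

```python
# Sketch: keep request payloads small to cut latency.
# Assumption: ~0.75 words per token is a rough heuristic only;
# use a real tokenizer (e.g. tiktoken) for accurate counts.

def estimate_tokens(text: str) -> int:
    """Very rough token estimate from word count."""
    return int(len(text.split()) / 0.75)

def trim_prompt(prompt: str, max_prompt_tokens: int) -> str:
    """Drop the oldest words until the prompt fits the budget."""
    words = prompt.split()
    while words and int(len(words) / 0.75) > max_prompt_tokens:
        words.pop(0)  # keep the most recent context
    return " ".join(words)

# A hypothetical request body: capping max_tokens makes the model
# stop earlier, which directly shortens the completion time.
request = {
    "model": "gpt-3.5-turbo",  # turbo models are optimised for latency
    "messages": [{"role": "user", "content": trim_prompt("...", 512)}],
    "max_tokens": 128,  # only as many completion tokens as you need
}
```

The idea is simply that fewer input tokens mean less processing before the first token streams back, and a smaller `max_tokens` caps the generation time.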
Thanks Saif for sharing that link. I hadn't looked at it, but I had read those tips in other references. I think I'm already following them as much as possible.
(Semantic) caching is a huge one: for Q&A-type use cases the cache hit rate is very high, around ~50%.

Would love to know from others!
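For anyone curious what a semantic cache does under the hood, here's a toy sketch. It uses bag-of-words cosine similarity in place of real embeddings, and the `SemanticCache` class, helper names, and 0.8 threshold are all assumptions for illustration, not how Portkey's cache is actually implemented.

```python
# Toy semantic cache: return a stored answer when a new question
# is "close enough" to one we've answered before, skipping the LLM call.
from __future__ import annotations
from collections import Counter
from math import sqrt

def embed(text: str) -> Counter:
    """Toy 'embedding': bag-of-words counts. Real caches use model embeddings."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class SemanticCache:
    def __init__(self, threshold: float = 0.8):
        self.threshold = threshold  # assumed cutoff for a "hit"
        self.entries: list[tuple[Counter, str]] = []

    def get(self, question: str) -> str | None:
        q = embed(question)
        for vec, answer in self.entries:
            if cosine(q, vec) >= self.threshold:
                return answer  # cache hit: no model round-trip at all
        return None

    def put(self, question: str, answer: str) -> None:
        self.entries.append((embed(question), answer))
```

A hit turns a full model round-trip into a local lookup, which is why the latency win is so large when the hit rate is high, as in Q&A workloads where many users phrase the same question slightly differently.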