Welcome to Portkey Forum
Updated 8 months ago
Practices that helped reduce latency
bhuvansingla
8 months ago
Hey folks! What are some best practices you know that have actually worked to reduce latency, especially with OpenAI?
3 comments
Saif
8 months ago
Have you explored this page from OpenAI? -
https://platform.openai.com/docs/guides/production-best-practices/managing-rate-limits
Otherwise, I only know a few simple ones: reduce max tokens and prompt size when not needed, and the turbo models are generally optimised for latency too.
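To illustrate the "reduce prompt size" tip, here's a minimal sketch of a prompt-trimming helper. It's purely illustrative: it uses word count as a rough stand-in for token count (a real version would count tokens with something like tiktoken), and the function name and word budget are made up for this example. Keeping the tail of the prompt preserves the most recent context.

```python
def trim_prompt(prompt: str, max_words: int) -> str:
    """Keep only the last max_words words of a prompt.

    Word count is only a rough proxy for token count; swap in a
    real tokenizer for production use. Trimming from the front
    keeps the most recent context, which usually matters most.
    """
    words = prompt.split()
    if len(words) <= max_words:
        return prompt
    return " ".join(words[-max_words:])
```

Pairing this with a lower `max_tokens` on the completion call cuts both the input and output side of the latency.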
bhuvansingla
8 months ago
Thanks, Saif, for sharing that link. I hadn't seen that page, but I'd read those tips in other references. I think I'm already following them as much as possible.
Vrushank | Portkey
8 months ago
(Semantic) Caching is a huge one - for Q&A type use cases the cache hit rate is very high at ~50%
Would love to know from others!
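For anyone curious what a semantic cache looks like under the hood, here's a toy sketch. Everything here is illustrative, not Portkey's actual implementation: it uses a bag-of-words vector with cosine similarity as a stand-in for a real sentence-embedding model, and the class name and 0.9 threshold are made up for the example.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding"; a real semantic cache would
    # use a sentence-embedding model here.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse word-count vectors.
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


class SemanticCache:
    """Return a cached answer when a query is similar enough
    to one seen before, instead of requiring an exact match."""

    def __init__(self, threshold: float = 0.9):
        self.threshold = threshold
        self.entries: list[tuple[Counter, str]] = []

    def get(self, query: str):
        qv = embed(query)
        for ev, answer in self.entries:
            if cosine(qv, ev) >= self.threshold:
                return answer  # cache hit: skip the LLM call
        return None  # cache miss: caller falls through to the LLM

    def put(self, query: str, answer: str) -> None:
        self.entries.append((embed(query), answer))
```

The latency win comes from the hit path: a vector comparison is orders of magnitude faster than a round trip to the model, which is why high hit rates on Q&A workloads pay off so well.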