Welcome to Portkey Forum

Updated 8 months ago

Practices that helped reduce latency

Hey folks! What are some best practices you know of that have actually worked to reduce latency, especially with OpenAI?
3 comments
Have you explored this page from OpenAI? - https://platform.openai.com/docs/guides/production-best-practices/managing-rate-limits

Otherwise, I only know a few simple ones: reduce max tokens and prompt size when not needed, and the turbo models are generally optimised for latency too.
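To make the max-tokens and prompt-size tips concrete, here's a minimal sketch. The `trim_prompt` helper, the word-based token estimate, and the request body are illustrative assumptions, not an official OpenAI or Portkey utility; in production you'd use a real tokenizer such as tiktoken.

```python
# Sketch: keep request payloads small to cut latency.
# Assumption: ~0.75 words per token is a rough heuristic only;
# use a real tokenizer (e.g. tiktoken) for accurate counts.

def estimate_tokens(text: str) -> int:
    """Very rough token estimate from word count."""
    return int(len(text.split()) / 0.75)

def trim_prompt(prompt: str, max_prompt_tokens: int) -> str:
    """Drop the oldest words until the prompt fits the budget."""
    words = prompt.split()
    while words and int(len(words) / 0.75) > max_prompt_tokens:
        words.pop(0)  # keep the most recent context
    return " ".join(words)

# A hypothetical request body: capping max_tokens makes the model
# stop earlier, which directly shortens the completion time.
request = {
    "model": "gpt-3.5-turbo",  # turbo models are optimised for latency
    "messages": [{"role": "user", "content": trim_prompt("...", 512)}],
    "max_tokens": 128,  # only as many completion tokens as you need
}
```

The idea is simply that fewer input tokens mean less processing before the first token streams back, and a smaller `max_tokens` caps the generation time.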
Thanks Saif for sharing that link. I hadn't looked at it, but I had read those tips in other references. I think I'm already following them as much as possible.
(Semantic) caching is a huge one: for Q&A-type use cases the cache hit rate is very high, around ~50%.

Would love to know from others!
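For anyone curious what a semantic cache does under the hood, here's a toy sketch. It uses bag-of-words cosine similarity in place of real embeddings, and the `SemanticCache` class, helper names, and 0.8 threshold are all assumptions for illustration, not how Portkey's cache is actually implemented.

```python
# Toy semantic cache: return a stored answer when a new question
# is "close enough" to one we've answered before, skipping the LLM call.
from __future__ import annotations
from collections import Counter
from math import sqrt

def embed(text: str) -> Counter:
    """Toy 'embedding': bag-of-words counts. Real caches use model embeddings."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class SemanticCache:
    def __init__(self, threshold: float = 0.8):
        self.threshold = threshold  # assumed cutoff for a "hit"
        self.entries: list[tuple[Counter, str]] = []

    def get(self, question: str) -> str | None:
        q = embed(question)
        for vec, answer in self.entries:
            if cosine(q, vec) >= self.threshold:
                return answer  # cache hit: no model round-trip at all
        return None

    def put(self, question: str, answer: str) -> None:
        self.entries.append((embed(question), answer))
```

A hit turns a full model round-trip into a local lookup, which is why the latency win is so large when the hit rate is high, as in Q&A workloads where many users phrase the same question slightly differently.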