
Updated 8 months ago

Comparing the Error Rates of Sonnet 3.5 and GPT-4-Turbo


Would love to follow up on the error rate with Sonnet 3.5 - is it coming out better than gpt-4-turbo?

2 comments

Again, these are super early and anecdotal results, but so far yes. I'm running an experiment today where I'm swapping all of my "chain of thought" calls to just use Sonnet 3.5, and so far I'm seeing excellent and consistent results. I'm using it for coding, so it may not perform as well in other domains. But so far my vibes-based evals are passing with flying colors.
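For anyone wanting to try a similar swap with something a bit firmer than vibes, here is a minimal sketch of routing one call type to a different model and tallying failures per model. Everything in it is an assumption for illustration: the model identifiers, the `chain_of_thought` call-type label, and the `pick_model` and `ErrorTally` names are hypothetical, and the actual model call is left out entirely.

```python
from collections import defaultdict

# Hypothetical model identifiers and call-type labels, for illustration only.
DEFAULT_MODEL = "gpt-4-turbo"
ROUTES = {"chain_of_thought": "claude-3-5-sonnet"}


def pick_model(call_type: str) -> str:
    """Return the model for a call type, falling back to the default."""
    return ROUTES.get(call_type, DEFAULT_MODEL)


class ErrorTally:
    """Counts calls and failures per model so error rates can be compared."""

    def __init__(self) -> None:
        self.calls = defaultdict(int)
        self.errors = defaultdict(int)

    def record(self, model: str, ok: bool) -> None:
        """Log one call outcome; ok=False counts as an error."""
        self.calls[model] += 1
        if not ok:
            self.errors[model] += 1

    def error_rate(self, model: str) -> float:
        """Fraction of recorded calls that failed (0.0 if none recorded)."""
        total = self.calls.get(model, 0)
        return self.errors.get(model, 0) / total if total else 0.0
```

The idea would be to call `pick_model(...)` wherever a request is built, call `record(model, ok)` after each response is checked, and compare `error_rate(...)` across models once enough calls have accumulated. What counts as "ok" is up to your own eval.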

Vrushank | Portkey

Wow, that's amazing. Yeah, I'd expect Claude to do particularly well on chain-of-thought calls - Anthropic seems to be embedding that functionality more and more in Claude. Saw that with the Golden Gate Bridge Claude, and now with Claude doing CoT thinking while tool calling.
