There are providers like replicate.com that charge you per second of usage instead of per token. I was wondering, under what circumstances does it make sense to use this instead of the ordinary token-based APIs? And can I expect more tokens per second this way than with token-based APIs?
I once tried a chain-of-thought prompt that consumed about 1400 tokens per call; each call cost me about 5.5 cents and took about 15 seconds to return a response.
With per-second billing, my cost would be at most 0.15 cents per call. Of course, I might be missing a lot of other things, like which model is being run, etc.
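To make the comparison concrete, here is a rough sketch that uses only the numbers above: 1400 tokens, 15 seconds, 5.5 cents with token billing, and 0.15 cents with per-second billing. The per-token and per-second rates are back-calculated from those figures, not taken from any provider's price list, and the helper names are made up for illustration. It also assumes the per-second provider would take the same 15 seconds, which may not hold on different hardware or a different model.

```python
# Back-of-the-envelope comparison of per-token vs per-second billing.
# All rates are derived from the figures in the post; they are assumptions,
# not published prices.

def per_token_cost(tokens: float, cents_per_token: float) -> float:
    """Cost in cents when billed by tokens processed."""
    return tokens * cents_per_token

def per_second_cost(seconds: float, cents_per_second: float) -> float:
    """Cost in cents when billed by wall-clock compute time."""
    return seconds * cents_per_second

def break_even_tps(cents_per_second: float, cents_per_token: float) -> float:
    """Tokens/second above which per-second billing is the cheaper option."""
    return cents_per_second / cents_per_token

TOKENS = 1400             # tokens consumed per call (prompt + completion)
SECONDS = 15              # observed latency per call
TOKEN_BILL_CENTS = 5.5    # what the token-based API charged
SECOND_BILL_CENTS = 0.15  # estimated per-second bill for the same call

cents_per_token = TOKEN_BILL_CENTS / TOKENS      # implied token rate
cents_per_second = SECOND_BILL_CENTS / SECONDS   # implied per-second rate

observed_tps = TOKENS / SECONDS
needed_tps = break_even_tps(cents_per_second, cents_per_token)

print(f"implied rate: {cents_per_token:.5f} cents/token, "
      f"{cents_per_second:.3f} cents/second")
print(f"observed throughput: {observed_tps:.1f} tokens/s")
print(f"break-even throughput: {needed_tps:.2f} tokens/s")
```

With these assumed rates, per-second billing wins as long as the model sustains more than a few tokens per second. The comparison flips if the per-second hardware is much slower, if you are also billed for cold starts or idle time, or if the model behind each price is not the same.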
I'd be wary - if they could give more throughput (TPS) than providers like Anyscale/Together that's something that they would lead with in their marketing. The absence of it suggests that they don't have an edge there.
Not sure about the pricing comparison, though. The thing is, I haven't really seen anyone go to prod with Replicate, so there's little anecdotal data to go on, I think.