Unified AI Inference Gateway: Addressing Reliability Challenges

Is there a plan to support Hugging Face Inference Endpoints? Ideally I'd like to have all AI configs in a single gateway.
Problem: Dedicated Inference Endpoints, especially ones on NVIDIA GPU instances that scale to zero, often go down in prod and warrant a fallback option.
dedicated inference endpoints are already supported in the gateway
you have to pass the x-portkey-huggingface-base-url header for dedicated hosts
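For reference, here is a minimal sketch of passing that header against the gateway's REST API directly; the endpoint URL, keys, and model name below are placeholders rather than values from this thread.
Python
import requests

# Hypothetical values: swap in your own Portkey key, HF token, and endpoint URL
resp = requests.post(
    "https://api.portkey.ai/v1/chat/completions",
    headers={
        "x-portkey-api-key": "<PORTKEY_API_KEY>",
        "x-portkey-provider": "huggingface",
        "Authorization": "Bearer <HF_TOKEN>",
        # Dedicated host goes in this header, as noted above
        "x-portkey-huggingface-base-url": "https://<your-endpoint>.endpoints.huggingface.cloud",
        "Content-Type": "application/json",
    },
    json={
        "model": "tgi",  # placeholder; dedicated endpoints typically ignore the model name
        "messages": [{"role": "user", "content": "Hello"}],
    },
)
print(resp.json())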
I have a cross-encoder (MS MARCO) model deployed which doesn't use the OpenAI schema, so I'm not sure it would work in that case.
Yes, you can send arbitrary JSON payloads to endpoints that are not in the unified endpoints list. Let me share an example:
Python
from portkey_ai import Portkey

# Point the gateway at the dedicated Hugging Face endpoint
portkey = Portkey(
    api_key="",  # your Portkey API key
    provider="huggingface",
    Authorization="Bearer hf_",  # your Hugging Face token
    huggingface_base_url="https://rpld3pbvx.us-east-1.aws.endpoints.huggingface.cloud"
    # content_type="multipart/form-data"
)

# post() forwards an arbitrary JSON payload to the given route as-is
response = portkey.post(
    url="endpoints/PortkeyGuardrails-gibberish/invocations",
    inputs="asdasdasdasdasdsdas"
)

print(response)
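On the fallback need from the original question: gateway configs support a fallback routing strategy. Below is a minimal sketch, assuming Portkey's config schema (strategy, targets, override_params) with placeholder keys and URLs; note it applies to OpenAI-compatible routes, not to the raw cross-encoder payload above.
Python
from portkey_ai import Portkey

# Hypothetical fallback config: try the dedicated HF endpoint first,
# then fall back to OpenAI if it errors (e.g. the endpoint has scaled to zero).
# All keys, URLs, and model names below are placeholders.
config = {
    "strategy": {"mode": "fallback"},
    "targets": [
        {
            "provider": "huggingface",
            "api_key": "hf_...",
            "custom_host": "https://<your-endpoint>.endpoints.huggingface.cloud",
        },
        {
            "provider": "openai",
            "api_key": "sk-...",
            "override_params": {"model": "gpt-4o-mini"},
        },
    ],
}

portkey = Portkey(api_key="<PORTKEY_API_KEY>", config=config)

response = portkey.chat.completions.create(
    model="tgi",  # placeholder; the OpenAI target overrides this via override_params
    messages=[{"role": "user", "content": "ping"}],
)
print(response)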