Serverless Inference with Hugging Face and NVIDIA NIMs

Enterprise Hub details:

- Runs on NVIDIA DGX Cloud using NVIDIA NIMs.
- Models: 7 open LLMs, including Llama 3.1 70B and Mixtral 8x22B.
- Pricing: $0.0023 per second per GPU, pay-as-you-go.
- Supports the OpenAI API standard.
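
To get a feel for the pricing above, here is a rough cost sketch. It assumes the bill is simply rate x seconds x GPUs, with no minimum charge or rounding (the notes above don't say either way). The function name and the example numbers (4 GPUs, 30 seconds) are made up for illustration.

```python
# Rate taken from the pricing bullet above.
RATE_USD_PER_GPU_SECOND = 0.0023


def estimate_cost(seconds: float, gpus: int,
                  rate: float = RATE_USD_PER_GPU_SECOND) -> float:
    """Estimate pay-as-you-go cost in USD.

    Assumption: cost scales linearly with compute time and GPU count,
    with no minimum billing increment.
    """
    if seconds < 0 or gpus < 0:
        raise ValueError("seconds and gpus must be non-negative")
    return rate * seconds * gpus


if __name__ == "__main__":
    # Hypothetical request: 30 seconds of compute on 4 GPUs.
    print(f"30 s on 4 GPUs: ${estimate_cost(30, 4):.4f}")
    # The same rate expressed per GPU-hour.
    print(f"1 GPU-hour:     ${estimate_cost(3600, 1):.2f}")
```

At this rate, one GPU running for a full hour comes to about $8.28, which is a handy number for comparing against hourly GPU pricing elsewhere.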

https://huggingface.co/blog/inference-dgx-cloud