Hugging Face provides hosted inference for thousands of open-source models. Use it to run models from the HF Hub without hosting them yourself.

Setup

Get your token from Hugging Face Settings.
export HF_TOKEN=hf_...

Config

{
  "providers": {
    "huggingface": "${HF_TOKEN}"
  }
}

Use it

{
  "agents": [
    { "name": "coder", "model": "huggingface:meta-llama/Llama-3.1-70B-Instruct" }
  ]
}
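Putting the provider entry and the agent definition together, a complete minimal config might look like this (the agent name coder is illustrative; any model ID in org/model-name form works):

```json
{
  "providers": {
    "huggingface": "${HF_TOKEN}"
  },
  "agents": [
    { "name": "coder", "model": "huggingface:meta-llama/Llama-3.1-70B-Instruct" }
  ]
}
```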
The hf: prefix is shorthand that resolves to the huggingface provider:
{
  "model": "hf:meta-llama/Llama-3.1-70B-Instruct"
}

Features

Feature            Supported
Streaming          Yes
Tool use           Model-dependent
Vision (images)    Model-dependent

Provider Details

Provider ID          huggingface
Env variable         HF_TOKEN
API type             Hugging Face Inference
Auto-infer prefix    hf:

Notes

  • Model IDs use the Hugging Face format: org/model-name.
  • Not all models on HF Hub support the Inference API. Check the model card for availability.
  • Free tier has rate limits. For production use, consider a HF Pro subscription or Inference Endpoints.
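To see what a request to Hugging Face's hosted inference looks like under the hood, here is a minimal sketch that assembles a chat-completion request without sending it. The router URL reflects Hugging Face's OpenAI-compatible endpoint; the function name build_hf_chat_request is our own illustration, not part of any library:

```python
import json
import os


def build_hf_chat_request(model: str, prompt: str, token: str):
    """Assemble (url, headers, body) for a chat completion against the
    Hugging Face router. Nothing is sent; this only shows the shape."""
    # OpenAI-compatible chat endpoint on the HF router (assumption: current URL).
    url = "https://router.huggingface.co/v1/chat/completions"
    headers = {
        "Authorization": f"Bearer {token}",  # token from HF_TOKEN
        "Content-Type": "application/json",
    }
    body = json.dumps({
        "model": model,  # HF format: org/model-name
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    return url, headers, body


url, headers, body = build_hf_chat_request(
    "meta-llama/Llama-3.1-70B-Instruct",
    "Hello!",
    os.environ.get("HF_TOKEN", "hf_xxx"),  # falls back to a placeholder token
)
```

Sending `body` to `url` with those headers (via any HTTP client) is all the provider integration does on your behalf; a 401 response usually means the token is missing or invalid, and a 404 usually means the model does not support the Inference API.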