GroqCloud serves open-weights models (Llama, DeepSeek, Qwen, Kimi) on Groq's purpose-built LPU hardware, hitting hundreds of tokens per second where GPUs manage tens. OpenAI-compatible API with a free tier; the default when token latency is the product.
Inference · Groq
Groq
Ultra-fast inference on custom LPU chips. Open-weights at 500+ tokens/sec.
Model support
Multi-model
- Llama
- DeepSeek
- Qwen
- Kimi K2
- GPT-OSS
Open-weights catalog on Groq LPUs; OpenAI-compatible API.
Where it runs
- API
- Web
Tags
- #inference
- #low-latency
- #lpu
- #open-weights
Related in Inference
View Together AI details InferenceFREEMIUMVettedTogether AI
Together
Fine-tuning + inference for open-weights models. Broad coverage.
Hosted inference and fine-tuning across hundreds of open-weights models (Llama, Mistral, DeepSeek, Qwen, etc.). Strong pricing for inference-at-scale; LoRA + full fine-tuning supported.
- inference
- fine-tuning
- open-weights
- lora
View OpenRouter details InferenceFREEMIUMOpenRouter
OpenRouter
One OpenAI-compatible API in front of 300+ models from every provider.
A unified gateway that routes a single endpoint and API key to models from Anthropic, OpenAI, Google, Meta, DeepSeek, xAI, and more — swap models by changing one parameter, with automatic fallbacks and one consolidated bill. Pass-through token pricing plus dozens of free models.
- gateway
- routing
- multi-model
- fallbacks
View Replicate details InferenceFREEMIUMReplicate
Replicate
Run, fine-tune, and deploy thousands of open models via one API.
A platform to run open-source models with one API call — image, video, audio, and language — plus fine-tuning and custom deploys with pay-per-second billing. No infra to manage.
- model-hosting
- fine-tuning
- api
- open-source
View Fireworks AI details InferenceFREEMIUMFireworks AI
Fireworks AI
Fast inference + fine-tuning. Production deployments at scale.
Optimized inference platform for open-weights models with strong latency numbers and serverless + dedicated deployment options. Fine-tuning supported; vision and audio models alongside text.
- inference
- fine-tuning
- low-latency
- production