Pricing
Cut your AI spend and get to production faster with Predibase.
Fine-tune and serve with zero headaches in our cloud or your VPC.
Calculate Your Inference Cost-Savings
Use case (tokens per request)
Requests / second
Current model
Price: $30 / 1M input tokens, $60 / 1M output tokens
$15,648,058.08 estimated yearly cost savings
Based on 1 A100 replica and fine-tuned Llama-3-8B
Savings do not factor in Enterprise discounts and vary based on hardware selection. Estimations are based on the list price for an A100 80GB GPU.
For a personalized quote, reach out to us at support@predibase.com
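The arithmetic behind an estimate of this kind can be sketched as follows. This is a rough illustration, not the calculator's actual formula: the function names and the example workload are hypothetical, the token prices are the ones quoted above, and the $4.80/hr default is the A100 list price from the Private Serverless Inference table. It also assumes a single always-on replica can absorb the whole workload, which may not hold at high request rates.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600
HOURS_PER_YEAR = 365 * 24


def yearly_api_cost(requests_per_second, input_tokens, output_tokens,
                    input_price_per_m=30.0, output_price_per_m=60.0):
    """Yearly cost of a per-token API at the given prices per 1M tokens."""
    cost_per_request = (input_tokens * input_price_per_m
                        + output_tokens * output_price_per_m) / 1_000_000
    return cost_per_request * requests_per_second * SECONDS_PER_YEAR


def yearly_dedicated_cost(hourly_rate=4.80, replicas=1):
    """Yearly cost of replicas billed by the hour and running continuously."""
    return hourly_rate * replicas * HOURS_PER_YEAR


def yearly_savings(requests_per_second, input_tokens, output_tokens):
    """Per-token API cost minus the cost of one always-on A100 replica."""
    return (yearly_api_cost(requests_per_second, input_tokens, output_tokens)
            - yearly_dedicated_cost())


if __name__ == "__main__":
    # Hypothetical workload: 1 request/s, 1,000 input and 500 output tokens each.
    print(f"${yearly_savings(1, 1000, 500):,.2f}")
```

Because the dedicated cost is fixed per replica while the per-token cost grows with traffic, the estimated savings in this sketch scale almost linearly with request volume.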
Predibase Tiers
Developer
Get started right away fine-tuning and serving adapters that beat GPT-4
- Up to 1 user
- Pay-as-you-go pricing
- Unlimited best-in-class fine-tuning with A100 GPUs
- Inference:
  - 1 private serverless deployment (no rate limits)
  - Autoscaling and scale to 0
  - Serve unlimited adapters on a single GPU with LoRAX
  - Free shared serverless inference (with rate limits) for testing
- Access to all available base models
- Data connection via file uploads
- 2 concurrent training jobs
- In-app chat, email, and Discord support
Note: Free credits expire after 30 days.
Enterprise
Guaranteed autoscaling and priority compute access for teams ready to go into production
- Additional seats for your whole team
- Volume discounts for serving compute
- Inference:
  - Guaranteed instances to ensure scaling to meet increased demand
  - Additional replicas for burst usage
  - Additional private serverless deployments
- Guaranteed uptime SLAs
- Data connection via Snowflake, Databricks, S3, BigQuery, and more
- Additional concurrent training jobs
- Dedicated Slack channel, plus consulting hours with our experts
Enterprise (VPC)
Fine-tune and serve while guaranteeing that your data never leaves your cloud
- Deploy directly into your own cloud (AWS, Azure, GCP)
- Use your own cloud commitments
- Optimize usage with your own GPUs
- Enterprise security and compliance
Private Serverless Inference
Hardware | Base Price ($ / hr) |
---|---|
1 A10G (24GB) | $2.60 |
1 L40S (48GB) | $3.20 |
1 A100 PCIe (80GB) | $4.80 |
1 H100 PCIe (80GB) | Enterprise-only |
1 H100 SXM (80GB) | Enterprise-only |
1 H200 | Enterprise-only |
1 MI300X | Enterprise-only |
Multi A100 or H100 SXM | Enterprise-only |
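To turn the hourly list prices above into a budget figure, a minimal sketch might look like the following. The `deployment_cost` helper and the example usage pattern are hypothetical. The sketch assumes a deployment is billed only for the hours its replicas are running (for example, when it scales to 0 while idle); the exact billing granularity is not stated here, so treat the result as a rough estimate.

```python
# List prices in $/hr, copied from the table above.
HOURLY_PRICE = {"A10G": 2.60, "L40S": 3.20, "A100": 4.80}


def deployment_cost(hardware, active_hours, replicas=1):
    """Cost of a deployment that is billed only while replicas are running."""
    if hardware not in HOURLY_PRICE:
        raise ValueError(f"no list price for {hardware!r} (Enterprise-only?)")
    return HOURLY_PRICE[hardware] * active_hours * replicas


if __name__ == "__main__":
    # Hypothetical month: busy 8 hours on each of 22 working days, idle otherwise.
    for gpu in HOURLY_PRICE:
        print(gpu, round(deployment_cost(gpu, active_hours=8 * 22), 2))
```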
Shared Serverless Inference
Predibase supports state-of-the-art, efficient inference for both pre-trained and fine-tuned models enabled by LoRA Exchange (LoRAX). Serverless pricing is designed for experimentation and is free to use for up to 1M tokens per day and 10M tokens per month.
- Solar-Mini
- Solar Pro Preview
- Llama-3-1-8b-instruct
- Llama-3-1-8b
- Llama-3-70b
- Llama-3-70b-instruct
- Mistral-7b-instruct-v0.1
- Mistral-7b-instruct-v0.2
- Mistral-7b
- Gemma-2B-Instruct
- Gemma-7B-Instruct
- Code-llama-13b-instruct
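The two free-tier limits for shared serverless inference interact: at the daily maximum of 1M tokens, the 10M monthly allowance is used up in 10 days. The sketch below checks a usage plan against both limits. The helper name is hypothetical, and it assumes the monthly allowance is counted from the first day of the list; how tokens are counted and when the limits reset is not specified here.

```python
DAILY_LIMIT = 1_000_000     # free tokens per day
MONTHLY_LIMIT = 10_000_000  # free tokens per month


def first_day_over_limit(daily_tokens):
    """Return the 1-based day on which usage first exceeds a limit, or None."""
    month_total = 0
    for day, tokens in enumerate(daily_tokens, start=1):
        month_total += tokens
        if tokens > DAILY_LIMIT or month_total > MONTHLY_LIMIT:
            return day
    return None


if __name__ == "__main__":
    # Hypothetical plan: 500k tokens every day for 30 days.
    print(first_day_over_limit([500_000] * 30))
```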
Fine-tuning Costs
Predibase offers state-of-the-art fine-tuning at cost-effective prices. Expected costs vary with the size of the dataset, the size of the base model, and whether you're training a LoRA, Turbo LoRA, or Turbo adapter.
Calculate Your Fine-tuning Cost
Model Size (parameters)
Dataset (tokens)
Epochs (passes over the dataset)
Estimated Cost
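Given the calculator's inputs and per-1M-token pricing, one plausible way to approximate a fine-tuning bill is tokens processed across all epochs multiplied by a per-1M-token rate. The sketch below is an assumption about the shape of the calculation, not Predibase's actual formula. The function name is hypothetical, and the rate in the example is a placeholder rather than a real price; the applicable rate depends on the base model size and adapter type.

```python
def estimate_finetuning_cost(dataset_tokens, epochs, price_per_million_tokens):
    """Estimated cost = tokens processed across all epochs x rate per 1M tokens."""
    if dataset_tokens < 0 or epochs < 0 or price_per_million_tokens < 0:
        raise ValueError("inputs must be non-negative")
    trained_tokens = dataset_tokens * epochs
    return trained_tokens / 1_000_000 * price_per_million_tokens


if __name__ == "__main__":
    # Hypothetical job: 2M-token dataset, 3 epochs, placeholder rate of $0.50 per 1M tokens.
    print(f"${estimate_finetuning_cost(2_000_000, 3, 0.50):.2f}")
```

Under this model, doubling either the dataset size or the number of epochs doubles the estimated cost.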
Officially Supported Models for Fine-tuning Include:
- Solar Mini, Solar Pro Preview
- Llama 3.2 (1B, 3B; Instruct and non-Instruct)
- Llama 3.1 8B (Instruct and non-Instruct)
- Mistral-7b, Mistral-7b-instruct-v0.1 and v0.2
- Mistral Nemo 12B 2407 (Instruct and non-Instruct)
- Mixtral-8x7B-Instruct-v0.1
- Codellama 13B Instruct, Codellama 70B Instruct
- Zephyr 7B Beta
- Gemma 2 (9B, 27B; Instruct and non-Instruct)
- Phi 3.5 Mini Instruct
- Phi 3 4k Instruct
- Qwen 2.5 (1.5B, 7B, 14B, 32B; Instruct and non-Instruct)
- Qwen 2 (1.5B; Instruct and non-Instruct)
- Any OSS model from Hugging Face (best effort)
Fine-Tuning Pricing (per 1M tokens)