Pricing

Cut your AI spend and get to production faster with Predibase.

Fine-tune and serve with zero headaches in our cloud or your VPC.

Calculate Your Inference Cost-Savings

Interactive calculator: enter your use case (tokens per request), your requests per second, and your current model's pricing to see estimated yearly cost savings. The example pricing shown is $30 / 1M input tokens and $60 / 1M output tokens; estimates assume 1 A100 replica serving a fine-tuned Llama-3-8B.

Savings do not factor in Enterprise discounts and vary based on hardware selection. Estimations are based on the list price for an A100 80GB GPU.
For a personalized quote, reach out to us at support@predibase.com.
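The savings estimate above compares per-token API pricing against an always-on dedicated replica. A minimal sketch of that arithmetic, assuming the listed rates ($30 / 1M input, $60 / 1M output, $4.80/hr for an A100) and illustrative request sizes that are not taken from the page:

```python
# Hedged sketch: yearly inference cost savings. Rates come from the page;
# the request shape (tokens in/out, requests/sec) below is illustrative.
SECONDS_PER_YEAR = 365 * 24 * 3600
A100_PRICE_PER_HOUR = 4.80  # list price for an A100 80GB GPU

def current_api_cost_per_year(req_per_sec, input_toks, output_toks,
                              in_price=30.0, out_price=60.0):
    """Yearly cost of an API billed per 1M input/output tokens."""
    requests = req_per_sec * SECONDS_PER_YEAR
    return (requests * input_toks / 1e6 * in_price
            + requests * output_toks / 1e6 * out_price)

def replica_cost_per_year(replicas=1, price_per_hour=A100_PRICE_PER_HOUR):
    """Yearly cost of always-on dedicated replicas billed hourly."""
    return replicas * price_per_hour * 365 * 24

# e.g. 5 req/s, 500 input + 200 output tokens per request, 1 A100 replica
savings = current_api_cost_per_year(5, 500, 200) - replica_cost_per_year()
```

Actual savings depend on traffic shape and hardware choice; the point is that per-token pricing scales with volume while a dedicated replica's cost is flat.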

Predibase Tiers

Predibase AI Cloud

Developer

Get started right away fine-tuning and serving adapters that beat GPT-4

Included in Developer:
  • Up to 1 user
  • Pay-as-you-go pricing
  • Unlimited best-in-class fine-tuning with A100 GPUs
  • Inference:
    • 1 private serverless deployment (no rate limits)
    • Autoscaling and scale to 0
    • Serve unlimited adapters on a single GPU with LoRAX
    • Free shared serverless inference (with rate limits) for testing
  • Access to all available base models
  • Data connection via file uploads
  • 2 concurrent training jobs
  • In-app chat, email, and Discord support
Get Started with $25 Free Credits

Note: Free credits expire after 30 days.

Enterprise (SaaS)

Guaranteed autoscaling and priority compute access for teams ready to go into production

Everything in Developer, plus:
  • Additional seats for your whole team
  • Volume discounts for serving compute
  • Inference:
    • Guaranteed instances to ensure scaling to meet increased demand
    • Additional replicas for burst usage
    • Additional private serverless deployments
  • Guaranteed uptime SLAs
  • Data connection via Snowflake, Databricks, S3, BigQuery, and more
  • Additional concurrent training jobs
  • Dedicated Slack channel, plus consulting hours with our experts
Enterprise (VPC)

Fine-tune and serve while guaranteeing that your data never leaves your cloud

Everything in Enterprise SaaS, plus:
  • Deploy directly into your own cloud (AWS, Azure, GCP)
  • Use your own cloud commitments
  • Optimize usage with your own GPUs
  • Enterprise security and compliance
Use committed cloud spend on Predibase
If you have committed spend with AWS, Azure, or GCP, you will soon be able to use that commitment on Predibase.
Do you offer discounts?
Yes, discounted pricing on compute is available for Enterprise customers. Please contact us to learn more.

Private Serverless Inference

We offer usage-based pricing billed by the second so you can configure your deployments to scale to 0 when idle or add additional replicas when usage spikes. Enjoy exceptional inference performance and the ability to serve unlimited fine-tuned adapters on a single deployment to maximize your GPU utilization and cost-effectiveness.
Hardware                     Base Price ($ / hr)
1 A10G (24GB)                $2.60
1 L40S (48GB)                $3.20
1 A100 PCIe (80GB)           $4.80
1 H100 PCIe (80GB)           $7.78 (Enterprise-only)
1 H100 SXM (80GB)            Enterprise-only
1 H200                       Enterprise-only
1 MI300X                     Enterprise-only
Multi A100 or H100 SXM       Enterprise-only
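Because deployments are billed by the second and can scale to 0, you pay only for time the deployment is actually running. A small sketch of that billing model, using the A100 list price from the table (the function name and inputs are illustrative):

```python
# Hedged sketch of usage-based billing by the second, at the A100 list
# price above. With scale-to-zero, idle time contributes no cost.
A100_PRICE_PER_HOUR = 4.80

def deployment_cost(active_seconds, replicas=1,
                    price_per_hour=A100_PRICE_PER_HOUR):
    """Cost for the seconds a deployment was actually scaled up."""
    return active_seconds * replicas * price_per_hour / 3600

# e.g. active 6 hours/day for 30 days on one replica
monthly = deployment_cost(6 * 3600 * 30)
```

Adding replicas for burst traffic multiplies the cost only for the seconds those replicas are up.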

Shared Serverless Inference

Predibase supports state-of-the-art, efficient inference for both pre-trained and fine-tuned models enabled by LoRA Exchange (LoRAX). Serverless pricing is designed for experimentation and is free to use for up to 1M tokens per day and 10M tokens per month.

  • Solar-Mini
  • Solar Pro Preview
  • Llama-3.1-8b-instruct
  • Llama-3.1-8b
  • Llama-3-70b
  • Llama-3-70b-instruct
  • Mistral-7b-instruct-v0.1
  • Mistral-7b-instruct-v0.2
  • Mistral-7b
  • Gemma-2B-Instruct
  • Gemma-7B-Instruct
  • Code-llama-13b-instruct
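The free shared tier is capped at 1M tokens per day and 10M per month. A client-side sketch of tracking usage against those caps (the function and counters are illustrative, not part of any Predibase API):

```python
# Hedged sketch: check a request against the shared serverless free tier
# (1M tokens/day, 10M tokens/month, per the limits stated above).
DAILY_LIMIT = 1_000_000
MONTHLY_LIMIT = 10_000_000

def within_free_tier(tokens_today, tokens_this_month, request_tokens):
    """True if the request fits under both the daily and monthly caps."""
    return (tokens_today + request_tokens <= DAILY_LIMIT
            and tokens_this_month + request_tokens <= MONTHLY_LIMIT)
```

Requests beyond these limits are rate-limited, so the shared tier suits experimentation rather than production traffic.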

Fine-tuning Costs

Predibase offers state-of-the-art fine-tuning at cost-effective prices. Expected costs vary depending on the dataset and the size of the model being fine-tuned.

Calculate Your Fine-tuning Cost

Interactive calculator: enter the model size (parameters), dataset size (tokens), and number of epochs to see an estimated fine-tuning cost.

Officially Supported Models for Fine-tuning

  • Solar Mini
  • Solar Pro Preview
  • Llama-3.1-8b, Llama-3.1-8b-instruct
  • Llama-3-8b, Llama-3-8b-instruct
  • Llama-3-70b, Llama-3-70b-instruct
  • Llama-2-7b, Llama-2-7b-chat
  • Llama-2-13b, Llama-2-13b-chat
  • Llama-2-70b, Llama-2-70b-chat
  • Mistral-7b, Mistral-7b-instruct-v0.1 and v0.2
  • Mixtral-8x7B-Instruct-v0.1
  • Codellama-13b-instruct, Codellama-70b-instruct
  • Zephyr-7b-beta
  • Gemma-7b, Gemma-7b-instruct
  • Gemma-2b, Gemma-2b-instruct
  • Phi 3 4k Instruct
  • Phi 2
  • Gemma 2 9B, Gemma 2 9B Instruct
  • Gemma 2 27B, Gemma 2 27B Instruct
  • Qwen2 1.5B Instruct
  • Qwen2 7B, Qwen2 7B Instruct
  • Qwen2 72B, Qwen2 72B Instruct
  • Any OSS model from Hugging Face (best effort)

Fine-Tuning Pricing (per 1M tokens)

Model size        Standard    Turbo LoRA
Up to 16B         $0.50       $1.00
16.1B to 80B      $3.00       $6.00
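The rates above are per 1M training tokens, and total training tokens are dataset tokens times epochs. A sketch of that arithmetic (the function and rate-table names are illustrative):

```python
# Hedged sketch of the fine-tuning cost formula implied by the rates above:
# total tokens trained = dataset tokens x epochs, billed per 1M tokens.
RATES = {  # $ per 1M training tokens
    ("<=16B", "standard"): 0.50,
    ("16.1-80B", "standard"): 3.00,
    ("<=16B", "turbo_lora"): 1.00,
    ("16.1-80B", "turbo_lora"): 6.00,
}

def finetune_cost(dataset_tokens, epochs, model_params_b, turbo_lora=False):
    """Estimated cost for fine-tuning a model of the given size (in billions)."""
    tier = "<=16B" if model_params_b <= 16 else "16.1-80B"
    rate = RATES[(tier, "turbo_lora" if turbo_lora else "standard")]
    return dataset_tokens * epochs / 1e6 * rate

# e.g. a 10M-token dataset, 3 epochs, on an 8B model
cost = finetune_cost(10_000_000, 3, 8)
```

Turbo LoRA doubles the per-token rate in exchange for faster inference from the resulting adapter.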

Ready to efficiently fine-tune and serve your own LLM?