Predibase Cloud Pricing

Pricing listed below is for the Developer Tier of Predibase AI Cloud. After your free trial you’ll be able to upgrade by adding your credit card.

If you’re interested in learning about our Enterprise / VPC pricing, please contact us at support@predibase.com.

Serverless Inference Endpoints

Serverless inference is free to use, limited to 1M tokens per day and 10M tokens per month. If you need to exceed these limits for production use, see our dedicated deployments, which offer greater predictability and stability.

Predibase supports state-of-the-art, efficient inference for both pre-trained and fine-tuned models, enabled by LoRA Exchange (LoRAX), Predibase’s unique technology that makes our fine-tuned model serving the most cost-effective on the market. For comparison, OpenAI charges 8x more for inference on fine-tuned GPT-3.5 models than on the base model.

See full list of supported models.

  • Llama-3-8b
  • Llama-3-8b-instruct
  • Llama-3-70b
  • Llama-3-70b-instruct
  • Mistral-7b-instruct-v0.1
  • Mistral-7b-instruct-v0.2
  • Mistral-7b
  • Gemma-2b
  • Gemma-2b-instruct
  • Gemma-7b
  • Gemma-7b-instruct
  • Code-llama-13b-instruct
  • Code-llama-70b-instruct
  • Llama-2-7b
  • Llama-2-7b-chat
  • Llama-2-13b
  • Llama-2-13b-chat
  • Llama-2-70b
  • Llama-2-70b-chat
  • Mixtral-8x7B-Instruct-v0.1
  • Phi-2
  • Zephyr-7b-beta

Note: We are continuously adding support for inference for pre-trained models. If there are other models you’d like to see hosted, please let us know.

Price: Free
Daily token limit (input + output): 1 million
Monthly token limit (input + output): 10 million

Fine-tuning Costs

Predibase offers state-of-the-art fine-tuning at cost-effective prices. Expected costs vary depending on the dataset and the size of the model being fine-tuned.

Calculate Your Fine-tuning Cost

Estimated fine-tuning cost depends on three inputs: model size (parameters), dataset size (tokens), and the number of epochs (training iterations over the dataset).

Officially Supported Models for Fine-tuning

  • Llama-3-8b, Llama-3-8b-instruct
  • Llama-3-70b, Llama-3-70b-instruct
  • Llama-2-7b, Llama-2-7b-chat
  • Llama-2-13b, Llama-2-13b-chat
  • Llama-2-70b, Llama-2-70b-chat
  • Mistral-7b, Mistral-7b-instruct-v0.1 and v0.2
  • Mixtral-8x7B-Instruct-v0.1
  • Codellama-13b-instruct, Codellama-70b-instruct
  • Zephyr-7b-beta
  • Gemma-7b, Gemma-7b-instruct
  • Gemma-2b, Gemma-2b-instruct
  • Any OSS model from Hugging Face (best effort)

Fine-Tuning Pricing (per 1M tokens)

Model size       Price
Up to 7B         $0.36
7.1B - 21B       $0.65
21.1B - 34B      $2.10
34.1B - 70B      $3.21
Mixtral-8x7B     $3.21
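As a rough sketch, your total fine-tuning cost works out to (dataset tokens × epochs ÷ 1M) × the per-1M-token rate for your model's size bracket. The brackets below mirror the pricing table; any rounding or minimum charges are assumptions not stated on this page.

```python
# Hedged sketch of the fine-tuning cost estimate: tokens trained x bracket rate.
# Bracket boundaries mirror the pricing table above; rounding behavior is assumed.

PRICE_PER_1M_TOKENS = [
    (7.0, 0.36),    # up to 7B parameters
    (21.0, 0.65),   # 7.1B - 21B
    (34.0, 2.10),   # 21.1B - 34B
    (70.0, 3.21),   # 34.1B - 70B (Mixtral-8x7B is also $3.21)
]

def estimate_finetune_cost(model_size_b: float, dataset_tokens: int, epochs: int) -> float:
    """Estimated cost in USD: (dataset tokens x epochs / 1M) x per-1M-token rate."""
    rate = next(r for cap, r in PRICE_PER_1M_TOKENS if model_size_b <= cap)
    total_tokens = dataset_tokens * epochs
    return total_tokens / 1_000_000 * rate

# Example: fine-tuning a 7B model on a 2M-token dataset for 3 epochs
print(f"${estimate_finetune_cost(7, 2_000_000, 3):.2f}")  # $2.16
```

For instance, a 2M-token dataset trained for 3 epochs on a 7B model passes 6M tokens through training, so 6 × $0.36 ≈ $2.16.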

Dedicated Deployments

✨ (New) Introducing self-serve A100 deployments in the Developer tier! Try deploying using an A100 without needing to talk to sales.

Serve individual fine-tuned LLMs or one base model with 100+ fine-tuned adapters

We offer usage-based pricing billed by the second. Enjoy exceptional inference performance, seamless autoscaling, and the ability to serve unlimited fine-tuned adapters on a single deployment to maximize your GPU utilization.

Hardware          Developer Tier ($/hour)   Enterprise Tier ($/hour)   Model Size
1 A10G 24GB       $1.82                     Discounts available        Up to 7B
1 L40S 48GB       $2.80                     Discounts available        Up to 13B
1 A100 80GB       $3.90                     Discounts available        Up to 70B
2 A100 160GB      $7.80                     Discounts available        Up to 70B

For L40S deployments, contact support@predibase.com to discuss.
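Because dedicated deployments are billed by the second, the hourly rates above prorate to the exact time a deployment is running. A minimal sketch of that proration, using the Developer tier rates from the table (exact billing granularity is an assumption):

```python
# Hedged sketch: prorate an hourly rate to per-second billing.
# Rates mirror the Developer tier table above; billing granularity is assumed.

DEVELOPER_RATES_PER_HOUR = {
    "1 A10G 24GB": 1.82,
    "1 L40S 48GB": 2.80,
    "1 A100 80GB": 3.90,
    "2 A100 160GB": 7.80,
}

def deployment_cost(hardware: str, seconds: float) -> float:
    """Cost in USD for a deployment running `seconds` seconds, billed by the second."""
    return DEVELOPER_RATES_PER_HOUR[hardware] / 3600 * seconds

# Example: an A10G deployment that ran for 90 minutes
print(f"${deployment_cost('1 A10G 24GB', 90 * 60):.2f}")  # $2.73
```

So a deployment that autoscales up for only 90 minutes costs 1.5 × $1.82 ≈ $2.73 rather than a full hour-rounded charge.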

Predibase Tiers

Developer

  • Predibase AI Cloud (SaaS)
  • Pricing: Pay-as-you-go
  • Fine-tune LLMs
    • Up to 70B parameters
    • Access to A100 GPUs
  • Inference
    • Serverless inference: Free, limited to 1M tokens/day, 10M tokens/month
    • Dedicated deployments: Limited to 1 deployment. Serve unlimited fine-tuned adapters on a single base model with LoRAX. No rate limiting.
  • Concurrent training jobs: Limited to 2
  • Customer support: Discord, chat, and email
Get Started with $25 Free Credits

Note: Free credits expire after 30 days.

Enterprise

  • Predibase AI Cloud (SaaS)
  • Pricing: Discounted pay-as-you-go
  • Fine-tune LLMs
    • Up to 70B parameters
    • Priority access to A100 GPUs
  • Inference
    • Serverless inference: Free, limited to 1M tokens/day, 10M tokens/month
    • Dedicated Deployments: Unlimited deployments for all model sizes. Serve unlimited fine-tuned adapters on a single base model with LoRAX. No rate limiting.
  • Concurrent training jobs: Can be increased upon request
  • Customer support: Predibase's Customer Success program includes a dedicated support channel, training, and priority support from our ML experts.

Virtual Private Cloud (VPC)

  • VPC on AWS, GCP, or Azure
  • Pricing: Platform fee (compute cost paid directly to your cloud provider)
  • Fine-tune LLMs
    • Up to 70B parameters
    • Use GPUs in your own cloud
  • Inference
    • Dedicated Deployments: Unlimited deployments for all model sizes. Serve unlimited fine-tuned adapters on a single base model with LoRAX. No rate limiting.
  • Concurrent training jobs: Unlimited
  • Customer support: Predibase's Customer Success program includes a dedicated support channel, training, and priority support from our ML experts.

Ready to efficiently fine-tune and serve your own LLM?