Predibase Cloud Pricing

Pricing listed below is for the Developer Tier of Predibase AI Cloud. After your free trial, you’ll be able to upgrade by adding your credit card.

If you’re interested in learning about our Enterprise / VPC pricing, please contact us at support@predibase.com.

Serverless Inference Endpoints

Same price for both pre-trained and fine-tuned models

Predibase supports state-of-the-art, efficient inference for both pre-trained and fine-tuned models at the same flat per-token price. This is made possible by LoRA Exchange (LoRAX), Predibase's serving technology, which lets us offer the most cost-effective fine-tuned model serving on the market.

For comparison, OpenAI charges 8x more for inference on fine-tuned GPT-3.5 than on the base model. Most other OSS LLM infrastructure providers don't offer per-token pricing for fine-tuned models at all, forcing you into an expensive $/GPU-hour pricing model.

Models supported include:

  • Gemma-2b (new!)
  • Gemma-7b (new!)
  • Code-llama-13b-instruct
  • Code-llama-34b
  • Llama-2-7b
  • Llama-2-7b-chat
  • Llama-2-13b
  • Llama-2-13b-chat
  • Llama-2-70b
  • Llama-2-70b-chat
  • Mistral-7b-instruct
  • Mistral-7b
  • Mixtral-8x7B-Instruct-v0.1
  • Yarn-Mistral-7b-128k
  • Phi-2
  • Zephyr-7b-beta

Model Size (Pre-trained and Fine-tuned Models)    Price per 1M tokens (input + output)
Up to 7B                                          $0.20
Up to 13B                                         $0.25
Up to 70B                                         $1.00
Mixtral-8x7B                                      $1.00
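The per-token billing above can be sketched as a quick estimate: total tokens (input + output) divided by one million, times the rate for the model's size tier. The tier keys below are illustrative names, not Predibase identifiers.

```python
# Sketch: estimate serverless inference cost from the price table above.
# Prices are per 1M tokens (input + output); tier keys are illustrative.

PRICE_PER_1M_TOKENS = {
    "up_to_7b": 0.20,
    "up_to_13b": 0.25,
    "up_to_70b": 1.00,
    "mixtral_8x7b": 1.00,
}

def inference_cost(size_tier: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD: total tokens divided by 1M, times the tier's rate."""
    rate = PRICE_PER_1M_TOKENS[size_tier]
    return (input_tokens + output_tokens) / 1_000_000 * rate

# Example: 2M input tokens + 1M output tokens on a 13B model.
print(inference_cost("up_to_13b", 2_000_000, 1_000_000))  # → 0.75
```

Note that the same rate applies whether the model is a base model or one of your fine-tuned variants.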

Note: We are continuously adding support for inference for pre-trained models. If there are other models you’d like to see hosted, please reach out to support@predibase.com.

Fine-tuning Costs

Predibase offers state-of-the-art fine-tuning at cost-effective prices. Expected costs vary depending on the dataset and the size of the model being fine-tuned.

Calculate Your Fine-tuning Cost

Your estimated cost depends on three inputs: model size (parameters), dataset size (tokens), and number of epochs. As a rule of thumb:

  Estimated cost ≈ (dataset tokens × epochs ÷ 1,000,000) × price per 1M tokens for your model's size tier

Officially Supported Models for Fine-tuning

  • Llama-2-7b, Llama-2-7b-chat
  • Llama-2-13b, Llama-2-13b-chat
  • Llama-2-70b, Llama-2-70b-chat
  • Codellama-13b-instruct
  • Mistral-7b, Mistral-7b-instruct
  • Yarn-mistral-7b-128k
  • Mixtral-8x7B-Instruct-v0.1
  • Zephyr-7b-beta
  • Any OSS Model from Huggingface (best effort)

Fine-Tuning Pricing (per 1M tokens)

Model Size       Price per 1M tokens
Up to 7B         $0.36
7.1B - 21B       $0.65
21.1B - 34B      $2.10
34.1B - 70B      $3.21
Mixtral-8x7B     $3.21
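Putting the tier table together with the rule of thumb above, a fine-tuning cost estimate can be sketched as follows. The tier boundaries mirror the table; the helper function itself is illustrative, not a Predibase API.

```python
# Sketch: estimate fine-tuning cost as (dataset tokens x epochs / 1M) x tier rate.
# Tier boundaries and rates follow the pricing table above.

FINE_TUNING_TIERS = [
    (7e9, 0.36),   # up to 7B parameters
    (21e9, 0.65),  # 7.1B - 21B
    (34e9, 2.10),  # 21.1B - 34B
    (70e9, 3.21),  # 34.1B - 70B (Mixtral-8x7B is also billed at this rate)
]

def fine_tuning_cost(model_params: float, dataset_tokens: int, epochs: int) -> float:
    """Return the estimated USD cost of a fine-tuning job."""
    rate = next(r for limit, r in FINE_TUNING_TIERS if model_params <= limit)
    return dataset_tokens * epochs / 1_000_000 * rate

# Example: fine-tune a 7B model on 10M tokens for 3 epochs.
print(round(fine_tuning_cost(7e9, 10_000_000, 3), 2))  # → 10.8
```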

Dedicated Deployments

Serve individual fine-tuned LLMs or one base model with 100+ fine-tuned adapters

If your use case or organization requires it, you can also spin up a private, dedicated deployment for your fine-tuned model, billed on a $/GPU-hour basis. Like Predibase's shared deployments, dedicated deployments are built on LoRA Exchange (LoRAX), allowing you to serve multiple fine-tuned models in a single dedicated deployment at no additional cost.

Hardware                            Price Per Hour of Serving    Model Size
1 A10G (Developer Tier)             $1.82                        Up to 7B
0.25 A100 (Enterprise Tier only)    $0.74                        Up to 7B
0.5 A100 (Enterprise Tier only)     $1.49                        7.1B - 17B
1 A100 (Enterprise Tier only)       $2.97                        17.1B - 35B
2 A100 (Enterprise Tier only)       $5.94                        35.1B - 70B

For A100 deployments, contact support@predibase.com to discuss our options.
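To budget a dedicated deployment, the hourly rates above can be projected to a monthly figure. This is a rough sketch: it assumes an always-on deployment, and 730 hours/month is an approximation (365 × 24 ÷ 12); the hardware keys are illustrative labels.

```python
# Sketch: rough monthly cost of an always-on dedicated deployment
# at the hourly rates above. 730 hours/month is an approximation.

HOURLY_RATES = {
    "1xA10G": 1.82,
    "0.25xA100": 0.74,
    "0.5xA100": 1.49,
    "1xA100": 2.97,
    "2xA100": 5.94,
}

def monthly_cost(hardware: str, hours: float = 730.0) -> float:
    """Return estimated USD cost for running the deployment for `hours`."""
    return HOURLY_RATES[hardware] * hours

# Example: one A10G running around the clock for a month.
print(round(monthly_cost("1xA10G"), 2))  # → 1328.6
```

Because LoRAX serves many fine-tuned adapters on one base model, this hourly cost does not grow with the number of fine-tuned variants you deploy.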

Predibase Tiers

Developer

  • Predibase AI Cloud (SaaS)
  • Pricing: Pay-as-you-go
  • Fine-tune LLMs
    • Model size: up to 70B parameters
    • Hardware: Access to A100 GPUs
  • Inference
    • Serverless inference: Token-based pricing for base model and fine-tuned model
    • Dedicated deployments: Limited to one 7B parameter base model with multiple fine-tuned adapters.
  • Performance: 1 request per second
  • Concurrent training jobs: Limited to 2
  • Customer support: Chat and email
Get Started with $25 Free Credits

Note: Free credits expire after 30 days.

Enterprise

  • Predibase AI Cloud (SaaS)
  • Pricing: Discounted pay-as-you-go
  • Fine-tune LLMs
    • Model size: up to 70B parameters
    • Hardware: Priority access to A100 GPUs
  • Inference
    • Serverless inference: Token-based pricing for base model and fine-tuned model
    • Dedicated Deployments: Unlimited deployments for all model sizes. Serve multiple fine-tuned variants on a single base model with LoRAX.
  • Performance: 100 requests per second
  • Concurrent training jobs: Can be increased upon request
  • Customer support: Predibase's Customer Success program includes a dedicated support channel, training, and priority support from our ML experts.

Ready to accelerate your model building?