Predibase Cloud Pricing

The pricing listed below is for the Developer Tier of Predibase AI Cloud. After your free trial, you can upgrade by adding a credit card.

If you’re interested in learning about our Enterprise / VPC pricing, please contact us at support@predibase.com.

Serverless Inference Endpoints

Same price for both pre-trained and fine-tuned models

Predibase supports state-of-the-art, efficient inference for both pre-trained and fine-tuned models at the same flat per-token price. This is enabled by LoRA Exchange (LoRAX), Predibase’s technology for serving multiple fine-tuned variants on a single base model, which allows us to offer the most cost-effective fine-tuned model serving on the market.

For comparison, OpenAI charges 8x more for inference on fine-tuned GPT-3.5 models than on the base model. Most other OSS LLM infrastructure companies don’t offer a per-token option at all, forcing you onto an expensive $ / GPU-hour pricing model for fine-tuned models.

  • Llama-3-8b
  • Llama-3-8b-instruct
  • Llama-3-70b
  • Llama-3-70b-instruct
  • Mistral-7b-instruct-v0.1
  • Mistral-7b-instruct-v0.2
  • Mistral-7b
  • Gemma-2b
  • Gemma-2B-Instruct
  • Gemma-7b
  • Gemma-7B-Instruct
  • Code-llama-13b-instruct
  • Code-llama-70b-instruct
  • Llama-2-7b
  • Llama-2-7b-chat
  • Llama-2-13b
  • Llama-2-13b-chat
  • Llama-2-70b
  • Llama-2-70b-chat
  • Mixtral-8x7B-Instruct-v0.1
  • Phi-2
  • Zephyr-7b-beta

Note: We are continuously adding inference support for more pre-trained models. If there are other models you’d like to see hosted, please let us know.

Model Size (Pre-trained and Fine-tuned Models) | Price per 1M tokens (input + output)
Up to 7B | $0.20
Up to 21B | $0.25
Up to 70B | $1.00
Mixtral-8x7B | $1.00
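
As a rough illustration of how the per-token prices above translate into a bill, here is a minimal Python sketch. The function name `serverless_cost`, its parameters, and the tier lookup are hypothetical and not part of any Predibase SDK. The sketch assumes that input and output tokens are summed and billed at the single rate for the model's size tier.

```python
from decimal import Decimal

# Prices copied from the serverless pricing listed above.
# (max parameters in billions, USD per 1M tokens), smallest tier first.
SERVERLESS_TIERS = [
    (Decimal("7"), Decimal("0.20")),
    (Decimal("21"), Decimal("0.25")),
    (Decimal("70"), Decimal("1.00")),
]
MIXTRAL_SERVERLESS_PRICE = Decimal("1.00")  # Mixtral-8x7B has its own row


def serverless_cost(params_b, input_tokens, output_tokens, is_mixtral=False):
    """Estimate the USD cost of a workload (hypothetical helper).

    Assumption: input and output tokens are added together and billed
    at one flat rate, as the "(input + output)" column header suggests.
    """
    total_tokens = Decimal(input_tokens + output_tokens)
    if is_mixtral:
        price = MIXTRAL_SERVERLESS_PRICE
    else:
        for max_b, tier_price in SERVERLESS_TIERS:
            if Decimal(str(params_b)) <= max_b:
                price = tier_price
                break
        else:
            raise ValueError("no serverless price is listed above 70B")
    return total_tokens / Decimal(1_000_000) * price
```

For example, `serverless_cost(7, 3_000_000, 1_000_000)` prices 4M total tokens on a 7B model at the $0.20 rate. Because pre-trained and fine-tuned models share the same price, the function takes no fine-tuned flag.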

Fine-tuning Costs

Predibase offers state-of-the-art fine-tuning at cost-effective prices. Expected costs vary depending on the dataset and the size of the model being fine-tuned.

Calculate Your Fine-tuning Cost

The estimated cost is determined by three inputs: model size (parameters), dataset size (tokens), and epochs (number of passes over the dataset).

Officially Supported Models for Fine-tuning

  • Llama-3-8b, Llama-3-8b-instruct
  • Llama-3-70b, Llama-3-70b-instruct
  • Llama-2-7b, Llama-2-7b-chat
  • Llama-2-13b, Llama-2-13b-chat
  • Llama-2-70b, Llama-2-70b-chat
  • Mistral-7b, Mistral-7b-instruct-v0.1 and v0.2
  • Mixtral-8x7B-Instruct-v0.1
  • Codellama-13b-instruct, Codellama-70b-instruct
  • Zephyr-7b-beta
  • Gemma-7b, Gemma-7b-instruct
  • Gemma-2b, Gemma-2b-instruct
  • Any OSS Model from Huggingface (best effort)

Fine-Tuning Pricing (per 1M tokens)

Model Size | Price per 1M tokens
Up to 7B | $0.36
7.1B - 21B | $0.65
21.1B - 34B | $2.10
34.1B - 70B | $3.21
Mixtral-8x7B | $3.21
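
The page does not state the exact formula behind the fine-tuning estimate, so the following Python sketch rests on an assumption: billed tokens equal dataset tokens multiplied by epochs, charged at the per-1M-token rate for the model's size tier. The name `fine_tuning_cost` and its parameters are hypothetical.

```python
from decimal import Decimal

# Prices copied from the fine-tuning pricing listed above.
# (max parameters in billions, USD per 1M tokens), smallest tier first.
FINE_TUNING_TIERS = [
    (Decimal("7"), Decimal("0.36")),
    (Decimal("21"), Decimal("0.65")),
    (Decimal("34"), Decimal("2.10")),
    (Decimal("70"), Decimal("3.21")),
]
MIXTRAL_FINE_TUNING_PRICE = Decimal("3.21")  # Mixtral-8x7B has its own row


def fine_tuning_cost(params_b, dataset_tokens, epochs, is_mixtral=False):
    """Estimate the USD cost of one fine-tuning job (hypothetical helper).

    Assumption (not confirmed by the source): every epoch re-processes
    the full dataset, so billed tokens = dataset_tokens * epochs.
    """
    if is_mixtral:
        price = MIXTRAL_FINE_TUNING_PRICE
    else:
        for max_b, tier_price in FINE_TUNING_TIERS:
            if Decimal(str(params_b)) <= max_b:
                price = tier_price
                break
        else:
            raise ValueError("no fine-tuning price is listed above 70B")
    billed_tokens = Decimal(dataset_tokens) * Decimal(epochs)
    return billed_tokens / Decimal(1_000_000) * price
```

Under that assumption, `fine_tuning_cost(7, 10_000_000, 3)` bills 30M tokens at the $0.36 rate. Treat the result as an estimate only; actual billing may differ.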

Dedicated Deployments

Serve individual fine-tuned LLMs or one base model with 100+ fine-tuned adapters

We offer usage-based pricing that's billed by the minute. Enjoy exceptional inference performance, seamless autoscaling, and the ability to serve unlimited fine-tuned adapters on a single deployment to maximize your usage of GPUs.

Hardware | Price Per Hour of Serving - Developer Tier | Price Per Hour of Serving - Enterprise Tier | Model Size
1 A10G | $1.82 | Discounts available, contact us | Up to 7B
1 L40S | Contact us | Discounts available, contact us | Up to 13B
1 A100 | $2.97 | Discounts available, contact us | 17.1B - 35B
2 A100 | Not available | Discounts available, contact us | 35.1B - 70B

For L40S and A100 deployments, contact support@predibase.com to discuss our options.
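
Since dedicated deployments are priced per hour but billed by the minute, a short Python sketch can show what a partial hour might cost. It assumes each minute is charged at one sixtieth of the listed Developer Tier hourly rate and that the total is rounded to the nearest cent. Neither detail is confirmed by the page, and `dedicated_cost` is a hypothetical helper.

```python
from decimal import Decimal, ROUND_HALF_UP

# Developer Tier hourly rates listed above. Hardware without a
# published price (1 L40S, 2 A100) is intentionally left out.
HOURLY_RATES = {
    "1 A10G": Decimal("1.82"),
    "1 A100": Decimal("2.97"),
}


def dedicated_cost(hardware, minutes_running):
    """Estimate the USD cost of a dedicated deployment (hypothetical helper).

    Assumption: per-minute billing is a linear proration of the hourly
    rate, rounded to the nearest cent.
    """
    if hardware not in HOURLY_RATES:
        raise ValueError(f"no published Developer Tier price for {hardware!r}")
    cost = HOURLY_RATES[hardware] * Decimal(minutes_running) / Decimal(60)
    return cost.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
```

For example, `dedicated_cost("1 A10G", 90)` covers a deployment that runs for an hour and a half. Because the rate depends only on hardware and uptime, serving additional adapters on the same deployment does not change the estimate.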

Predibase Tiers

Developer

  • Predibase AI Cloud (SaaS)
  • Pricing: Pay-as-you-go
  • Fine-tune LLMs
    • Up to 70B parameters
    • Access to A100 GPUs
  • Inference
    • Serverless inference: Token-based pricing for base model and fine-tuned model
    • Dedicated deployments: Limited to one 8B parameter base model with multiple fine-tuned adapters.
  • Performance: 100 requests per second
  • Concurrent training jobs: Limited to 2
  • Customer support: Discord, chat and email
Get Started with $25 Free Credits

Note: Free credits expire after 30 days.

Enterprise

  • Predibase AI Cloud (SaaS)
  • Pricing: Discounted pay-as-you-go
  • Fine-tune LLMs
    • Up to 70B parameters
    • Priority access to A100 GPUs
  • Inference
    • Serverless inference: Token-based pricing for base model and fine-tuned model
    • Dedicated Deployments: Unlimited deployments for all model sizes. Serve multiple fine-tuned variants on a single base model with LoRAX.
  • Performance: 100 requests per second
  • Concurrent training jobs: Can be increased upon request
  • Customer support: Predibase's Customer Success program includes a dedicated support channel, training, and priority support from our ML experts.

Virtual Private Cloud (VPC)

  • VPC on AWS, GCP, or Azure
  • Pricing: Platform fee and compute cost
  • Fine-tune LLMs
    • Up to 70B parameters
    • Use GPUs in own cloud
  • Inference
    • Dedicated Deployments: Unlimited deployments for all model sizes. Serve multiple fine-tuned variants on a single base model with LoRAX.
  • Performance: No rate limiting
  • Concurrent training jobs: Unlimited
  • Customer support: Predibase's Customer Success program includes a dedicated support channel, training, and priority support from our ML experts.
