Predibase Cloud Pricing

Pricing listed below is for the self-serve tier of Predibase AI Cloud. This pricing is in early access for select customers and will be going GA later this year.

If you’re interested in this model or in learning about our Enterprise / VPC pricing, please contact us at support@predibase.com.

Predibase Inference Endpoints

Predibase supports state-of-the-art, efficient inference for both pre-trained and fine-tuned models at the same flat per-token price. This is enabled by LoRA Exchange (LoRAX), Predibase’s unique serving technology, which makes our fine-tuned model serving the most cost-effective on the market.

For comparison, OpenAI charges 8x more for inference on a fine-tuned GPT-3.5 model than on the base model. Most other open-source LLM infrastructure providers don’t offer per-token pricing for fine-tuned models at all, forcing you into an expensive $ / GPU-hour pricing model.

Base models supported include:

Code-llama-13b-instruct (new!)
Code-llama-34b (coming soon)
Llama-2-7b
Llama-2-7b-chat
Llama-2-13b-chat
Llama-2-70b-chat
Mistral-7b-instruct (new!)
Zephyr-7b-beta (new!)
Fine-tuned Model Size   Price per 1k tokens (input + output)   Together         HuggingFace
Up to 7B                $0.0002                                $ / GPU-hour     $ / GPU-hour
Up to 13B               $0.00025                               $ / GPU-hour     $ / GPU-hour
Up to 70B               $0.001                                 $ / GPU-hour     $ / GPU-hour
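As a rough illustration of the flat per-token pricing above, the sketch below estimates a bill from token counts. The prices are copied from the table; the helper function itself is hypothetical, not part of any Predibase API:

```python
# Estimate shared-endpoint inference cost from the pricing table above.
# Input and output tokens are billed at the same flat per-1k-token rate,
# identical for base and fine-tuned models.

PRICE_PER_1K_TOKENS = {
    "7B": 0.0002,    # fine-tuned model size up to 7B parameters
    "13B": 0.00025,  # up to 13B parameters
    "70B": 0.001,    # up to 70B parameters
}

def inference_cost(model_size: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD: input + output tokens, billed per 1k tokens."""
    total_tokens = input_tokens + output_tokens
    return total_tokens / 1000 * PRICE_PER_1K_TOKENS[model_size]

# Example: 1M input + 1M output tokens on a fine-tuned 13B model
print(inference_cost("13B", 1_000_000, 1_000_000))  # 0.5 -> $0.50
```

Note that, unlike $/GPU-hour pricing, the estimate depends only on tokens processed, not on how long a GPU sits allocated.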

Note: We are continuously adding support for inference for pre-trained models. If there are other models you’d like to see hosted, please reach out to support@predibase.com.

Dedicated Deployments

If your use case or organization requires it, you can also spin up a private, dedicated deployment for your fine-tuned model, billed on a $ / GPU-hour basis. Like Predibase's shared deployments, dedicated deployments are built on LoRA Exchange (LoRAX), allowing you to serve multiple fine-tuned models in a single dedicated deployment at no additional cost.

Predibase Training Costs

Predibase offers state-of-the-art fine-tuning and charges for training based on the underlying compute costs. Expected costs vary with the dataset, the model being fine-tuned, the compute resources allocated, and the overall fine-tuning time. Training jobs are billed on a $ / GPU-hour basis.
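Since training is billed per GPU-hour, a back-of-the-envelope estimate is just rate × GPU count × hours. The rate and resources below are placeholders for illustration, not published Predibase prices:

```python
def training_cost(gpu_hourly_rate: float, num_gpus: int, hours: float) -> float:
    """Total training bill in USD under $/GPU-hour pricing."""
    return gpu_hourly_rate * num_gpus * hours

# Hypothetical example: 4 GPUs at a placeholder $2.50/GPU-hour for 3 hours
print(training_cost(2.50, 4, 3.0))  # 30.0
```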

Ready to accelerate your model building?