Predibase Cloud Pricing
Pricing listed below is for the self-serve tier of Predibase AI Cloud. This pricing is in early access for select customers and will be going GA later this year.
If you’re interested in this pricing model or in learning about our Enterprise / VPC pricing, please contact us at
Predibase Inference Endpoints
Predibase supports state-of-the-art, efficient inference for both pre-trained and fine-tuned models at the same flat per-token price. This is enabled by LoRA Exchange (LoRAX), Predibase’s serving technology that makes our fine-tuned model serving the most cost-effective on the market.
For comparison, OpenAI charges 8x more for inference on fine-tuned GPT-3.5 than on the base model. Most other OSS LLM infrastructure providers don’t offer per-token pricing for fine-tuned models at all, forcing you into an expensive $ / GPU-hour pricing model.
Supported model sizes and prices include:
| Fine-tuned Model Size | Price per 1k tokens (input + output) | Together | HuggingFace |
| --- | --- | --- | --- |
| Up to 7B | $0.0002 | $ / GPU-hour | $ / GPU-hour |
| Up to 13B | $0.00025 | $ / GPU-hour | $ / GPU-hour |
| Up to 70B | $0.001 | $ / GPU-hour | $ / GPU-hour |
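To see how per-token billing adds up, here is a minimal sketch that estimates the cost of a workload at the shared-endpoint prices in the table above (the request volume and token counts in the example are illustrative, not a quote):

```python
# Per-1k-token prices from the table above (input + output combined).
PRICE_PER_1K_TOKENS = {
    "up_to_7b": 0.0002,
    "up_to_13b": 0.00025,
    "up_to_70b": 0.001,
}

def inference_cost(input_tokens: int, output_tokens: int, size: str) -> float:
    """Dollar cost of one request on a shared endpoint of the given size tier."""
    total_tokens = input_tokens + output_tokens
    return total_tokens / 1000 * PRICE_PER_1K_TOKENS[size]

# Example: 1M requests of 500 input + 200 output tokens on a 7B-class model.
cost = 1_000_000 * inference_cost(500, 200, "up_to_7b")
print(f"${cost:,.2f}")  # $140.00
```

Because input and output tokens are priced identically, only the total token count per request matters.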
Note: We are continuously adding inference support for additional pre-trained models. If there are other models you’d like to see hosted, please reach out to email@example.com.
If your use case or organization requires it, you also have the option to spin up a private, dedicated deployment for your fine-tuned model, billed on a $ / GPU-hour basis. Dedicated deployments, like Predibase's shared deployments, are built on top of LoRA Exchange (LoRAX), allowing you to serve multiple fine-tuned models in your dedicated deployment at no additional cost.
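One way to decide between shared per-token billing and a dedicated deployment is a break-even throughput estimate. The sketch below assumes a hypothetical $2.00 / GPU-hour dedicated rate (not a published Predibase price) against the 7B-class shared price from the table:

```python
# Break-even sketch: shared per-token billing vs. dedicated $/GPU-hour billing.
# The GPU-hour rate below is a placeholder assumption, NOT a published rate.
GPU_HOUR_RATE = 2.00          # hypothetical dedicated rate, $ / GPU-hour
SHARED_PRICE_PER_1K = 0.0002  # 7B-class shared price from the table above

def breakeven_tokens_per_hour(gpu_hour_rate: float, price_per_1k: float) -> float:
    """Tokens/hour above which a dedicated GPU costs less than per-token billing."""
    return gpu_hour_rate / price_per_1k * 1000

print(f"{breakeven_tokens_per_hour(GPU_HOUR_RATE, SHARED_PRICE_PER_1K):,.0f} tokens/hour")
```

Below that sustained throughput, per-token billing on a shared endpoint is cheaper; above it, a dedicated deployment wins, and LoRAX lets that one deployment serve many fine-tuned adapters.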
Predibase Training Costs
Predibase offers state-of-the-art fine-tuning and charges for training based on the underlying costs. Expected costs vary depending on the dataset, model being fine-tuned, the compute resources allocated, and the overall time for fine-tuning. You will be billed on a $ / GPU-hour basis for training jobs.
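Since training is billed per GPU-hour, the total cost is simply GPUs reserved times wall-clock hours times the hourly rate. A minimal sketch (the rate and job duration here are hypothetical examples, not published prices):

```python
# $/GPU-hour training billing: GPUs reserved * wall-clock hours * hourly rate.
# The rate and duration below are hypothetical, not published Predibase prices.
def training_cost(gpus: int, hours: float, rate_per_gpu_hour: float) -> float:
    """Total dollar cost of a training job billed per GPU-hour."""
    return gpus * hours * rate_per_gpu_hour

# e.g. a fine-tuning job on 1 GPU for 3.5 hours at a hypothetical $2.50/GPU-hour
print(f"${training_cost(1, 3.5, 2.50):.2f}")  # $8.75
```

Dataset size, base model, and allocated compute all affect the wall-clock hours, which is why expected training costs vary per job.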