Big Model Performance
Small Model Cost

Highest quality, fastest throughput small language models in your cloud

FROM THE CREATORS OF Ludwig & LoRAX

Built by AI leaders from Uber, Google, Apple and Amazon. Developed and deployed with the world’s leading organizations.

GPT-4 quality for less than GPT-3.5 price

Cost should never hold you back. Future-proof your AI spend today by fine-tuning small, task-specific models that rival GPT-4 at a fraction of the cost.

Bigger Isn’t Always Better

Fine-tune smaller task-specific LLMs that outperform bloated alternatives from commercial vendors. Don’t pay for what you don’t need.

Model Accuracy for JSON Generation

One Platform to Customize Small Models for Your Use Case

First-class fine-tuning experience

Predibase offers state-of-the-art fine-tuning techniques like quantization, low-rank adaptation, and memory-efficient distributed training so you can customize small models fast and efficiently with the best possible results. All of this can be done in a few lines of code or through our easy-to-use UI.
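To build intuition for why low-rank adaptation makes fine-tuning so cheap, here is a minimal NumPy sketch of the LoRA idea (illustrative only, not Predibase internals): the large base weight matrix stays frozen, and only a tiny low-rank update is trained. The dimensions and scaling factor below are typical example values, not settings from any specific model.

```python
import numpy as np

d, r = 4096, 16  # hidden size of a typical LLM layer, LoRA rank (illustrative values)

W = np.zeros((d, d))               # frozen base weight (stands in for a pretrained matrix)
A = np.random.randn(r, d) * 0.01   # trainable low-rank factor
B = np.zeros((d, r))               # trainable low-rank factor (zero-init, so W is unchanged at start)
alpha = 32                         # LoRA scaling hyperparameter

# Effective weight at inference time: W + (alpha / r) * B @ A
W_eff = W + (alpha / r) * (B @ A)

full_params = W.size           # parameters updated by full fine-tuning
lora_params = A.size + B.size  # parameters actually trained with LoRA
print(f"full: {full_params:,}  lora: {lora_params:,}  ratio: {full_params / lora_params:.0f}x")
```

With these example dimensions, LoRA trains roughly two orders of magnitude fewer parameters than full fine-tuning for this one matrix, which is why it fits on readily available GPUs.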

The most cost-effective serving infra

Our unique serving infra—powered by Turbo LoRA and LoRAX—enables you to cost-effectively serve many fine-tuned adapters on a single private serverless GPU at speeds 2-3x faster than alternatives. We also provide free shared serverless inference up to 1M tokens per day / 10M tokens per month for prototyping.
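The reason many fine-tuned adapters can share a single GPU is that each adapter is only a small low-rank delta on one shared base model. The toy sketch below illustrates that idea in plain NumPy (a simplification, not the actual LoRAX implementation); the adapter names are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 64, 4  # toy hidden size and adapter rank

base_W = rng.standard_normal((d, d))  # one shared base model weight, loaded once

# Many fine-tuned adapters: each is just a tiny (B, A) pair, cheap to keep resident
adapters = {
    "support-classifier/1": (rng.standard_normal((d, r)), rng.standard_normal((r, d))),
    "json-extractor/2": (rng.standard_normal((d, r)), rng.standard_normal((r, d))),
}

def forward(x, adapter_id=None):
    """Apply the shared base weight, plus the requested adapter's low-rank update."""
    y = x @ base_W.T
    if adapter_id is not None:
        B, A = adapters[adapter_id]
        y = y + x @ (B @ A).T
    return y

x = rng.standard_normal(d)
out_base = forward(x)                            # plain base model
out_support = forward(x, "support-classifier/1")  # same request, different adapter
```

Because the per-adapter state is tiny relative to the base weights, swapping which adapter a request uses is a cheap per-request operation rather than a separate deployment.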

Your models in your cloud

Start owning and stop renting your LLMs. Customize open-source models on your data securely in your virtual private cloud with our SOC-2 compliant platform. Enterprise and VPC customers can download and export their trained models at any time, ensuring you always retain control of your IP.

The Fastest, Most Efficient Way to Customize and Deploy Small Models

Try any open-source LLM in an instant

Stop spending hours wrestling with complex model deployments before you’ve even started fine-tuning. Deploy any open-source LLM—like Llama-3, Phi-3, and Mistral—and start prompting instantly to determine the best base model for your use case. Scalable managed infra, available in the Predibase cloud or your VPC, lets you experiment in minutes with only a few lines of code or via our user-friendly UI.

# Deploy an LLM from Hugging Face
from predibase import Predibase, DeploymentConfig

pb = Predibase(api_token="<YOUR_API_TOKEN>")

pb.deployments.create(
    name="my-llama-3-70b-deployment",
    description="Deployment of Llama 3 70b in Predibase Cloud",
    config=DeploymentConfig(
        base_model="llama-3-70b",
    ),
)

# Prompt the deployed LLM
client = pb.deployments.client("my-llama-3-70b-deployment")
client.generate("Write an algorithm in Java to reverse the words in a string.")

Easily customize models for your use case

No more out-of-memory errors or costly training jobs. Fine-tune any open-source LLM on the most readily available GPUs using Predibase’s optimized training system. We automatically apply dozens of optimizations, such as quantization, low-rank adaptation, and memory-efficient distributed training, combined with right-sized compute to ensure your jobs are successfully trained as efficiently as possible.

# Kick off the fine-tuning job
adapter = pb.adapters.create(
    config=FinetuningConfig(
        base_model="llama-3-70b",
        epochs=3,
        learning_rate=0.0002,
    ),
    dataset=my_dataset,
    repo="my_adapter",
    description="Fine-tune llama-3-70b with my dataset for my task.",
)

Dynamically serve many fine-tuned LLMs in one deployment

Our scalable serving infra automatically scales up and down to meet the demands of your production environment. Dynamically serve many fine-tuned LLMs together for over 100x cost reduction with our novel LoRA Exchange (LoRAX) architecture. Load and query an adapter in milliseconds.

Read more about LoRAX.

# Prompt your fine-tuned adapter instantly
client.generate(
    "Write an algorithm in Java to reverse the words in a string.",
    adapter_id="my_adapter/3",
)

By switching from OpenAI to Predibase, we’ve been able to fine-tune and serve many specialized open-source models in real time, saving us over $1 million annually, while creating engaging experiences for our audiences. Best of all, we own the models.

Andres Restrepo, Founder and CEO, Enric.ai

Built on Proven Open-Source Technology

LoRAX

LoRAX (LoRA eXchange) enables users to serve thousands of fine-tuned LLMs on a single GPU, dramatically reducing the cost of serving without compromising on throughput or latency.

Ludwig

Ludwig is a declarative framework to develop, train, fine-tune, and deploy state-of-the-art deep learning and large language models. Ludwig puts AI in the hands of all engineers without requiring low-level code.
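To show what "declarative" means in practice, here is a minimal Ludwig-style LLM fine-tuning config sketched as a Python dict (equivalently written in YAML). The field names follow Ludwig's documented LLM config schema, but treat the specific values as illustrative assumptions rather than a recommended setup.

```python
# A declarative Ludwig config: describe what to train, not how to train it.
config = {
    "model_type": "llm",
    "base_model": "meta-llama/Llama-2-7b-hf",
    "adapter": {"type": "lora"},
    "input_features": [{"name": "prompt", "type": "text"}],
    "output_features": [{"name": "response", "type": "text"}],
    "trainer": {"type": "finetune", "epochs": 3, "learning_rate": 0.0002},
}

# With Ludwig installed, training from this config is then roughly:
# from ludwig.api import LudwigModel
# model = LudwigModel(config)
# model.train(dataset="my_dataset.csv")
```

The entire fine-tuning job is specified by the config; no low-level training loop code is needed.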

Use Cases

Predibase lets you fine-tune any open-source LLM for your task-specific use case.

Classification

Automate the labor-intensive process of manually categorizing documents, content, messages, and more.

Information Extraction

Extract structured information from unstructured text for downstream tasks.

Customer Sentiment

Use an LLM to understand how your customers feel about your products or services.

Customer Support

Automatically classify support issues, generate a customer response, and save your organization time and money.

Code Generation

Automate code generation with an LLM to significantly reduce the burden of tasks like code completion or docstring generation.

Named Entity Recognition

Identify predefined categories of objects in a body of text for inline term definitions or enhancing question and answering systems.

Many More

Predibase can support your LLM use case, no matter how complex. Contact us to learn more about how we can help you with AI today.

Ready to efficiently fine-tune and serve your own LLM?