Hundreds of Open-Source LLMs Ready to Deploy

The largest selection of models for fine-tuning and serving

Predibase offers the largest selection of open-source models for fine-tuning and serving, and is the exclusive fine-tuning partner for Upstage's Solar LLM. Below is a subset of the most popular models we support, along with guidance on bringing your own models. For the full list of officially supported models, visit the Predibase Docs.
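As a rough illustration of how a serverless deployment might be queried, the sketch below assembles an OpenAI-style chat-completions request. The endpoint path, deployment name, and token handling here are hypothetical placeholders for illustration only, not the documented Predibase API; see the Predibase Docs for the real client and endpoints.

```python
import json
import urllib.request


def build_chat_payload(model, prompt, max_tokens=256, temperature=0.7):
    """Assemble an OpenAI-style chat-completions payload for a deployment."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "temperature": temperature,
    }


def query_deployment(base_url, api_token, payload):
    """POST the payload to a (hypothetical) chat-completions endpoint
    and return the assistant's reply text."""
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_token}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


# Example payload for an instruction-tuned deployment; the model name
# below is an illustrative slug, not an official deployment identifier.
payload = build_chat_payload(
    "llama-3-1-8b-instruct", "Summarize this ticket in one sentence."
)
```

The same payload shape works for any of the chat-oriented models in the table below; only the model name changes.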

Name / Description / Context Window
DeepSeek R1 Distill Qwen 32B Serverless

DeepSeek-R1-Distill-Qwen-32B is a distilled reasoning model based on Qwen2.5-32B, optimized for fast, cost-effective inference while maintaining strong logical and problem-solving capabilities for AI workloads.

8,192
DeepSeek R1 671B Serverless

DeepSeek-R1 is an open-source reasoning model designed to perform complex tasks such as logical inference, mathematical problem-solving, and real-time decision-making, achieving performance comparable to OpenAI's o1 model.

128,000
DeepSeek V3 671B Serverless

DeepSeek V3 is an open-source Mixture-of-Experts language model with 671 billion parameters, activating 37 billion per token, designed for tasks such as reasoning and coding. It employs Multi-Head Latent Attention and multi-token prediction training, achieving performance comparable to leading closed-source models.

128,000
Gemma 2 9B Serverless

Gemma 2 is Google's latest iteration of open LLMs, built on research and technology from Google DeepMind's Gemini models. This is the base 9B model.

8,191
Gemma 2 9B Instruct Serverless

Gemma 2 is Google's latest iteration of open LLMs, built on research and technology from Google DeepMind's Gemini models. This is the instruction-tuned 9B model.

8,191
Gemma 2 27B Serverless

Gemma 2 is Google's latest iteration of open LLMs, built on research and technology from Google DeepMind's Gemini models. This is the base 27B model.

8,191
Gemma 2 27B Instruct Serverless

Gemma 2 is Google's latest iteration of open LLMs, built on research and technology from Google DeepMind's Gemini models. This is the instruction-tuned 27B model.

8,191
Gemma 2B Serverless

This is the 2B base version of the Gemma model.

8,192
Gemma 2B Instruct Serverless

This is the 2B instruction-tuned version of the Gemma model.

8,192
Gemma 7B Serverless

This is the 7B base version of the Gemma model.

8,192
Gemma 7B Instruct Serverless

This is the 7B instruction-tuned version of the Gemma model.

8,192
Llama 3.3 70B Serverless

The Meta Llama 3.3 multilingual large language model comes in 70B (text in/text out) parameters and outperforms many of the available open source and closed chat models on common industry benchmarks.

8,192
Llama 3.2 1B Serverless

Llama 3.2 is a collection of multilingual generative language models, available in 1B and 3B sizes. This is the base 1B model.

32,768
Llama 3.2 1B Instruct Serverless

Llama 3.2 is a collection of multilingual generative language models, available in 1B and 3B sizes. The instruction-tuned models are optimized for multilingual dialogue use cases, including agentic retrieval and summarization.

32,768
Llama 3.2 3B Serverless

Llama 3.2 is a collection of multilingual generative language models, available in 1B and 3B sizes. This is the base 3B model.

32,768
Llama 3.2 3B Instruct Serverless

Llama 3.2 is a collection of multilingual generative language models, available in 1B and 3B sizes. The instruction-tuned models are optimized for multilingual dialogue use cases, including agentic retrieval and summarization.

32,768
Llama 3.2 11B Vision Serverless

Llama 3.2-Vision is a family of multimodal LLMs (11B and 90B) optimized for image reasoning, captioning, and visual Q&A. These models outperform many open and closed-source alternatives on standard benchmarks. This is the base 11B model.

128,000
Llama 3.2 11B Vision Instruct Serverless

Llama 3.2-Vision is a family of multimodal LLMs (11B and 90B) optimized for image reasoning, captioning, and visual Q&A. These models outperform many open and closed-source alternatives on standard benchmarks. This is the instruction-tuned 11B model.

128,000
Llama 3.1 8B Serverless

Llama 3.1 is a collection of multilingual large language models available in 8B, 70B, and 405B sizes. This is the base 8B model.

63,999
Llama 3.1 8B Instruct Serverless

Llama 3.1 instruction-tuned, text-only models (8B, 70B, 405B) are optimized for multilingual dialogue use cases and outperform many available open-source and closed chat models on common industry benchmarks. This is the instruction-tuned 8B model.

63,999
Llama 3 8B Serverless

Meta developed and released the Meta Llama 3 family of large language models (LLMs), a collection of pretrained and instruction-tuned generative text models in 8B and 70B sizes. This is the base 8B parameter model.

8,192
Llama 3 8B Instruct Serverless

Meta developed and released the Meta Llama 3 family of large language models (LLMs), a collection of pretrained and instruction-tuned generative text models in 8B and 70B sizes. This is the instruction-tuned 8B parameter model.

8,192
Llama 3 70B Serverless

Meta developed and released the Meta Llama 3 family of large language models (LLMs), a collection of pretrained and instruction-tuned generative text models in 8B and 70B sizes. This is the base 70B parameter model.

8,192
Llama 3 70B Instruct Serverless

Meta developed and released the Meta Llama 3 family of large language models (LLMs), a collection of pretrained and instruction-tuned generative text models in 8B and 70B sizes. This is the instruction-tuned 70B parameter model.

8,192
Llama 2 7B Serverless

Llama 2 is a collection of pretrained and fine-tuned generative text models ranging in scale from 7 billion to 70 billion parameters. This is the base 7B model.

4,096
Llama 2 7B Chat Serverless

Llama 2 is a collection of pretrained and fine-tuned generative text models ranging in scale from 7 billion to 70 billion parameters. This is the instruction tuned 7B model.

4,096
Llama 2 13B Serverless

Llama 2 is a collection of pretrained and fine-tuned generative text models ranging in scale from 7 billion to 70 billion parameters. This is the base 13B model.

4,096
Llama 2 13B Chat Serverless

Llama 2 is a collection of pretrained and fine-tuned generative text models ranging in scale from 7 billion to 70 billion parameters. This is the instruction-tuned 13B model.

4,096
Llama 2 70B Serverless

Llama 2 is a collection of pretrained and fine-tuned generative text models ranging in scale from 7 billion to 70 billion parameters. This is the base 70B model.

4,096
Llama 2 70B Chat Serverless

Llama 2 is a collection of pretrained and fine-tuned generative text models ranging in scale from 7 billion to 70 billion parameters. This is the instruction-tuned 70B model.

4,096
CodeLlama 7B Serverless

Code Llama is a collection of pretrained and fine-tuned generative text models ranging in scale from 7 billion to 70 billion parameters. This is the 7B base version.

16,000
CodeLlama 7B Instruct Serverless

Code Llama is a collection of pretrained and fine-tuned generative text models ranging in scale from 7 billion to 70 billion parameters. This is the 7B instruction-tuned version.

16,000
CodeLlama 13B Instruct Serverless

Code Llama is a collection of pretrained and fine-tuned generative text models ranging in scale from 7 billion to 70 billion parameters. This is the instruction-tuned 13B model.

4,096
CodeLlama 70B Instruct Serverless

Code Llama is a collection of pretrained and fine-tuned generative text models ranging in scale from 7 billion to 70 billion parameters. This is the instruction-tuned 70B model.

4,096
Mistral 7B Serverless

The Mistral-7B-v0.1 Large Language Model (LLM) is a pretrained generative text model with 7 billion parameters.

32,768
Mistral 7B Instruct Serverless

The Mistral-7B-Instruct-v0.1 Large Language Model (LLM) is an instruction fine-tuned version of the Mistral-7B-v0.1 generative text model, trained on a variety of publicly available conversation datasets.

32,768
Mistral 7B Instruct v0.2 Serverless

The Mistral-7B-Instruct-v0.2 Large Language Model (LLM) is an instruct fine-tuned version of the Mistral-7B-v0.2.

32,768
Mistral 7B Instruct v0.3 Serverless

The Mistral-7B-Instruct-v0.3 Large Language Model (LLM) is an instruct fine-tuned version of the Mistral-7B-v0.3.

32,768
Mixtral 8x7B v0.1 Serverless

The Mixtral-8x7B Large Language Model (LLM) is a pretrained generative Sparse Mixture of Experts.

32,768
Mixtral 8x7B Instruct v0.1 Serverless

The Mixtral-8x7B-Instruct-v0.1 Large Language Model (LLM) is an instruction fine-tuned version of the Mixtral-8x7B-v0.1 sparse Mixture-of-Experts model.

32,768
Phi 3.5 Mini Instruct Serverless

Phi-3.5-mini is a lightweight, state-of-the-art open model built upon the datasets used for Phi-3 (synthetic data and filtered publicly available websites), with a focus on very high-quality, reasoning-dense data.

128,000
Phi 3 4k Instruct Serverless

The Phi-3-Mini-4K-Instruct is a 3.8B-parameter, lightweight, state-of-the-art open model trained with the Phi-3 datasets. This is the 4K context window instruct version of the model.

4,000
Phi 2 Serverless

Phi-2 is a Transformer with 2.7 billion parameters. It was trained using the same data sources as Phi-1.5, augmented with a new data source that consists of various NLP synthetic texts and filtered websites (for safety and educational value).

2,048
Qwen2 1.5B Instruct Serverless

Qwen2 is a series of decoder-only language models in a range of sizes, developed by the Qwen team at Alibaba Cloud.

65,536
Qwen2 7B Serverless

Qwen2 is a series of decoder-only language models in a range of sizes, developed by the Qwen team at Alibaba Cloud.

131,072
Qwen2 7B Instruct Serverless

Qwen2 is a series of decoder-only language models in a range of sizes, developed by the Qwen team at Alibaba Cloud.

131,072
Qwen2 72B Serverless

Qwen2 is a series of decoder-only language models in a range of sizes, developed by the Qwen team at Alibaba Cloud.

131,072
Qwen2 72B Instruct Serverless

Qwen2 is a series of decoder-only language models in a range of sizes, developed by the Qwen team at Alibaba Cloud.

131,072
Solar 1 Mini Chat Serverless

Upstage's Solar model is a family of large language models (LLMs) known for delivering powerful performance compared to similarly sized models under 13B parameters. The Solar Chat model is optimized for multi-turn chat and handles extended conversations effectively, making it a compelling choice for interactive applications.

32,768
Solar Pro Preview Instruct 22B Serverless

Solar Pro Preview is an advanced LLM with 22 billion parameters designed to fit on a single GPU. It shows superior performance compared to LLMs with fewer than 30 billion parameters and delivers performance comparable to models over three times its size, such as Llama 3.1 with 70 billion parameters.

4,096
Zephyr 7B Beta Serverless

Zephyr-7B-β is the second model in the Zephyr series: a fine-tuned version of mistralai/Mistral-7B-v0.1 trained on a mix of publicly available synthetic datasets using Direct Preference Optimization (DPO).

32,768
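The context windows listed above act as hard input budgets when choosing a model. The sketch below filters a few of the catalog entries by whether their window can hold a given prompt. It is a rough illustration only: the model slugs are illustrative, and the token count uses a crude 4-characters-per-token heuristic rather than a real tokenizer.

```python
# Context windows (in tokens) for a few of the models listed above.
# The slugs here are illustrative, not official deployment names.
CONTEXT_WINDOWS = {
    "llama-3-3-70b": 8192,
    "llama-3-2-3b-instruct": 32768,
    "llama-3-2-11b-vision-instruct": 128000,
    "mistral-7b-instruct-v0-3": 32768,
    "qwen2-72b-instruct": 131072,
}


def estimate_tokens(text: str) -> int:
    """Crude token estimate: roughly 4 characters per token for English text."""
    return max(1, len(text) // 4)


def models_that_fit(prompt: str, reserved_output_tokens: int = 512) -> list:
    """Return model names whose context window can hold the prompt
    plus headroom reserved for the generated output."""
    needed = estimate_tokens(prompt) + reserved_output_tokens
    return sorted(
        name for name, window in CONTEXT_WINDOWS.items() if window >= needed
    )


# A ~40,000-character document needs roughly 10,000 tokens plus output
# headroom, which rules out the 8,192-token window.
candidates = models_that_fit("x" * 40_000)
```

For production use, replace the heuristic with the model's actual tokenizer, since token counts vary by vocabulary.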

Bring Your Own Model

Predibase is the only platform that lets you take nearly any standard base or fine-tuned LLM from Hugging Face and serve it as a private serverless deployment on state-of-the-art managed infrastructure.

Ready to efficiently fine-tune and serve your own LLM?