How to Deploy DeepSeek Models in Your Cloud (Without Losing Your Mind)

January 31, 2025 · 3 min read
Deploy DeepSeek in your VPC
Will Van Eaton

DeepSeek-R1 and its distilled variants form one of the most powerful open-source model suites available today, making them an attractive choice for enterprises that want to run large-scale AI workloads without relying on proprietary APIs. However, deploying DeepSeek models privately—whether for security, compliance, or cost optimization—requires careful planning around compute resources and infrastructure.

Predibase makes it simple to deploy DeepSeek in your own cloud with our VPC offering or leverage our dedicated SaaS infrastructure to serve these models efficiently, while ensuring data privacy, security, and regulatory compliance.

In this guide, we’ll show you how to deploy DeepSeek-R1-Distill-Qwen-32B in your cloud on L40S GPUs and in a dedicated SaaS deployment on Predibase. At 32B parameters, DeepSeek-R1-Distill-Qwen-32B is roughly 20x smaller than the full 671B-parameter DeepSeek-R1, yet it maintains impressive quality while improving throughput.

We’ve included a closer look at quality and throughput comparisons toward the end of this post.

Beyond Just Inference: Why Co-Locating Training & Serving Matters

DeepSeek-R1 isn’t just a standard inference model—it’s designed for reinforcement learning, which means you’ll want a setup that supports continuous training and serving. Co-locating your training and serving infrastructure offers:

  • Seamless Integration: Unified infrastructure for real-time updates and fine-tuning.
  • Cost Efficiency: Shared GPU resources reduce operational overhead.
  • Faster Iteration: Minimize latency and simplify model deployment.

Predibase handles both training and inference in one seamless platform, whether deployed in your VPC or via our dedicated SaaS offering. To learn more about fine-tuning DeepSeek models, join our Feb 12th webinar.

Hardware Requirements for DeepSeek Models

Deploying DeepSeek requires serious compute power. Here’s what you need for different models:

| Model Name | Optimal GPU Configuration | Minimum GPU Configuration |
| --- | --- | --- |
| DeepSeek-R1-Distill-Qwen-32B | 1x H100 | 1x L40S (with FP8 quantization) |
| DeepSeek-R1-Distill-Llama-70B | 2x H100 | 2x L40S (with FP8 quantization) |
| DeepSeek-R1 (full model) | 16x H100 or 8x H200 | Not recommended on smaller GPUs |
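
A quick back-of-the-envelope check explains the table above for the 32B distill. This is a rough sketch, not a measured profile; the 20% overhead factor for KV cache, activations, and CUDA context is an illustrative assumption:

# Rough serving-memory estimate for DeepSeek-R1-Distill-Qwen-32B.
# The 20% overhead factor is an assumption for illustration, not a measurement.
PARAMS_BILLIONS = 32
OVERHEAD = 1.20  # assumed headroom for KV cache, activations, CUDA context

def estimated_vram_gb(bytes_per_param: float) -> float:
    return PARAMS_BILLIONS * bytes_per_param * OVERHEAD

print(f"FP16/BF16 weights: ~{estimated_vram_gb(2):.0f} GB -> needs an 80 GB H100")
print(f"FP8 weights:       ~{estimated_vram_gb(1):.0f} GB -> fits on a 48 GB L40S")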

GPU Availability Considerations

While high-end GPUs like the H100 are ideal, they are in short supply at cloud providers like AWS. If securing GPUs from your cloud provider is a challenge, Predibase’s fully managed SaaS deployment provides a reliable alternative with pre-allocated infrastructure and competitive GPU pricing.

Deployment Options with Predibase

Customers can get started serving DeepSeek models with Predibase in one of two ways: deploy them with Predibase in your own VPC or leverage our dedicated SaaS infrastructure for a fully managed experience.

Option 1: Deploying DeepSeek in Your VPC

Predibase customers that require full control over their infrastructure can opt for private deployments within their own cloud while ensuring compliance with enterprise security policies.

To deploy DeepSeek-R1-Distill-Qwen-32B in your cloud, Predibase VPC customers simply need to run the following few lines of code:

from predibase import Predibase, DeploymentConfig

# Authenticate with your Predibase API token
pb = Predibase(api_token="<YOUR_PREDIBASE_API_TOKEN>")

# Create a private serving deployment on a single L40S GPU
pb.deployments.create(
    name="deepseek-r1-distill-qwen-32b",
    config=DeploymentConfig(
        base_model="deepseek-r1-distill-qwen-32b",
        cooldown_time=3600,           # scale to zero after an hour of inactivity
        min_replicas=0,
        max_replicas=1,
        accelerator="l40s_48gb_100",  # single 48 GB L40S
    )
)
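
Once the deployment is ready, you can prompt it from the same SDK session. The snippet below is a minimal sketch; the prompt text and token limit are illustrative:

# Get a client for the deployment created above and run a test prompt
client = pb.deployments.client("deepseek-r1-distill-qwen-32b")
response = client.generate(
    "What is the millionth prime number?",
    max_new_tokens=512,  # illustrative limit; reasoning traces can run long
)
print(response.generated_text)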

Note that availability of some GPU types, such as the H100 and H200, is limited in public clouds like AWS. Before committing to these node types, please reach out to AWS to confirm you will be able to start them in your VPC. L40S nodes have almost no restrictions, so we suggest starting with those.

Option 2: Deploying DeepSeek via Predibase SaaS

For teams that want a private DeepSeek deployment without worrying about GPU capacity constraints in the public cloud, Predibase provides dedicated SaaS infrastructure with more GPU types available at competitive prices.

Advantages of Predibase SaaS:

  • No GPU procurement required: Predibase has pre-allocated H100 clusters.
  • Faster time-to-deployment: Deploy in hours, not days.
  • Fully managed scaling: Auto-adjusts GPUs to handle workload spikes.
  • Security & Compliance: Private, dedicated deployments that meet enterprise security needs. Your deployments are never shared with other customers, and your data is yours, forever. Our GPU data centers all reside in the United States and meet the strictest security standards.

With this option, customers still get a dedicated and private DeepSeek deployment—without worrying about securing scarce compute resources.


Quick Setup for Dedicated Deployment

  1. Sign up for Predibase.
  2. Create a new deployment from the UI and select DeepSeek-R1-Distill-Qwen-32B as your base model.
  3. Start serving your model immediately.

For advanced accelerator types such as H100s and H200s, please contact us to enable them in your account. L40S instances are available immediately.
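
If you prefer to call the dedicated deployment programmatically, it can also be queried through an OpenAI-compatible interface. The sketch below assumes your deployment exposes such an endpoint; the base_url is a placeholder, so copy the actual endpoint and your API token from the Predibase UI:

# Query the dedicated deployment via an OpenAI-compatible endpoint.
# The base_url below is a placeholder; use the endpoint shown for your
# deployment in the Predibase UI together with your Predibase API token.
from openai import OpenAI

client = OpenAI(
    base_url="<YOUR_DEPLOYMENT_OPENAI_COMPATIBLE_URL>",
    api_key="<YOUR_PREDIBASE_API_TOKEN>",
)
completion = client.chat.completions.create(
    model="deepseek-r1-distill-qwen-32b",
    messages=[{"role": "user", "content": "Convert the point (0,3) to polar coordinates."}],
    max_tokens=512,
)
print(completion.choices[0].message.content)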


Choosing the Right Model and Infrastructure

Deciding between VPC deployment and SaaS deployment depends on your organization’s needs:

| Factor | VPC Deployment | Predibase SaaS |
| --- | --- | --- |
| Pre-committed GPUs | Required | Not required |
| Time to deployment | Hours | Minutes |
| Cost efficiency | Varies based on GPU availability | Predictable pricing |
| Supports continuous training | Yes* | Yes* |
| Security & Compliance | Customer-managed | SOC 2 Type II |

*Want an early preview of fine-tuning DeepSeek models? Join us on Feb 12 for a webinar covering our fine-tuning implementation for reasoning models.

Get started today

Deploying DeepSeek doesn’t have to be a headache. Whether you choose to deploy in your VPC with Predibase or opt for a fully managed dedicated deployment, we’ve got you covered.

  • Deploy in Your VPC: Ensure full control over infrastructure with Predibase’s private cloud offering.
  • Leverage Predibase SaaS: Skip the hassle of GPU procurement and let us manage scaling.
  • Secure & Private Deployments: Both VPC and SaaS ensure enterprise-grade security and compliance.

Get in touch to discuss your DeepSeek deployment or sign up for a free trial and get $25 of credit for serving!

Appendix: Comparing DeepSeek-R1 vs. DeepSeek-R1-Distill-Qwen-32B

When it comes to deploying AI models, performance and inference speed are crucial. Despite DeepSeek-R1-Distill-Qwen-32B being significantly smaller than the full DeepSeek-R1, it still maintains a high level of performance (which improves further with fine-tuning). Here’s a quick look at how these two models compare across a collection of benchmark datasets:

| Model | AIME 2024 (pass@1) | MATH-500 (pass@1) | GPQA Diamond (pass@1) | LiveCodeBench (pass@1) | CodeForces (rating) |
| --- | --- | --- | --- | --- | --- |
| DeepSeek-R1 | 79.8 | 97.3 | 71.5 | 65.9 | 2029 |
| DeepSeek-R1-Distill-Qwen-32B | 72.6 | 94.3 | 62.1 | 57.2 | 1691 |

Here are the last few lines from a couple of sample prompt responses to help compare the outputs of the two models:

Prompt 1

What is the millionth prime number?

DeepSeek-R1-Distill-Qwen-32B response

Wait, no, actually, the nth prime grows roughly like n ln n, so for n=1,000,000, it's about 15 million. But earlier, my approximation using p_n ≈ n (ln n + ln ln n - 1) gave me around 15,440,500, which is close to 15,485,863. So, maybe that's the actual value.
Wait, let me check. I think the millionth prime is actually 15,485,863. Let me verify that.
Yes, according to known data, the 1,000,000th prime number is 15,485,863

DeepSeek-R1 response

Then, the formula n * (ln n + ln ln n - 1) would be 1,000,000 * (13.8155 + 2.626 - 1) = 1,000,000 * 15.4415 ≈ 15,441,500. That's closer to the actual value of 15,485,863. So the approximation is in the ballpark, but the exact value is known through computation.
Therefore, the answer should be 15,485,863. But to be absolutely sure, I need to confirm that this is the established millionth prime. Since I can't access external resources, I rely on prior knowledge. Yes, I'm fairly confident that 15,485,863 is the correct answer here.

Prompt 2

Convert the point $(0,3)$ in rectangular coordinates to polar coordinates. Enter your answer in the form $(r,\theta),$ where $r > 0$ and $0 \le \theta < 2 \pi.$

DeepSeek-R1-Distill-Qwen-32B response

Wait, just to make sure I didn't make a mistake with θ. Sometimes, when converting, θ can be in different quadrants, but in this case, since x is 0 and y is positive, it's definitely on the positive y-axis, so θ is π/2. 

I think that's all. So, putting it together, the polar coordinates are (3, π/2). 

**Final Answer**
The polar coordinates are \boxed{(3, \frac{\pi}{2})}.
</think>

To convert the rectangular coordinate \((0, 3)\) to polar coordinates, we use the formulas for \(r\) and \(\theta\).

1. Calculate \(r\):
   \[
   r = \sqrt{x^2 + y^2} = \sqrt{0^2 + 3^2} = \sqrt{9} = 3
   \]

2. Determine \(\theta\):
   Since the point \((0, 3)\) lies on the positive y-axis, the angle \(\theta\) is \(\frac{\pi}{2}\).

Thus, the polar coordinates are \(\boxed{(3, \frac{\pi}{2})}\).

DeepSeek-R1 response

Therefore, the polar coordinates should be (3, π/2). Let me check if this makes sense. If we convert back, r=3 and θ=π/2, then x = r*cos(θ) = 3*cos(π/2) = 0, and y = r*sin(θ) = 3*sin(π/2) = 3. Yep, that matches the original point (0,3).



So the answer should be (3, π/2). I don't think there's any other angle that would work here because the point is exactly on the y-axis. If I tried using θ=0, that would be along the x-axis, which is not where the point is. Similarly, π radians would be the negative x-axis, and 3π/2 would be the negative y-axis. So π/2 is definitely the correct angle here. Also, since r has to be positive, which it is (3), that's all good. The angle is within 0 and 2π as required. I think that's solid.

Seeing that the quality of responses seems to align with the benchmarks from the DeepSeek-R1 research paper, we did a quick comparison of response times between running DeepSeek-R1-Distill-Qwen-32B on an L40S and prompting DeepSeek-R1 through the DeepSeek API. On average across all the sample prompts, serving from the L40S was less than 1% slower.
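
If you want to reproduce a comparison like this, timing a deployment can be as simple as wrapping the generate call in a wall-clock timer. The sketch below reuses the pb client from the VPC snippet above; the prompt list and single-run timing are illustrative, so average over several runs for stable numbers:

import time

prompts = [
    "What is the millionth prime number?",
    "Convert the point (0,3) in rectangular coordinates to polar coordinates.",
]

# Time one generate call per prompt against the deployment created earlier
client = pb.deployments.client("deepseek-r1-distill-qwen-32b")
for prompt in prompts:
    start = time.perf_counter()
    client.generate(prompt, max_new_tokens=1024)
    elapsed = time.perf_counter() - start
    print(f"{elapsed:.1f}s  {prompt[:50]}")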
