How to Deploy DeepSeek Models in Your Cloud (Without Losing Your Mind)

January 31, 2025 · 3 min read
Deploy DeepSeek in your VPC
Will Van Eaton

DeepSeek-R1 and its distilled variants form one of the most powerful open-source model suites available today, making them an attractive choice for enterprises that want to run large-scale AI workloads without relying on proprietary APIs. However, deploying DeepSeek models privately—whether for security, compliance, or cost optimization—requires careful planning around compute resources and infrastructure.

Predibase makes it simple to deploy DeepSeek in your own cloud with our VPC offering or leverage our dedicated SaaS infrastructure to serve these models efficiently, while ensuring data privacy, security, and regulatory compliance.

In this guide, we’ll show you how to deploy DeepSeek-R1-Distill-Qwen-32B in your cloud on L40S GPUs and in a dedicated SaaS deployment on Predibase. At 32B parameters, DeepSeek-R1-Distill-Qwen-32B is roughly 20x smaller than the full 671B-parameter DeepSeek-R1, yet it maintains impressive quality while improving throughput.

We’ve included a closer look at quality and throughput comparisons toward the end of this post.

Beyond Just Inference: Why Co-Locating Training & Serving Matters

DeepSeek-R1 isn’t just a standard inference model—it’s designed for reinforcement learning, which means you’ll want a setup that supports continuous training and serving. Co-locating your training and serving infrastructure offers:

  • Seamless Integration: Unified infrastructure for real-time updates and fine-tuning.
  • Cost Efficiency: Shared GPU resources reduce operational overhead.
  • Faster Iteration: Minimize latency and simplify model deployment.

Predibase handles both training and inference in one seamless platform, whether deployed in your VPC or via our dedicated SaaS offering. To learn more about fine-tuning DeepSeek models, join our Feb 12th webinar.

Hardware Requirements for DeepSeek Models

Deploying DeepSeek requires serious compute power. Here’s what you need for different models:

| Model Name | Optimal GPU Configuration | Minimum GPU Configuration |
| --- | --- | --- |
| DeepSeek-R1-Distill-Qwen-32B | 1x H100 | 1x L40S (with FP8 quantization) |
| DeepSeek-R1-Distill-Llama-70B | 2x H100 | 2x L40S (with FP8 quantization) |
| DeepSeek-R1 (full model) | 16x H100 or 8x H200 | Not recommended on smaller GPUs |
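
A quick back-of-the-envelope check explains the table above for the 32B distill. This is a rough sketch, not a measured profile; the 20% overhead factor for KV cache, activations, and CUDA context is an illustrative assumption:

# Rough serving-memory estimate for DeepSeek-R1-Distill-Qwen-32B.
# The 20% overhead factor is an assumption for illustration, not a measurement.
PARAMS_BILLIONS = 32
OVERHEAD = 1.20  # assumed headroom for KV cache, activations, CUDA context

def estimated_vram_gb(bytes_per_param: float) -> float:
    return PARAMS_BILLIONS * bytes_per_param * OVERHEAD

print(f"FP16/BF16 weights: ~{estimated_vram_gb(2):.0f} GB -> needs an 80 GB H100")
print(f"FP8 weights:       ~{estimated_vram_gb(1):.0f} GB -> fits on a 48 GB L40S")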

GPU Availability Considerations

While high-end GPUs like the H100 are ideal, they are in short supply at cloud providers like AWS. If securing GPUs from your cloud provider is a challenge, Predibase’s fully managed SaaS deployment provides a reliable alternative with pre-allocated infrastructure and competitive GPU pricing.

Deployment Options with Predibase

Customers can get started serving DeepSeek models with Predibase in one of two ways: deploy them with Predibase in your own VPC or leverage our dedicated SaaS infrastructure for a fully managed experience.

Option 1: Deploying DeepSeek in Your VPC

Predibase customers that require full control over their infrastructure can opt for private deployments within their own cloud while ensuring compliance with enterprise security policies.

To deploy DeepSeek-R1-Distill-Qwen-32B in your cloud, Predibase VPC customers simply need to run the following few lines of code:

from predibase import Predibase, DeploymentConfig

# Authenticate with your Predibase API token
pb = Predibase(api_token="<YOUR_PREDIBASE_API_TOKEN>")

# Create a private serving deployment on a single L40S GPU
pb.deployments.create(
    name="deepseek-r1-distill-qwen-32b",
    config=DeploymentConfig(
        base_model="deepseek-r1-distill-qwen-32b",
        cooldown_time=3600,           # scale to zero after an hour of inactivity
        min_replicas=0,
        max_replicas=1,
        accelerator="l40s_48gb_100",  # single 48 GB L40S
    )
)
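
Once the deployment is ready, you can prompt it from the same SDK session. The snippet below is a minimal sketch; the prompt text and token limit are illustrative:

# Get a client for the deployment created above and run a test prompt
client = pb.deployments.client("deepseek-r1-distill-qwen-32b")
response = client.generate(
    "What is the millionth prime number?",
    max_new_tokens=512,  # illustrative limit; reasoning traces can run long
)
print(response.generated_text)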

Note that availability of some GPU types, such as the H100 and H200, is limited in public clouds like AWS. Before committing to these node types, please reach out to AWS to confirm you will be able to start them in your VPC. L40S nodes have almost no restrictions, so we suggest starting with those.

Option 2: Deploying DeepSeek via Predibase SaaS

For teams that want a private DeepSeek deployment without worrying about GPU capacity constraints in the public cloud, Predibase provides dedicated SaaS infrastructure with more GPU types available at competitive prices.

Advantages of Predibase SaaS:

  • No GPU procurement required: Predibase has pre-allocated H100 clusters.
  • Faster time-to-deployment: Deploy in hours, not days.
  • Fully managed scaling: Auto-adjusts GPUs to handle workload spikes.
  • Security & Compliance: Private, dedicated deployments that meet enterprise security needs. Your deployments are never shared with other customers, and your data is yours, forever. Our GPU data centers all reside in the United States and meet the strictest security standards.

With this option, customers still get a dedicated and private DeepSeek deployment—without worrying about securing scarce compute resources.


Quick Setup for Dedicated Deployment

  1. Sign up for Predibase.
  2. Create a new deployment from the UI and select DeepSeek-R1-Distill-Qwen-32B as your base model.
  3. Start serving your model immediately.

For advanced accelerator types such as H100s and H200s, please contact us to enable them in your account. L40S instances are available immediately.
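
If you prefer to call the dedicated deployment programmatically, it can also be queried through an OpenAI-compatible interface. The sketch below assumes your deployment exposes such an endpoint; the base_url is a placeholder, so copy the actual endpoint and your API token from the Predibase UI:

# Query the dedicated deployment via an OpenAI-compatible endpoint.
# The base_url below is a placeholder; use the endpoint shown for your
# deployment in the Predibase UI together with your Predibase API token.
from openai import OpenAI

client = OpenAI(
    base_url="<YOUR_DEPLOYMENT_OPENAI_COMPATIBLE_URL>",
    api_key="<YOUR_PREDIBASE_API_TOKEN>",
)
completion = client.chat.completions.create(
    model="deepseek-r1-distill-qwen-32b",
    messages=[{"role": "user", "content": "Convert the point (0,3) to polar coordinates."}],
    max_tokens=512,
)
print(completion.choices[0].message.content)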


Choosing the Right Model and Infrastructure

Deciding between VPC deployment and SaaS deployment depends on your organization’s needs:

| Factor | VPC Deployment | Predibase SaaS |
| --- | --- | --- |
| Pre-committed GPUs | Required | Not required |
| Time to deployment | Hours | Minutes |
| Cost efficiency | Varies based on GPU availability | Predictable pricing |
| Supports continuous training | Yes* | Yes* |
| Security & Compliance | Customer-managed | SOC 2 Type II |

*Want an early preview of fine-tuning DeepSeek models? Join us on Feb 12 for a webinar covering our fine-tuning implementation for reasoning models.

Get started today

Deploying DeepSeek doesn’t have to be a headache. Whether you choose to deploy in your VPC with Predibase or opt for a fully managed dedicated deployment, we’ve got you covered.

  • Deploy in Your VPC: Ensure full control over infrastructure with Predibase’s private cloud offering.
  • Leverage Predibase SaaS: Skip the hassle of GPU procurement and let us manage scaling.
  • Secure & Private Deployments: Both VPC and SaaS ensure enterprise-grade security and compliance.

Get in touch to discuss your DeepSeek deployment or sign up for a free trial and get $25 of credit for serving!

Appendix: Comparing DeepSeek-R1 vs. DeepSeek-R1-Distill-Qwen-32B

When it comes to deploying AI models, performance and inference speed are crucial. Despite DeepSeek-R1-Distill-Qwen-32B being significantly smaller than the full DeepSeek-R1, it still maintains a high level of performance (which improves further with fine-tuning). Here’s a quick look at how these two models compare across a collection of benchmark datasets:

| Model | AIME 2024 (pass@1) | MATH-500 (pass@1) | GPQA Diamond (pass@1) | LiveCodeBench (pass@1) | CodeForces (rating) |
| --- | --- | --- | --- | --- | --- |
| DeepSeek-R1 | 79.8 | 97.3 | 71.5 | 65.9 | 2029 |
| DeepSeek-R1-Distill-Qwen-32B | 72.6 | 94.3 | 62.1 | 57.2 | 1691 |

Here are the last few lines from a couple of sample prompt responses to help compare the outputs of the two models:

Prompt 1

What is the millionth prime number?

DeepSeek-R1-Distill-Qwen-32B response

Wait, no, actually, the nth prime grows roughly like n ln n, so for n=1,000,000, it's about 15 million. But earlier, my approximation using p_n ≈ n (ln n + ln ln n - 1) gave me around 15,440,500, which is close to 15,485,863. So, maybe that's the actual value.
Wait, let me check. I think the millionth prime is actually 15,485,863. Let me verify that.
Yes, according to known data, the 1,000,000th prime number is 15,485,863

DeepSeek-R1 response

Then, the formula n * (ln n + ln ln n - 1) would be 1,000,000 * (13.8155 + 2.626 - 1) = 1,000,000 * 15.4415 ≈ 15,441,500. That's closer to the actual value of 15,485,863. So the approximation is in the ballpark, but the exact value is known through computation.
Therefore, the answer should be 15,485,863. But to be absolutely sure, I need to confirm that this is the established millionth prime. Since I can't access external resources, I rely on prior knowledge. Yes, I'm fairly confident that 15,485,863 is the correct answer here.

Prompt 2

Convert the point $(0,3)$ in rectangular coordinates to polar coordinates. Enter your answer in the form $(r,\theta),$ where $r > 0$ and $0 \le \theta < 2 \pi.$

DeepSeek-R1-Distill-Qwen-32B response

Wait, just to make sure I didn't make a mistake with θ. Sometimes, when converting, θ can be in different quadrants, but in this case, since x is 0 and y is positive, it's definitely on the positive y-axis, so θ is π/2. 

I think that's all. So, putting it together, the polar coordinates are (3, π/2). 

**Final Answer**
The polar coordinates are \boxed{(3, \frac{\pi}{2})}.
</think>

To convert the rectangular coordinate \((0, 3)\) to polar coordinates, we use the formulas for \(r\) and \(\theta\).

1. Calculate \(r\):
   \[
   r = \sqrt{x^2 + y^2} = \sqrt{0^2 + 3^2} = \sqrt{9} = 3
   \]

2. Determine \(\theta\):
   Since the point \((0, 3)\) lies on the positive y-axis, the angle \(\theta\) is \(\frac{\pi}{2}\).

Thus, the polar coordinates are \(\boxed{(3, \frac{\pi}{2})}\).

DeepSeek-R1 response

Therefore, the polar coordinates should be (3, π/2). Let me check if this makes sense. If we convert back, r=3 and θ=π/2, then x = r*cos(θ) = 3*cos(π/2) = 0, and y = r*sin(θ) = 3*sin(π/2) = 3. Yep, that matches the original point (0,3).



So the answer should be (3, π/2). I don't think there's any other angle that would work here because the point is exactly on the y-axis. If I tried using θ=0, that would be along the x-axis, which is not where the point is. Similarly, π radians would be the negative x-axis, and 3π/2 would be the negative y-axis. So π/2 is definitely the correct angle here. Also, since r has to be positive, which it is (3), that's all good. The angle is within 0 and 2π as required. I think that's solid.

Seeing that the quality of responses seems to align with the benchmarks from the DeepSeek-R1 research paper, we did a quick comparison of response times between running DeepSeek-R1-Distill-Qwen-32B on an L40S and prompting DeepSeek-R1 through the DeepSeek API. On average across all the sample prompts, serving from the L40S was less than 1% slower.
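
If you want to reproduce a comparison like this, timing a deployment can be as simple as wrapping the generate call in a wall-clock timer. The sketch below reuses the pb client from the VPC snippet above; the prompt list and single-run timing are illustrative, so average over several runs for stable numbers:

import time

prompts = [
    "What is the millionth prime number?",
    "Convert the point (0,3) in rectangular coordinates to polar coordinates.",
]

# Time one generate call per prompt against the deployment created earlier
client = pb.deployments.client("deepseek-r1-distill-qwen-32b")
for prompt in prompts:
    start = time.perf_counter()
    client.generate(prompt, max_new_tokens=1024)
    elapsed = time.perf_counter() - start
    print(f"{elapsed:.1f}s  {prompt[:50]}")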
