DeepSeek-R1 and its distilled variants make up one of the most powerful open-source model suites available today, which makes them an attractive choice for enterprises that want to run large-scale AI workloads without relying on proprietary APIs. However, deploying DeepSeek models privately, whether for security, compliance, or cost optimization, requires careful planning around compute resources and infrastructure.
Predibase makes it simple to deploy DeepSeek in your own cloud with our VPC offering or to leverage our dedicated SaaS infrastructure, serving these models efficiently while ensuring data privacy, security, and regulatory compliance.
In this guide we’ll show you how to deploy DeepSeek-R1-Distill-Qwen-32B in your cloud on L40S GPUs and in a dedicated SaaS deployment on Predibase. DeepSeek-R1-Distill-Qwen-32B is roughly 24 times smaller than the full DeepSeek-R1 but maintains impressive quality while improving throughput.
We’ve included a closer look at quality and throughput comparisons toward the end of this post.
Beyond Just Inference: Why Co-Locating Training & Serving Matters
DeepSeek-R1 isn’t just a standard inference model—it’s designed for reinforcement learning, which means you’ll want a setup that supports continuous training and serving. Co-locating your training and serving infrastructure offers:
- Seamless Integration: Unified infrastructure for real-time updates and fine-tuning.
- Cost Efficiency: Shared GPU resources reduce operational overhead.
- Faster Iteration: Minimize latency and simplify model deployment.
Predibase handles both training and inference in one seamless platform, whether deployed in your VPC or via our dedicated SaaS offering; the sketch below shows what that co-located loop could look like. To learn more about fine-tuning DeepSeek models, join our Feb 12th webinar.
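As a concrete illustration, here is a minimal sketch of that loop using the Predibase Python SDK. The dataset file, repo name, and prompt are placeholders, and fine-tuning for DeepSeek reasoning models was still in preview at the time of writing, so treat the exact calls as illustrative:

```python
from predibase import Predibase, FinetuningConfig

pb = Predibase(api_token="<PREDIBASE_API_TOKEN>")

# Upload training data (file name is a placeholder)
dataset = pb.datasets.from_file("reasoning_examples.jsonl", name="reasoning_examples")

# Fine-tune a LoRA adapter against the same base model we serve
repo = pb.repos.create(name="deepseek-r1-distill-qwen-32b-ft", exists_ok=True)
adapter = pb.adapters.create(
    config=FinetuningConfig(base_model="deepseek-r1-distill-qwen-32b"),
    dataset=dataset,
    repo=repo,
)

# Serve the adapter on the same deployment that hosts the base model,
# with no separate training cluster or model hand-off required
client = pb.deployments.client("deepseek-r1-distill-qwen-32b")
print(client.generate(
    "What is the millionth prime number?",
    adapter_id="deepseek-r1-distill-qwen-32b-ft/1",
    max_new_tokens=2048,
).generated_text)
```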
Hardware Requirements for DeepSeek Models
Deploying DeepSeek requires serious compute power. Here’s what you need for different models:
| Model Name | Optimal GPU Configuration | Minimum GPU Configuration |
| --- | --- | --- |
| DeepSeek-R1-Distill-Qwen-32B | 1x H100 | 1x L40S (with FP8 quantization) |
| DeepSeek-R1-Distill-Llama-70B | 2x H100 | 2x L40S (with FP8 quantization) |
| DeepSeek-R1 (full model) | 16x H100 or 8x H200 | Not recommended on smaller GPUs |
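To see why the 32B distill needs FP8 to fit on a single 48 GB L40S, a quick back-of-envelope estimate of weight memory is enough (KV cache and activations need headroom on top of this):

```python
def weight_memory_gib(params_billions: float, bytes_per_param: float) -> float:
    """Approximate memory needed just to hold the model weights, in GiB."""
    return params_billions * 1e9 * bytes_per_param / 1024**3

# DeepSeek-R1-Distill-Qwen-32B has ~32B parameters
print(f"FP16: {weight_memory_gib(32, 2):.1f} GiB")  # ~59.6 GiB: too big for a 48 GiB L40S, fits an 80 GiB H100
print(f"FP8:  {weight_memory_gib(32, 1):.1f} GiB")  # ~29.8 GiB: fits one L40S with room left for the KV cache
```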
GPU Availability Considerations
While high-end GPUs like H100s are ideal, they are in short supply on cloud providers like AWS. If securing GPUs from your cloud provider is a challenge, Predibase’s fully managed SaaS deployment provides a reliable alternative with pre-allocated infrastructure and competitive GPU pricing.
Deployment Options with Predibase
Customers can get started serving DeepSeek models with Predibase in one of two ways: deploy them with Predibase in your own VPC or leverage our dedicated SaaS infrastructure for a fully managed experience.
Option 1: Deploying DeepSeek in Your VPC
Predibase customers that require full control over their infrastructure can opt for private deployments within their own cloud while ensuring compliance with enterprise security policies.
To deploy DeepSeek-R1-Distill-Qwen-32B in your cloud, Predibase VPC customers simply need to run a few lines of code:
```python
from predibase import Predibase, DeploymentConfig

pb = Predibase(api_token="<PREDIBASE_API_TOKEN>")

pb.deployments.create(
    name="deepseek-r1-distill-qwen-32b",
    config=DeploymentConfig(
        base_model="deepseek-r1-distill-qwen-32b",
        cooldown_time=3600,   # scale to zero after an hour of inactivity
        min_replicas=0,
        max_replicas=1,
        accelerator="l40s_48gb_100",
    ),
)
```
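Once the deployment is live, prompting it uses the same SDK (the prompt and generation parameters below are just examples):

```python
client = pb.deployments.client("deepseek-r1-distill-qwen-32b")
response = client.generate(
    "What is the millionth prime number?",
    max_new_tokens=2048,  # reasoning models produce long chains of thought
)
print(response.generated_text)
```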
Note that the availability of some GPU types, such as H100s and H200s, is limited in public clouds like AWS. To ensure you can start these nodes in your VPC, reach out to AWS to confirm quota for these instance types. L40S nodes have almost no restrictions, so we suggest starting with those.
Option 2: Deploying DeepSeek via Predibase SaaS
For teams that want private DeepSeek deployment without worrying about GPU capacity constraints in the public cloud, Predibase provides dedicated SaaS infrastructure with more GPU types available at competitive prices.
Advantages of Predibase SaaS:
- No GPU procurement required: Predibase has pre-allocated H100 clusters.
- Faster time-to-deployment: Deploy in hours, not days.
- Fully managed scaling: Auto-adjusts GPUs to handle workload spikes.
- Security & Compliance: Private, dedicated deployments that meet enterprise security needs. Your deployments are never shared with other customers and your data is yours, forever. Our GPU datacenters all reside in the United States and meet the strictest of security standards.
With this option, customers still get a dedicated and private DeepSeek deployment—without worrying about securing scarce compute resources.
Quick Setup for Dedicated Deployment
1. Sign up for Predibase.
2. Create a new deployment from the UI and select DeepSeek-R1-Distill-Qwen-32B as your base model.
3. Start serving your model immediately.
For advanced accelerator types such as H100 and H200s, please contact us to enable these in your account. L40S instances are available immediately.
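If H100s are enabled in your account, the same `pb.deployments.create` call from the VPC section works against your dedicated SaaS environment; only the accelerator and scaling settings change. The H100 SKU name below is a hypothetical placeholder, so confirm the accelerator identifiers available to you:

```python
pb.deployments.create(
    name="deepseek-r1-distill-qwen-32b",
    config=DeploymentConfig(
        base_model="deepseek-r1-distill-qwen-32b",
        min_replicas=1,  # keep one replica warm for production traffic
        max_replicas=2,  # scale out under load
        accelerator="h100_80gb_100",  # hypothetical SKU; check your account
    ),
)
```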
Choosing the Right Model and Infrastructure
Deciding between VPC deployment and SaaS deployment depends on your organization’s needs:
| Factor | VPC Deployment | Predibase SaaS |
| --- | --- | --- |
| Pre-committed GPUs | Required | Not required |
| Time to deployment | Hours | Minutes |
| Cost efficiency | Varies based on GPU availability | Predictable pricing |
| Supports continuous training | Yes* | Yes* |
| Security & compliance | Customer-managed | SOC 2 Type II |
*Want an early preview of fine-tuning DeepSeek models? Join us on Feb 12 for a webinar covering our fine-tuning implementation for reasoning models.
Get started today
Deploying DeepSeek doesn’t have to be a headache. Whether you choose to deploy in your VPC with Predibase or opt for a fully managed dedicated deployment, we’ve got you covered.
✅ Deploy in Your VPC: Ensure full control over infrastructure with Predibase’s private cloud offering.
✅ Leverage Predibase SaaS: Skip the hassle of GPU procurement and let us manage scaling.
✅ Secure & Private Deployments: Both VPC and SaaS ensure enterprise-grade security and compliance.
Get in touch to discuss your DeepSeek deployment, or sign up for a free trial and get $25 of credit for serving!
Appendix: Comparing DeepSeek-R1 vs. DeepSeek-R1-Distill-Qwen-32B
When it comes to deploying AI models, performance and inference speed are crucial. Despite DeepSeek-R1-Distill-Qwen-32B being significantly smaller than the full DeepSeek-R1, it still maintains a high level of performance (and quality can improve further with fine-tuning). Here's a quick look at how the two models compare across a collection of benchmark datasets:
| Model | AIME 2024 (pass@1) | MATH-500 (pass@1) | GPQA Diamond (pass@1) | LiveCodeBench (pass@1) | CodeForces (rating) |
| --- | --- | --- | --- | --- | --- |
| DeepSeek-R1 | 79.8 | 97.3 | 71.5 | 65.9 | 2029 |
| DeepSeek-R1-Distill-Qwen-32B | 72.6 | 94.3 | 62.1 | 57.2 | 1691 |
Here are the last few lines from a couple of sample prompt responses to help compare the outputs of the two models:
Prompt 1
What is the millionth prime number?
DeepSeek-R1-Distill-Qwen-32B response
Wait, no, actually, the nth prime grows roughly like n ln n, so for n=1,000,000, it's about 15 million. But earlier, my approximation using p_n ≈ n (ln n + ln ln n - 1) gave me around 15,440,500, which is close to 15,485,863. So, maybe that's the actual value.
Wait, let me check. I think the millionth prime is actually 15,485,863. Let me verify that.
Yes, according to known data, the 1,000,000th prime number is 15,485,863
DeepSeek-R1 response
Then, the formula n * (ln n + ln ln n - 1) would be 1,000,000 * (13.8155 + 2.626 - 1) = 1,000,000 * 15.4415 ≈ 15,441,500. That's closer to the actual value of 15,485,863. So the approximation is in the ballpark, but the exact value is known through computation.
Therefore, the answer should be 15,485,863. But to be absolutely sure, I need to confirm that this is the established millionth prime. Since I can't access external resources, I rely on prior knowledge. Yes, I'm fairly confident that 15,485,863 is the correct answer here.
Prompt 2
Convert the point $(0,3)$ in rectangular coordinates to polar coordinates. Enter your answer in the form $(r,\theta),$ where $r > 0$ and $0 \le \theta < 2 \pi.$
DeepSeek-R1-Distill-Qwen-32B response
Wait, just to make sure I didn't make a mistake with θ. Sometimes, when converting, θ can be in different quadrants, but in this case, since x is 0 and y is positive, it's definitely on the positive y-axis, so θ is π/2.
I think that's all. So, putting it together, the polar coordinates are (3, π/2).
**Final Answer**
The polar coordinates are \boxed{(3, \frac{\pi}{2})}.
</think>
To convert the rectangular coordinate \((0, 3)\) to polar coordinates, we use the formulas for \(r\) and \(\theta\).
1. Calculate \(r\):
\[
r = \sqrt{x^2 + y^2} = \sqrt{0^2 + 3^2} = \sqrt{9} = 3
\]
2. Determine \(\theta\):
Since the point \((0, 3)\) lies on the positive y-axis, the angle \(\theta\) is \(\frac{\pi}{2}\).
Thus, the polar coordinates are \(\boxed{(3, \frac{\pi}{2})}\).
DeepSeek-R1 response
Therefore, the polar coordinates should be (3, π/2). Let me check if this makes sense. If we convert back, r=3 and θ=π/2, then x = r*cos(θ) = 3*cos(π/2) = 0, and y = r*sin(θ) = 3*sin(π/2) = 3. Yep, that matches the original point (0,3).
So the answer should be (3, π/2). I don't think there's any other angle that would work here because the point is exactly on the y-axis. If I tried using θ=0, that would be along the x-axis, which is not where the point is. Similarly, π radians would be the negative x-axis, and 3π/2 would be the negative y-axis. So π/2 is definitely the correct angle here. Also, since r has to be positive, which it is (3), that's all good. The angle is within 0 and 2π as required. I think that's solid.
Since the quality of these responses aligns with the benchmarks reported in the DeepSeek-R1 research paper, we also ran a quick comparison of response times between DeepSeek-R1-Distill-Qwen-32B served on an L40S and DeepSeek-R1 queried through the DeepSeek API. Averaged across all the sample prompts, serving from the L40S was less than 1% slower.
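For reference, here is a minimal sketch of how such a latency comparison can be run, assuming a Predibase deployment named as above and the OpenAI-compatible DeepSeek API (model name `deepseek-reasoner`); exact parameters are illustrative:

```python
import time

from openai import OpenAI
from predibase import Predibase

pb = Predibase(api_token="<PREDIBASE_API_TOKEN>")
distill = pb.deployments.client("deepseek-r1-distill-qwen-32b")
deepseek = OpenAI(api_key="<DEEPSEEK_API_KEY>", base_url="https://api.deepseek.com")

prompt = "What is the millionth prime number?"

# Time the distilled model served on an L40S via Predibase
start = time.perf_counter()
distill.generate(prompt, max_new_tokens=2048)
print(f"Distill-Qwen-32B on L40S: {time.perf_counter() - start:.1f}s")

# Time the full model via the hosted DeepSeek API
start = time.perf_counter()
deepseek.chat.completions.create(
    model="deepseek-reasoner",
    messages=[{"role": "user", "content": prompt}],
)
print(f"DeepSeek-R1 via API: {time.perf_counter() - start:.1f}s")
```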