Self-host models or use APIs?
If I were an enterprise CTO considering DeepSeek models, I imagine the questions making the rounds would be: what are the considerations in deploying an open-source model like DeepSeek for my AI projects? And how do those considerations change, if at all, when inference-time FLOPs drop this significantly?
1/ When kickstarting new AI initiatives, the truth is that the technical bar for calling OpenAI/Claude APIs is much lower for lean prototyping teams than getting entangled in the work of self-hosting models on EKS/GKE. I am excited to see the DeepSeek launch give momentum to startups/builders (like Replicate, Modal, Baseten) who are simplifying the process of deploying open-source models and differentiating themselves with simple developer experiences.
2/ If you start comparing the cost of self-hosting a model (e.g., request throughput per GPU) with calling an inference API ($/token) in production environments, the reality is that running models requires engineering systems and operational excellence that most customers, even the largest enterprises, won't and can't invest in building in-house. Given the status quo, it's not as simple as spinning up a VM. There is a fair bit of muck involved in setting up and measuring inference-server and GPU-level optimizations, and in the infra deployments themselves (auto-scaling, load balancing, reducing latency, etc.). This is reflected in the infra margins of the specialized GPU-hosting services. It's not clear whether incumbent or specialized inference-as-a-service providers can establish differentiation here, or whether it will become a race to the bottom on infra margins.
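The throughput-per-GPU vs $/token comparison above can be sketched as a back-of-envelope calculation. Every figure below (GPU hourly cost, sustained throughput, utilization) is a hypothetical placeholder, not a real quote; the point is that utilization, which is exactly what the "muck" of auto-scaling and batching determines, drives whether self-hosting beats an API's per-token price:

```python
# Back-of-envelope comparison of self-hosted inference cost in $/Mtok.
# All numbers are illustrative assumptions, not real prices or benchmarks.

def self_hosted_cost_per_mtok(gpu_hourly_usd, tokens_per_sec_per_gpu, utilization):
    """Dollars per million tokens for a self-hosted deployment.

    gpu_hourly_usd:         all-in hourly cost of one GPU (hardware + ops)
    tokens_per_sec_per_gpu: sustained generation throughput per GPU
    utilization:            fraction of each hour spent serving real traffic
    """
    tokens_per_hour = tokens_per_sec_per_gpu * 3600 * utilization
    return gpu_hourly_usd / tokens_per_hour * 1_000_000

# Hypothetical inputs: a $4/hr GPU sustaining 1,000 tok/s, first at 40%
# utilization, then at 80% after autoscaling/batching work pays off.
low_util = self_hosted_cost_per_mtok(4.0, 1000, 0.4)   # ≈ $2.78/Mtok
high_util = self_hosted_cost_per_mtok(4.0, 1000, 0.8)  # ≈ $1.39/Mtok

print(f"self-hosted @40% util: ${low_util:.2f}/Mtok")
print(f"self-hosted @80% util: ${high_util:.2f}/Mtok")
```

Doubling utilization halves the effective $/token, which is why the engineering investment in point 2 matters: an idle self-hosted fleet can easily cost more per token than a well-priced API.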
3/ If we think of models as a net-new primitive, just like compute, storage, and networking, one implication of DeepSeek is that the "novelty/R&D investment" margins priced into $/token by closed-model providers may eventually have to compete with the infrastructure margins of a managed or self-hosted open-source model. This may accrue more value to domain-specific models, and not just models but full AI solutions that enterprises can plug their data into to get value. It will be interesting to see how this development influences the buy-vs-build decision that every large organization (~1000+ employees) faces.
