The Container Orchestration Complexity Trap
- Travis Knoche
Finding the balance between simplicity and scale

I have been in many meetings where someone pitches Kubernetes as the answer to problems that don't exist yet. For instance, the Director of Analytics at a financial services company wanted to modernize their data science platform because he had recently become enamored with Kubernetes. Their existing setup was a Docker Compose stack running ShinyProxy and JupyterHub for about 50 developers and 200 content consumers.
When asked what was wrong with the current stack, the response was "nothing really, but we need to continually move towards the next, more powerful tool".
Hidden Costs of Kubernetes
Kubernetes is not known for its ease of setup. The roughest Kubernetes experience I've had was managing the data science stack for a consumer goods company, on a cluster operated by their network team. The production cluster was running fine, scaling and rolling out patches, until one morning it suddenly wasn't. Pods were failing, services were unreachable, and the whole platform was down. I spent days debugging, checking every config and every log, looking for anything that could explain the sudden change. After execing into pods and running every curl and openssl command I could think of, it came to light that the internal network team had made firewall changes over the weekend and neglected to notify the data science team.
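The triage in that kind of incident boils down to a handful of commands. A sketch of the sequence, with hypothetical namespace, pod, and service names (`data-science`, `shinyproxy-0`, `auth-service`):

```shell
# What is failing, and what do the recent events say about why?
kubectl get pods -n data-science
kubectl describe pod shinyproxy-0 -n data-science
kubectl get events -n data-science --sort-by=.lastTimestamp

# Exec into a running pod and test connectivity from inside the cluster
kubectl exec -it shinyproxy-0 -n data-science -- /bin/sh

# From inside the pod: a hang or timeout here points at the network path,
# a failed TLS handshake points at a middlebox or firewall in between
curl -v https://auth-service.internal/healthz
openssl s_client -connect auth-service.internal:443
```

When the pods and manifests all check out but traffic dies between services, the problem is usually outside the cluster, which is exactly where the firewall change was hiding.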
The real cost of Kubernetes is not just the complexity or the monetary cost of operating the cluster itself, but the cross-team communication required to understand where to look when things inevitably go sideways. A more complex platform requires a more mature and robust organizational structure to make the explicit costs worth it. That's often what we end up debugging for clients: not container issues, but organizational ones.
While the costs of Docker Compose scale linearly with compute, the cost of Kubernetes balloons fast. A Gcore TCO analysis puts self-hosted Kubernetes at roughly $335K/year versus $113K/year for managed, and that gap is almost entirely personnel. We manage clusters across Azure, AWS, and Digital Ocean for clients in this situation, and the bulk of the work isn't Kubernetes — it's coordinating with their networking, security, and IAM teams.
With Kubernetes, you're suddenly dependent on networking, security, storage, and IAM teams who may have different priorities, different release cycles, and different definitions of "production ready". Each dependency is another potential 3am phone call where nobody knows whose problem it actually is.
Kubernetes makes sense when you're operating at genuine scale. If you have 100+ data scientists across multiple business units, need to isolate workloads between teams, and have outgrown what a single powerful server can handle, that's Kubernetes territory. The operational overhead pays for itself when you're coordinating that many users and workloads. At that point, the complexity isn't overhead; it's load-bearing.
Middle Ground
Between Docker Compose and full Kubernetes, there's another option. k3d/k3s (k3d is a wrapper that runs k3s inside Docker) gives you the parts of Kubernetes that matter for small teams: declarative config, rolling updates, health checks, service discovery. It strips out the distributed-systems complexity you're not using. It's kubectl without the 3 AM call.
k3d/k3s gives you Kubernetes APIs without the operational overhead and runs inside Docker on a single host. Resource-wise, k3s needs about 4GB of RAM for clusters of up to 10 nodes, much lighter than full Kubernetes. k3d/k3s is generally not considered “production” infrastructure, but I think that caution is often excessive for single-node analytics platforms serving small teams. The tradeoff is that if that one host goes down, everything goes down. This makes sense for organizations that are developing a need for Kubernetes-like capabilities and exploring Kubernetes, but don't have the resources to support a full cluster.
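Standing up a k3d cluster on an existing Docker host takes a few commands. A sketch, where the cluster name and manifest are illustrative:

```shell
# Create a single-node cluster, mapping host port 8080 to the built-in load balancer
k3d cluster create analytics --port "8080:80@loadbalancer"

# kubectl is automatically pointed at the new cluster
kubectl get nodes

# Deploy with the same manifests you'd use on full Kubernetes
kubectl apply -f jupyterhub.yaml
kubectl rollout status deployment/jupyterhub
```

Because the manifests are ordinary Kubernetes YAML, a later move to a managed cluster is mostly a change of kubectl context, not a rewrite.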
We've run both setups for clients — one geospatial team has been on Docker Compose for years without incident, and another smoothly migrated from k3d to Kubernetes when it was truly warranted.
A Few Options
So how do you choose? It comes down to team size, tolerance for operational complexity, and whether you actually need horizontal scaling.
Docker Compose:

It's one server with all your containers, and you're done. You can have a Dev and Prod server, but that's all it truly needs to be. We've run this for clients with 30-80 users without much friction. When something breaks, you know where to look. A client in financial services has been on this architecture for three years. It's nothing elaborate, but it just works and can be maintained with common container knowledge.
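At that scale, the whole platform can live in one Compose file. A minimal sketch of the ShinyProxy-plus-JupyterHub pattern, where image tags, ports, and paths are illustrative rather than any client's actual config:

```yaml
# docker-compose.yml: single-host analytics platform sketch
services:
  shinyproxy:
    image: openanalytics/shinyproxy:latest
    ports:
      - "8080:8080"
    volumes:
      # ShinyProxy launches app containers via the Docker socket
      - /var/run/docker.sock:/var/run/docker.sock
      - ./application.yml:/opt/shinyproxy/application.yml
    restart: unless-stopped

  jupyterhub:
    image: jupyterhub/jupyterhub:latest
    ports:
      - "8000:8000"
    volumes:
      - jupyterhub-data:/srv/jupyterhub
    restart: unless-stopped

volumes:
  jupyterhub-data:
```

One `docker compose up -d` on the Prod server, the same on Dev, and when something breaks you know exactly which host to log into.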
k3d:

As mentioned above, k3d gives you the Kubernetes API and tooling on a single Docker host: declarative config, rolling updates, and health checks without the operational overhead of a real cluster.
Full Kubernetes:

This is the real deal with multiple nodes, high availability, and horizontal scaling. We have clients running this for larger teams with 100+ data scientists across business units, and they also have dedicated platform engineers. If you have the team and the scale, it's the right choice. Many organizations aren't there.
Questions That Matter More Than "What's Best"
What breaks if you stay on your current setup another year? If the answer is "nothing, really, we should just move on to the next thing," then you probably don't need to change. I know that's unsatisfying, but it's usually right.
Do you have someone who can maintain this? Most data engineers can handle Docker Compose. For k3d, you need someone comfortable with Kubernetes concepts but not necessarily a specialist. For full Kubernetes, you're hiring or training platform folks, and you're depending on other teams (networking, security, storage) who may or may not make your life easy.
What are your actual uptime requirements? Docker Compose doesn't do rolling deployments, so every deploy causes a brief outage. For most analytics teams doing batch jobs and dashboards, a few minutes of downtime during off-hours is acceptable. Real-time systems change the math.
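By contrast, zero-downtime deploys are table stakes on Kubernetes (including k3d/k3s). A sketch of the relevant Deployment spec, with illustrative names and a hypothetical image:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: dashboard
spec:
  replicas: 2
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0   # keep full capacity during the rollout
      maxSurge: 1         # bring one new pod up before retiring an old one
  selector:
    matchLabels:
      app: dashboard
  template:
    metadata:
      labels:
        app: dashboard
    spec:
      containers:
        - name: dashboard
          image: registry.example.com/dashboard:v2   # hypothetical image
          readinessProbe:                            # new pods take traffic only once healthy
            httpGet:
              path: /healthz
              port: 8080
```

If a few minutes of off-hours downtime is genuinely fine for your team, this machinery is a capability you're paying for but not using.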
The Path We Usually Recommend
Start simple. Docker Compose handles more than people think.
When you hit limits (and you'll know when you do), k3d is usually the next step. One client made this transition over a weekend. Their data scientists didn't notice.
Full Kubernetes makes sense when you're genuinely at scale with multiple teams, compliance requirements, and dedicated platform engineers who want to be platform engineers.
Final Thoughts
I like Docker Compose because it stays out of the way. I like k3d because it gives you room to grow without the overhead. I like Kubernetes for organizations that actually need it and have the resources and organizational maturity to run it.
The mistake is jumping to the enterprise solution when you're still mid-sized. Match the tool to the problem you have, not the problem you might have in three years. If you have any questions about your current or potential future data science infrastructure stack, reach out to info@landeranalytics.com.
Travis Knoche, Senior Data Scientist
Lander Analytics
Subscribe to our Substack below for our monthly emails with practical AI strategies for your organization: what to build, what to avoid, and how to make systems reliable in the real world.
Work with us: If you want help identifying the right first workflow, building a permissioned knowledge base, or training your team to ship responsibly, reach out at info@landeranalytics.com.
About the author: Travis Knoche is a Senior Data Scientist at Lander Analytics, where he designs, deploys, and maintains data science infrastructure across a wide variety of environments and constraints, bridging the gap between backend infrastructure and frontend data science development work.