LiteLLM Agent Platform

A tier · New this week

Self-hosted Kubernetes infrastructure for running coding agents like Claude Code and Codex in isolated, persistent sandboxes — built for teams who want managed agent infra without giving up data control.


Kai's verdict

A genuinely useful piece of missing infrastructure for teams already running LiteLLM Gateway who want to graduate from 'agent in a script' to 'agent in production' — but you'll need real Kubernetes chops and tolerance for alpha-stage rough edges before this earns a place in a critical workflow. (Verdict pending Phi's full review.)

Strengths

Full data residency — nothing leaves your own Kubernetes cluster, making it viable for regulated industries
Persistent session state across pod restarts via Postgres, solving the #1 pain point of stateful agents in production
Vault proxy for secrets: sandbox containers get stub credentials; real keys are injected on outbound TLS, not baked into pods
Plugs directly into LiteLLM Gateway for model routing, cost tracking, and rate limiting across 100+ providers
Quickstart is genuinely fast — two shell commands stand up a local kind cluster with working sandboxes
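
To make the vault proxy strength above more concrete, here is a minimal Python sketch of the general stub-credential pattern: the sandbox only ever sees a placeholder token, and a proxy on the outbound path swaps in the real key. This is an illustration under stated assumptions, not the platform's actual code; the `stub-` prefix, the `inject_real_credentials` function, and the in-memory `vault` mapping are all hypothetical names chosen for the example.

```python
from typing import Dict, Mapping

# Hypothetical marker for placeholder credentials handed to a sandbox.
STUB_PREFIX = "stub-"


def inject_real_credentials(
    headers: Mapping[str, str], vault: Mapping[str, str]
) -> Dict[str, str]:
    """Return a copy of the headers with a stub bearer token replaced by the real key.

    `vault` maps stub tokens to real keys; in a real deployment this lookup
    would live in the proxy, never inside the sandbox container.
    """
    out = dict(headers)  # copy so the caller's headers are not mutated
    scheme, _, token = out.get("Authorization", "").partition(" ")
    if scheme == "Bearer" and token.startswith(STUB_PREFIX):
        if token not in vault:
            raise KeyError(f"no real credential mapped to {token!r}")
        out["Authorization"] = f"Bearer {vault[token]}"
    return out


if __name__ == "__main__":
    vault = {"stub-provider-a": "real-key-held-by-proxy"}
    sandbox_headers = {"Authorization": "Bearer stub-provider-a"}
    print(inject_real_credentials(sandbox_headers, vault))
```

The point of the pattern is that a compromised or misbehaving sandbox can leak only the stub value, which is useless outside the proxy.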

Weaknesses

Alpha-quality: rough edges, community-only support, no SLA — not for teams that need production guarantees today
Meaningful DevOps overhead: requires Docker, Kubernetes (kind or EKS), Postgres, and Helm familiarity just to get started
Sessions expire after 24 hours; there is no longer-lived persistence or cross-session memory yet

Best for

Platform and DevOps engineers at orgs that need to run multi-agent coding workflows (Claude Code, Codex) in production, with full infrastructure ownership and strict data residency requirements.

Pricing

Free (open source, MIT)

Free to self-host; you pay your own cloud infrastructure costs (AWS EKS, Render, etc.). No SaaS tier — fully bring-your-own-infra.

Alternatives worth knowing

OpenRouter

One API, every model. Pay-as-you-go, no subscriptions.

Ollama

Run LLMs locally. One-line install, GUI optional.

Replicate

Run any open-source AI model with an API call.

Groq

The fastest AI inference in the world. Crazy low latency.

Devin

Cognition Labs' autonomous coding engineer.

Claude Agent SDK

Anthropic's SDK for building your own agents on Claude.