
Problem Statement

Current State — Pain Points

Slow Environment Provisioning

Teams file tickets with the platform team to get namespaces, RBAC, CI/CD pipelines, and cloud resources created. The average wait is 2–5 days per request. Bootstrapping a new service takes 8–15 manual steps across multiple tools.

Naming and Configuration Drift

Each team names namespaces, labels, and ArgoCD applications differently. There is no standard label set, so Kubernetes, ArgoCD, and Backstage cannot cross-reference workloads without manual mapping. Incidents take longer to diagnose because there is no single source of truth for "which service owns this namespace".
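As an illustration of what a shared convention could look like, the sketch below derives a namespace name and a label set from one service definition, so that any system reading the labels can answer the ownership question. The `platform.example.com` prefix, the `team-service-env` naming pattern, and the `Workload` and `owner_of` names are assumptions made for this example, not an existing standard in this organization. Only `app.kubernetes.io/name` is one of the recommended Kubernetes labels.

```python
from dataclasses import dataclass

# Hypothetical label prefix; a real one would be chosen by the platform team.
LABEL_PREFIX = "platform.example.com"


@dataclass(frozen=True)
class Workload:
    """One service in one environment, described once and reused everywhere."""
    service: str
    team: str
    env: str

    def namespace(self) -> str:
        # Assumed pattern: <team>-<service>-<env>
        return f"{self.team}-{self.service}-{self.env}"

    def labels(self) -> dict[str, str]:
        # The same labels would be applied to the namespace and its workloads.
        return {
            "app.kubernetes.io/name": self.service,
            f"{LABEL_PREFIX}/team": self.team,
            f"{LABEL_PREFIX}/env": self.env,
        }


def owner_of(namespace_labels: dict[str, str]) -> str:
    """Answer 'who owns this namespace' from labels alone, with no manual mapping."""
    return namespace_labels[f"{LABEL_PREFIX}/team"]


workload = Workload(service="checkout", team="payments", env="staging")
print(workload.namespace())
print(owner_of(workload.labels()))
```

Because the name and the labels come from the same definition, they cannot drift apart the way hand-typed values do.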

No Developer Self-Service

Developers have no visibility into what they own, what depends on what, or the health of their services across environments. Every cross-environment question goes to the platform team through Slack or Jira.

Infrastructure Sprawl

Cloud resources (databases, clusters, queues) are provisioned ad hoc via Terraform scripts with no central inventory. Abandoned resources accumulate cost. Production resources have been deleted by accident.

Onboarding Friction

New engineers spend 1–3 days getting access to the right clusters, repositories, and tools. Role and permission boundaries are undocumented and inconsistently applied.

Root Causes

  1. No single convention for naming resources across systems
  2. No self-service pathway: all provisioning requires platform team intervention
  3. No catalog to discover what exists, who owns it, and its current state
  4. Infrastructure state lives in Terraform state files rather than in Kubernetes, so nothing continuously reconciles it or detects drift
  5. RBAC is applied per-person instead of per-group, making offboarding error-prone
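
To make the last root cause concrete, here is a minimal sketch of why group-based access simplifies offboarding: permissions are attached to groups, so removing a person from their groups revokes everything at once, with no per-person bindings left to hunt down. The `GroupRBAC` class and the group, namespace, and role names are hypothetical and only model the idea in memory. They are not the Kubernetes RBAC API.

```python
from collections import defaultdict


class GroupRBAC:
    """Toy in-memory model: grants belong to groups, never to individuals."""

    def __init__(self) -> None:
        self.members: dict[str, set[str]] = defaultdict(set)             # group -> users
        self.grants: dict[str, set[tuple[str, str]]] = defaultdict(set)  # group -> (namespace, role)

    def add_member(self, group: str, user: str) -> None:
        self.members[group].add(user)

    def grant(self, group: str, namespace: str, role: str) -> None:
        self.grants[group].add((namespace, role))

    def permissions(self, user: str) -> set[tuple[str, str]]:
        # A user's access is the union of the grants of every group they are in.
        result: set[tuple[str, str]] = set()
        for group, users in self.members.items():
            if user in users:
                result |= self.grants.get(group, set())
        return result

    def offboard(self, user: str) -> None:
        # Removing group membership is the only step; no grants mention the user.
        for users in self.members.values():
            users.discard(user)


rbac = GroupRBAC()
rbac.add_member("payments-devs", "alice")
rbac.grant("payments-devs", "payments-checkout-staging", "edit")
print(rbac.permissions("alice"))
rbac.offboard("alice")
print(rbac.permissions("alice"))
```

With per-person bindings, the equivalent of `offboard` would have to find and delete every binding that names the user, which is where omissions creep in.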