Problem Statement

Current State — Pain Points

Slow Environment Provisioning

Teams file tickets with the platform team to have namespaces, RBAC, CI/CD pipelines, and cloud resources created. The average wait is 2–5 days per request. Bootstrapping a new service requires 8–15 manual steps across multiple tools.

Naming and Configuration Drift

Each team names namespaces, labels, and ArgoCD applications differently. Because there is no standard label set, Kubernetes, ArgoCD, and Backstage cannot cross-reference workloads without manual mapping. Incidents take longer to diagnose because there is no single source of truth for "which service owns this namespace".
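To make the cross-referencing gap concrete, here is a minimal sketch of the lookup that fails when labels are inconsistent. The label key `example.org/service` and the namespace names are hypothetical; the source does not define a standard label set, and this is an illustration rather than a proposed convention.

```python
from typing import Optional

# Hypothetical label key, assumed for illustration only.
OWNER_LABEL = "example.org/service"


def owning_service(namespace_labels: dict[str, str]) -> Optional[str]:
    """Return the service recorded on a namespace, or None if it is unlabeled."""
    return namespace_labels.get(OWNER_LABEL)


def unmapped_namespaces(namespaces: dict[str, dict[str, str]]) -> list[str]:
    """List namespaces that cannot be traced to a service without manual mapping."""
    return sorted(
        name for name, labels in namespaces.items()
        if owning_service(labels) is None
    )


# Made-up namespaces: only the first carries the shared label.
namespaces = {
    "payments-prod": {"example.org/service": "payments"},
    "team-a-checkout": {"app": "checkout"},
    "chk-staging": {},
}

print(unmapped_namespaces(namespaces))  # ['chk-staging', 'team-a-checkout']
```

Any tool that joins on a shared key hits the same problem: namespaces without the key can only be resolved by asking someone.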

No Developer Self-Service

Developers have no visibility into what they own, what depends on what, or the health of their services across environments. Every cross-environment question goes through Slack or Jira to the platform team.

Infrastructure Sprawl

Cloud resources (databases, clusters, queues) are provisioned ad hoc via Terraform scripts with no central inventory. Abandoned resources keep accruing cost. Production resources have been deleted by accident.

Onboarding Friction

New engineers spend 1–3 days getting access to the right clusters, repositories, and tools. Role and permission boundaries are undocumented and inconsistently applied.

Root Causes

No single convention for naming resources across systems
No self-service pathway — all provisioning requires platform team intervention
No catalog to discover what exists, who owns it, and its current state
Infrastructure state is stored in Terraform state files rather than in Kubernetes, so there is no continuous reconciliation or drift detection
RBAC is applied per-person instead of per-group, making offboarding error-prone
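
The drift detection named in the root causes above comes down to repeatedly comparing declared state with observed state. A minimal sketch of that comparison follows, assuming both sides can be expressed as plain dictionaries keyed by resource name. The function and resource names are hypothetical, and this is not the Terraform or Kubernetes API.

```python
def detect_drift(desired: dict[str, dict], actual: dict[str, dict]) -> dict[str, list[str]]:
    """Classify differences between declared resources and what actually exists."""
    shared = desired.keys() & actual.keys()
    return {
        # Declared but not present in the environment.
        "missing": sorted(desired.keys() - actual.keys()),
        # Present but declared nowhere (candidates for abandoned resources).
        "unmanaged": sorted(actual.keys() - desired.keys()),
        # Present on both sides but with a different configuration.
        "changed": sorted(name for name in shared if desired[name] != actual[name]),
    }


# Made-up inventory for illustration.
desired = {
    "orders-db": {"size": "small"},
    "events-queue": {"retention_days": 7},
}
actual = {
    "orders-db": {"size": "large"},
    "legacy-cluster": {"nodes": 3},
}

report = detect_drift(desired, actual)
print(report)
```

A reconciler would run a check like this on a schedule and act on the result. With state held only in files that are read when someone runs a plan, drift surfaces only at that moment.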
