
Building an Internal Developer Platform: From Ticket Hell to Self-Service in 30 Minutes

Platform Engineering Team

Every platform team reaches the same inflection point. The ticket queue grows faster than the team. Engineers wait days for a namespace, a database, or an ArgoCD application — things that should take minutes. The platform team becomes the bottleneck for every team trying to ship.

We hit that wall. This post is about how we got out of it.

The problem

Before the IDP, onboarding a new service looked like this:

  1. Open a JIRA ticket for namespace creation
  2. Wait 2–5 days for the platform team to review it
  3. Receive the namespace — with the wrong labels, or missing resource limits
  4. Open another ticket for the ArgoCD application
  5. Open another ticket for the database

In total, onboarding took eight to fifteen manual steps across multiple tools. Every step required a human from the platform team. Every error required another round-trip.

The root cause was not a lack of automation. We had Terraform, we had Helm, we had ArgoCD. The problem was that none of it was self-service. The automation only ran when the platform team ran it.

What we built

The IDP is four tools working together:

Pillar             Tool        Role
Developer portal   Backstage   Templates, catalog, observability
GitOps delivery    ArgoCD      Continuous delivery from Git to Kubernetes
Container runtime  Kubernetes  Workload scheduling and isolation
Cloud IaC          Crossplane  Cloud resources as Kubernetes objects

The key insight was that conventions are the product. A naming convention is not documentation — it is the contract that lets every tool talk to every other tool without a human in the middle.

The semantic key

Every resource in the platform — namespace, ArgoCD application, Backstage entity, Crossplane claim — is addressed by the same three-segment key:

{project}-{env}-{service}

payments-prod-api
│        │    └── Kubernetes namespace: payments-prod-api
│        │        Backstage Component: gateway-api
│        └─────── ArgoCD Application: gateway-api-prod
└──────────────── ArgoCD AppProject: payments
                  Backstage Domain: payments

When the naming is consistent, the Backstage Kubernetes plugin can find all pods for a service across every cluster with a single label selector. The ArgoCD plugin can surface sync status per environment. No manual mapping. No spreadsheets.
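
As an illustration, the key can be parsed and validated mechanically. The following is a minimal Python sketch, not the platform's actual code: the names parse_key and SemanticKey, the list of environments ("staging" in particular), and the rule that only the service segment may contain hyphens are all assumptions made for the example.

```python
import re
from typing import NamedTuple

# Assumed environment names: the post mentions dev and prod; "staging" is hypothetical.
ENVIRONMENTS = ("dev", "staging", "prod")

# Assumption: project and env are single segments; the service segment may contain hyphens.
KEY_PATTERN = re.compile(
    r"^(?P<project>[a-z][a-z0-9]*)-(?P<env>[a-z]+)-(?P<service>[a-z0-9]+(?:-[a-z0-9]+)*)$"
)


class SemanticKey(NamedTuple):
    project: str
    env: str
    service: str

    @property
    def namespace(self) -> str:
        # The Kubernetes namespace is the full key itself.
        return f"{self.project}-{self.env}-{self.service}"


def parse_key(key: str) -> SemanticKey:
    """Split a {project}-{env}-{service} key, rejecting anything malformed."""
    match = KEY_PATTERN.match(key)
    if match is None or match["env"] not in ENVIRONMENTS:
        raise ValueError(f"not a valid {{project}}-{{env}}-{{service}} key: {key!r}")
    return SemanticKey(match["project"], match["env"], match["service"])
```

Calling parse_key("payments-prod-api") yields project "payments", env "prod", and service "api". Any tool that needs the AppProject or the Backstage Domain can read the project field instead of keeping its own mapping.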

Backstage templates as golden paths

The self-service layer is a set of Backstage Scaffolder templates — one per platform operation:

  • create-domain — creates the ownership boundary, the GitOps repo, and the ArgoCD AppProject
  • create-system — creates a product grouping, its ApplicationSet, and TechDocs base
  • create-service — scaffolds an Application Repository (with Dockerfile, CI workflow, and TechDocs base), generates Kubernetes manifests per environment, registers the catalog entry, and opens the PRs
  • create-resource — creates Crossplane Claims for cloud infrastructure (databases, queues, buckets)
  • create-secret — provides self-service secure secrets encryption using Sealed Secrets
  • create-group / create-user — onboards teams and engineers with correct RBAC from day one

A product engineer runs create-service, selects their system, names the service, picks a resource profile, and clicks Create. Two PRs are opened automatically. When both are merged, ArgoCD detects the new ApplicationSet element and syncs the service to the dev cluster. The service is running in under 30 minutes with zero platform team involvement.

Convention validation in CI

Templates guarantee correct output at creation time. CI validation keeps repos correct over time.

Every domain GitOps repository includes a GitHub Actions workflow (validate-conventions.yaml) generated by create-domain. On every PR it runs validate-namespaces.sh and checks:

  • Namespace naming matches {project}-{env}-{service}
  • All 9 required labels are present
  • Every container has resource requests and limits
  • Kubernetes manifests pass schema validation (kubeconform)
  • ArgoCD dry-run diff against the dev cluster passes

Convention violations block the PR merge. There is no exception path.
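
To make the first three checks concrete, here is a rough Python sketch of the logic. It is not the contents of validate-namespaces.sh, which is a shell script the post does not show. The label keys are hypothetical placeholders (the post says nine labels are required but does not list them), and the environment names in the pattern are assumed.

```python
import re

# Assumed environments; only dev and prod are named in the post.
NAMESPACE_PATTERN = re.compile(
    r"^[a-z][a-z0-9]*-(dev|staging|prod)-[a-z0-9]+(-[a-z0-9]+)*$"
)

# Hypothetical label keys standing in for the nine required labels.
REQUIRED_LABELS = ("example.org/project", "example.org/env", "example.org/service")


def check_namespace(manifest: dict) -> list[str]:
    """Return naming and label violations for a Namespace manifest."""
    errors = []
    metadata = manifest.get("metadata", {})
    name = metadata.get("name", "")
    if not NAMESPACE_PATTERN.match(name):
        errors.append(f"namespace {name!r} does not match {{project}}-{{env}}-{{service}}")
    labels = metadata.get("labels") or {}
    for label in REQUIRED_LABELS:
        if label not in labels:
            errors.append(f"namespace {name!r} is missing label {label!r}")
    return errors


def check_containers(workload: dict) -> list[str]:
    """Return violations for containers without resource requests or limits."""
    errors = []
    pod_spec = workload.get("spec", {}).get("template", {}).get("spec", {})
    for container in pod_spec.get("containers", []):
        resources = container.get("resources") or {}
        for field in ("requests", "limits"):
            if not resources.get(field):
                errors.append(
                    f"container {container.get('name')!r} has no resources.{field}"
                )
    return errors
```

A CI job built this way would load each manifest, collect the returned violations, and exit non-zero if the list is non-empty, which is what blocks the merge.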

Cloud infrastructure as code — without Terraform expertise

Cloud resources (databases, queues, storage) are declared as Crossplane Claims committed to Git. ArgoCD syncs them to the cluster. Crossplane provisions the actual resource on GCP, AWS, Azure, or IBM.

A developer requesting a Cloud SQL instance does not write Terraform. They run create-resource, select GCP → Cloud SQL, fill in a form, and merge the PR. The Claim lands in crossplane/claims/prod/cloudsql-main.yaml. Crossplane reconciles continuously, so drift is corrected automatically. Setting deletionPolicy: Orphan on production Claims means an accidental kubectl delete cannot destroy a database.

The Backstage Resource page for that database shows READY: True and SYNCED: True once provisioning completes. No digging through the cloud console.
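
The shape of what a template like create-resource might render can be sketched in Python. Everything specific here is an assumption for illustration: the function name build_claim, the apiVersion and kind (which in practice come from the platform's own composite resource definitions), the parameter names, and the choice to expose deletionPolicy under spec.parameters. Only the repository path layout and the Orphan policy for production come from the description above.

```python
def build_claim(name: str, env: str, parameters: dict) -> tuple[str, dict]:
    """Return (path in the GitOps repo, manifest) for a Crossplane Claim."""
    spec_parameters = dict(parameters)  # copy so the caller's dict is not mutated
    if env == "prod":
        # Assumption: the platform's definition exposes deletionPolicy as a parameter.
        spec_parameters["deletionPolicy"] = "Orphan"
    manifest = {
        "apiVersion": "platform.example.org/v1alpha1",  # hypothetical group/version
        "kind": "CloudSQLInstance",  # hypothetical claim kind
        "metadata": {"name": name},
        "spec": {"parameters": spec_parameters},
    }
    return f"crossplane/claims/{env}/{name}.yaml", manifest
```

For example, build_claim("cloudsql-main", "prod", {"region": "europe-west1"}) returns the path crossplane/claims/prod/cloudsql-main.yaml together with a manifest whose parameters include deletionPolicy: Orphan. The same call with env "dev" leaves the policy unset.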

RBAC without the maintenance burden

Access is granted via identity provider groups, never individual users. Removing someone from the GitHub or Okta group immediately revokes all Kubernetes access — no manual RoleBinding cleanup.

The developer role has no RoleBinding created for production namespaces. Not by convention. Not by documentation. By the fact that the binding does not exist. Production access requires explicit role elevation.
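
One way to picture this rule is a generator that never emits a developer binding for production. The Python sketch below is illustrative only: the function name, the binding name format, and the ClusterRole called "developer" are assumptions, while the RoleBinding fields follow the standard rbac.authorization.k8s.io/v1 schema.

```python
from typing import Optional


def developer_role_binding(namespace: str, env: str, group: str) -> Optional[dict]:
    """Build a RoleBinding for an identity-provider group, or None for production."""
    if env == "prod":
        # No binding is generated at all, so there is nothing to misconfigure.
        return None
    return {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "RoleBinding",
        "metadata": {"name": f"{group}-developer", "namespace": namespace},
        # The subject is always a group, never an individual user.
        "subjects": [
            {"kind": "Group", "name": group, "apiGroup": "rbac.authorization.k8s.io"}
        ],
        "roleRef": {
            "kind": "ClusterRole",
            "name": "developer",  # hypothetical ClusterRole name
            "apiGroup": "rbac.authorization.k8s.io",
        },
    }
```

Because the subject is a group, membership changes in the identity provider take effect without touching any manifest in Git.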

Secure secrets management

Access to production environments is strictly controlled, and the platform enforces a "no plain-text secrets" rule. Instead of opening tickets to manage secrets, developers use the create-secret Backstage template: they encrypt their secrets locally with kubeseal and submit the encrypted values through the template. The platform then opens a PR that adds the SealedSecret manifest to their domain's GitOps repository. The sealed-secrets controller running in the cluster handles decryption, so secrets management stays entirely self-service.

Results so far

We are in Phase 1 of the rollout. The first domain team is fully self-sufficient:

  • Service creation: < 30 minutes (was 2–5 days)
  • Platform team tickets: trending down
  • Namespaces with all required labels: 100% (for new services)
  • Engineer onboarding time: < 2 hours (was 1–3 days)

Phase 2 will extend self-service to all product teams and bring Backstage full-stack observability to every service. Phase 3 migrates existing services and cloud resources into the catalog.

What we learned

Conventions first. No tool integration works without consistent naming. We spent time on the convention before building templates, and every hour there saved ten hours of debugging tool integrations.

Templates as the enforcement mechanism. Documentation gets ignored. Templates make the correct path the only path. If you can run create-service and get a working service, you have no reason to do it manually.

GitOps scales. ApplicationSet matrix generators mean adding a new service to a new cluster is one line in a YAML file. ArgoCD handles the rest. We are not creating Applications by hand.

Crossplane for cloud IaC belongs in Kubernetes. Keeping cloud resources in the same reconciliation loop as application workloads means one place to look for drift, one place to look for status, one RBAC model for access.


The full requirements, architecture, and roadmap are documented in the Platform PRD. The naming convention and operational standards are in the Platform Convention.