Requirements

Functional Requirements

FR1 — Naming Convention

  • Every Kubernetes namespace follows the pattern {project}-{env}-{service}
  • Every namespace carries all 9 required labels: project, env, service, team, backstage.io/domain, backstage.io/system, backstage.io/component, argocd/app, argocd/app-set
  • Backstage Component names are env-agnostic and follow {system}-{service}; Resource names follow {provider}-{project}-{env}-{type}-{name}, which includes the environment
  • ArgoCD Application names follow {system}-{service}-{env}
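
The naming rules above can be checked mechanically. The following is a minimal sketch, not the platform's actual validator: it assumes each name segment is lowercase alphanumeric with no inner hyphens (the requirements do not state the allowed character set), and the example names `shop`, `cart`, and `checkout` are hypothetical.

```python
import re

# The nine labels listed in FR1.
REQUIRED_LABELS = (
    "project", "env", "service", "team",
    "backstage.io/domain", "backstage.io/system", "backstage.io/component",
    "argocd/app", "argocd/app-set",
)

# Assumption: segments contain no hyphens, so the name splits
# unambiguously into exactly three parts.
NAMESPACE_RE = re.compile(r"^[a-z0-9]+-[a-z0-9]+-[a-z0-9]+$")


def check_namespace(name, labels):
    """Return a list of convention violations (empty if the namespace is valid)."""
    problems = []
    if NAMESPACE_RE.match(name) is None:
        problems.append(
            f"namespace '{name}' does not match {{project}}-{{env}}-{{service}}"
        )
    missing = [key for key in REQUIRED_LABELS if key not in labels]
    if missing:
        problems.append(f"namespace '{name}' is missing labels: {', '.join(missing)}")
    return problems


def argocd_app_name(system, service, env):
    """Build an ArgoCD Application name as {system}-{service}-{env}."""
    return f"{system}-{service}-{env}"


if __name__ == "__main__":
    labels = {key: "example" for key in REQUIRED_LABELS}
    print(check_namespace("shop-dev-cart", labels))
    print(check_namespace("shop-dev", {}))
    print(argocd_app_name("checkout", "cart", "prod"))
```

Returning a list of messages rather than raising on the first failure lets a caller report every violation in one pass.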

FR2 — GitOps Repositories

  • One platform-gitops repository owned by the platform team containing AppProjects, ApplicationSets, Crossplane XRDs/Compositions, RBAC, and Backstage templates
  • One {domain}-gitops repository per domain containing k8s manifests, Crossplane Claims, and Backstage catalog entities
  • One {app}-repo application repository per service containing the source code, Dockerfile, TechDocs, and CI pipelines
  • Domain repos are created automatically by the create-domain Backstage template, and application repos by create-service

FR3 — ArgoCD Delivery

  • Every domain has an AppProject that restricts source repos and destination namespaces
  • Services are deployed via ApplicationSet matrix generators (services × clusters)
  • Prod Applications require manual sync — enforced via syncWindows and templatePatch
  • Sync windows deny automated prod deploys on weekday evenings and weekends
  • ArgoCD ignoreDifferences on /spec/replicas prevents overwriting HPA decisions
  • Crossplane Claims are deployed via git-directory-generator ApplicationSets
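
To illustrate what a services × clusters matrix produces, the sketch below expands two input lists into one Application description per pair. It only models the expansion and is not ArgoCD's implementation. The dictionary keys, the cluster names, and the use of the literal value `prod` to mark production are assumptions made for the example.

```python
from itertools import product


def expand_matrix(services, clusters):
    """Return one Application description per (service, cluster) pair."""
    apps = []
    for svc, cluster in product(services, clusters):
        env = cluster["env"]
        apps.append({
            # Application name follows {system}-{service}-{env} from FR1.
            "name": f"{svc['system']}-{svc['service']}-{env}",
            "cluster": cluster["name"],
            # Prod Applications require manual sync, so automation is off there.
            "automated_sync": env != "prod",
        })
    return apps


if __name__ == "__main__":
    services = [
        {"system": "checkout", "service": "cart"},
        {"system": "checkout", "service": "pay"},
    ]
    clusters = [
        {"name": "dev-1", "env": "dev"},
        {"name": "prod-1", "env": "prod"},
    ]
    for app in expand_matrix(services, clusters):
        print(app)
```

With S services and C clusters the expansion yields S × C Applications, which is why adding a service or a cluster requires no manual Application creation.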

FR4 — Backstage Catalog

  • Domain, System, Component, Resource, Group, and User entities for every platform asset
  • backstage.io/kubernetes-label-selector on every Component surfaces health across all clusters with a single annotation
  • argocd/app-selector on every Component surfaces sync status per environment
  • backstage.io/kubernetes-label-selector on every Resource surfaces Crossplane Claim READY/SYNCED status
  • Catalog registered via catalog-info.yaml Location entities per domain repo

FR5 — Cloud Resource Provisioning (Crossplane)

  • Cloud resources declared as Crossplane Claims in {domain}-gitops/crossplane/claims/{env}/
  • Infra namespaces {domain}-{env}-infra isolated from application namespaces
  • deletionPolicy: Orphan default on all production Claims
  • Connection secrets written to the infra namespace after provisioning
  • Supported providers: GCP, AWS, Azure, IBM
  • Supported resource types: clusters (GKE/EKS/AKS/IKS), databases (Cloud SQL/RDS/PostgreSQL/Cosmos), queues (Pub/Sub/SQS/Service Bus/Event Streams), storage (GCS/S3/Blob/COS), cache (Memorystore/ElastiCache/Redis), registries, secret stores
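
The path and namespace conventions above reduce to simple string templates, sketched below together with the production default for deletionPolicy. This is illustrative only. It assumes the policy is exposed at `spec.deletionPolicy`, whereas the real field path depends on the XRD schema. The requirements also do not state a default for non-production Claims, so the sketch returns `None` for them.

```python
def claim_dir(domain, env):
    """Directory holding a domain's Claims for one environment."""
    return f"{domain}-gitops/crossplane/claims/{env}/"


def infra_namespace(domain, env):
    """Infra namespace that receives connection secrets after provisioning."""
    return f"{domain}-{env}-infra"


def effective_deletion_policy(claim, env):
    """Return the explicit policy if set, else "Orphan" for prod Claims.

    Assumption: the policy lives at spec.deletionPolicy. Non-prod defaults
    are not specified in the requirements, so None is returned for them.
    """
    explicit = claim.get("spec", {}).get("deletionPolicy")
    if explicit is not None:
        return explicit
    return "Orphan" if env == "prod" else None


if __name__ == "__main__":
    # "payments" is a hypothetical domain name.
    print(claim_dir("payments", "prod"))
    print(infra_namespace("payments", "prod"))
    print(effective_deletion_policy({"spec": {}}, "prod"))
```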

FR6 — RBAC and Access Control

  • Kubernetes RoleBindings use Group subjects only (never individual User subjects for routine access)
  • Four platform roles: viewer, developer, lead, platform-engineer
  • developer role: no RoleBinding created for prod namespaces
  • platform-engineer role: ClusterRoleBinding for platform-* namespaces
  • ArgoCD project roles: readonly, developer (dev/staging sync only), domain-admin
  • RBAC applied exclusively via Backstage templates (create-group, create-user)
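
Two of these rules are easy to express as checks: RoleBindings may only reference Group subjects, and the developer role gets no binding in prod. The sketch below operates on a RoleBinding loaded as a plain dictionary. The helper names and the literal environment value `prod` are assumptions for illustration, not part of the platform.

```python
def non_group_subjects(role_binding):
    """Return every subject whose kind is not "Group"."""
    return [
        subject
        for subject in role_binding.get("subjects", [])
        if subject.get("kind") != "Group"
    ]


def developer_binding_envs(envs):
    """Environments where the developer role gets a RoleBinding (never prod)."""
    return [env for env in envs if env != "prod"]


if __name__ == "__main__":
    binding = {
        "kind": "RoleBinding",
        "subjects": [
            {"kind": "Group", "name": "team-a"},   # allowed
            {"kind": "User", "name": "alice"},     # violates the Group-only rule
        ],
    }
    print(non_group_subjects(binding))
    print(developer_binding_envs(["dev", "staging", "prod"]))
```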

FR7 — Backstage Scaffolder Templates

| Template | Triggers | Output |
|---|---|---|
| create-domain | Onboarding new product area | Domain entity, {domain}-gitops repo, AppProject |
| create-system | Adding a new product grouping | System entity, ApplicationSet, TechDocs base |
| create-service | Adding a new workload | New {app}-repo (CI/CD, TechDocs), k8s manifests per env, Component entity, AppSet element |
| create-resource | Requesting cloud infrastructure | Crossplane Claims per env, Resource entities |
| create-secret | Encrypting sensitive data | SealedSecret manifests securely pushed to domain-gitops |
| create-group | Onboarding a new team | Group entity, RBAC bindings, ArgoCD roles |
| create-user | Onboarding a new engineer | User entity, RBAC bindings, ArgoCD user |

All templates use EntityPicker to enforce referential integrity — a child entity cannot be created without its parent already existing in the catalog.

FR8 — Convention Validation CI

  • Every domain repo contains .github/workflows/validate-conventions.yaml generated by create-domain
  • Checks: kubeconform schema validation, namespace naming pattern via validate-namespaces.sh, all 9 required labels, resource requests/limits on all containers, ArgoCD dry-run diff against dev cluster
  • Convention violations fail the PR — merge is blocked
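
As an illustration of the requests/limits check, the sketch below inspects a workload manifest loaded as a dictionary and lists containers that lack either field. It is not the content of validate-conventions.yaml or validate-namespaces.sh. It assumes a Deployment-style layout (`spec.template.spec`) and treats init containers as in scope, which is one reading of "all containers".

```python
def containers_missing_resources(manifest):
    """Return names of containers lacking resource requests or limits."""
    pod_spec = manifest.get("spec", {}).get("template", {}).get("spec", {})
    # Assumption: init containers count as containers for this rule.
    containers = pod_spec.get("containers", []) + pod_spec.get("initContainers", [])
    missing = []
    for container in containers:
        resources = container.get("resources", {})
        if not resources.get("requests") or not resources.get("limits"):
            missing.append(container.get("name", "<unnamed>"))
    return missing


if __name__ == "__main__":
    deployment = {
        "kind": "Deployment",
        "spec": {"template": {"spec": {"containers": [
            {"name": "api", "resources": {
                "requests": {"cpu": "100m"}, "limits": {"cpu": "500m"}}},
            {"name": "sidecar", "resources": {"requests": {"cpu": "50m"}}},
        ]}}},
    }
    offenders = containers_missing_resources(deployment)
    # A CI step could exit non-zero here to block the merge.
    print(offenders)
```

Naming the offending containers in the output is one way to keep the failure message actionable for the PR author.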

Non-Functional Requirements

NFR1 — Reliability

  • ArgoCD: 99.9% uptime on management cluster
  • Backstage: 99.5% uptime; degraded functionality acceptable during outages
  • Crossplane: continuous reconciliation loop — drift corrected within 5 minutes of detection
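
The uptime targets translate into concrete downtime budgets. The arithmetic below assumes a 30-day measurement window, which the requirements do not specify. Under that assumption 99.9% allows roughly 43 minutes of downtime per window and 99.5% allows roughly 216 minutes (3.6 hours).

```python
def downtime_budget_minutes(availability_pct, period_days=30):
    """Allowed downtime in minutes for a given availability target.

    Assumption: the target is measured over a rolling window of period_days.
    """
    total_minutes = period_days * 24 * 60
    return (1 - availability_pct / 100) * total_minutes


if __name__ == "__main__":
    for name, target in (("ArgoCD", 99.9), ("Backstage", 99.5)):
        budget = downtime_budget_minutes(target)
        print(f"{name}: {target}% allows about {budget:.1f} minutes per 30 days")
```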

NFR2 — Performance

  • Backstage catalog page load: < 2 seconds (p95)
  • ArgoCD sync latency: < 60 seconds from push to namespace update
  • create-service template run time: < 3 minutes end-to-end

NFR3 — Security

  • No secrets in plain text in Git — self-service Sealed Secrets via the create-secret template
  • Network isolation enforced by NetworkPolicy on every namespace (deny all by default)
  • Pod security defaults: runAsNonRoot, readOnlyRootFilesystem, allowPrivilegeEscalation: false
  • Prod sync windows: automated syncs denied outside business hours
  • OIDC authentication for Backstage, ArgoCD, and Headlamp
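
The sync-window rule can be modelled as a predicate over a timestamp. This sketch is not ArgoCD's syncWindows evaluation. The business hours of 09:00 to 17:00, Monday to Friday, are a hypothetical choice, since the requirements only say "outside business hours", and time zone handling is left out.

```python
from datetime import datetime, time

# Hypothetical business hours; the requirements do not define them.
BUSINESS_START = time(9, 0)
BUSINESS_END = time(17, 0)


def automated_prod_sync_allowed(moment):
    """Return True if an automated prod sync falls inside business hours."""
    if moment.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
        return False
    return BUSINESS_START <= moment.time() < BUSINESS_END


if __name__ == "__main__":
    print(automated_prod_sync_allowed(datetime(2024, 1, 3, 10, 0)))  # Wednesday morning
    print(automated_prod_sync_allowed(datetime(2024, 1, 3, 20, 0)))  # Wednesday evening
    print(automated_prod_sync_allowed(datetime(2024, 1, 6, 10, 0)))  # Saturday
```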

NFR4 — Scalability

  • Platform must support 20+ domains, 100+ systems, 500+ services without architectural changes
  • ApplicationSet matrix generator scales linearly — no manual Application creation needed
  • Crossplane git-directory generator auto-discovers new Claims without ApplicationSet changes

NFR5 — Observability

  • All platform services instrumented with Prometheus metrics
  • Logs shipped to central Loki via Alloy on every cluster
  • Backstage k8s plugin surfaces pod health and recent events per component per env
  • Crossplane Claim status (READY/SYNCED) surfaced on Backstage Resource pages

NFR6 — Developer Experience

  • P1 (Product Engineer) can create a new service with zero platform team interaction
  • All required knowledge is encoded in templates — no external documentation required during normal workflow
  • Convention validation CI provides actionable error messages