Risks and Mitigations

| Risk | Probability | Impact | Mitigation |
| --- | --- | --- | --- |
| Teams skip templates and create resources manually | High | Medium | CI validation catches label violations; AppProject restricts allowed source repos |
| Crossplane provider upgrade breaks existing Claims | Medium | High | Pin provider versions; test upgrades in dev before prod; `deletionPolicy: Orphan` prevents data loss |
| ArgoCD management cluster outage | Low | Critical | HA ArgoCD deployment; existing clusters keep running; all state is in Git |
| Sealed Secrets key loss | Low | Critical | Automate key backup after cluster creation; document and test recovery procedure |
| Backstage catalog out of sync with reality | Medium | Medium | Periodic reconciliation scans; convention CI on every PR |
| Domain teams blocked waiting for platform PR reviews | Medium | High | SLA: platform PRs reviewed within 4 business hours; escalation path documented |
| Crossplane resource provisioning fails silently | Low | High | READY/SYNCED surfaced in Backstage; Prometheus alerts on Crossplane controller errors |
| Convention changes require mass updates | Low | High | Convention is in templates and validated in CI; changes rolled out iteratively per domain |
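
The "CI validation catches label violations" mitigation could be sketched roughly as follows. This is a minimal illustration, not the platform's actual check: the label keys in `REQUIRED_LABELS` and the function name `find_label_violations` are hypothetical, and the manifests are assumed to be already parsed into dictionaries (for example by a YAML loader earlier in the pipeline).

```python
from typing import Dict, Iterable, List

# Hypothetical convention: label keys every resource must carry.
# The real keys would come from the platform's own convention document.
REQUIRED_LABELS = ("platform.example.com/domain", "platform.example.com/owner")


def find_label_violations(manifests: Iterable[Dict]) -> List[str]:
    """Return one message per manifest that lacks a required label."""
    violations = []
    for manifest in manifests:
        metadata = manifest.get("metadata") or {}
        labels = metadata.get("labels") or {}
        # A label counts as missing if the key is absent or its value is empty.
        missing = [key for key in REQUIRED_LABELS if not labels.get(key)]
        if missing:
            kind = manifest.get("kind", "<unknown kind>")
            name = metadata.get("name", "<unnamed>")
            violations.append(f"{kind}/{name}: missing labels {', '.join(missing)}")
    return violations
```

A CI job could call this on every manifest changed in a PR and fail the build when the returned list is non-empty, so that manually created resources that bypass the templates are caught before merge.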