What Degraded means
ArgoCD reports two independent things: sync status (does live match Git?) and
health status (are the resources healthy?). Degraded is the health side — the
manifests applied, but a resource isn't healthy. Re-syncing rarely helps; you fix the
workload.
ArgoCD assigns each resource one of six health statuses — Healthy, Progressing,
Degraded, Suspended, Missing, Unknown — and the application health is the
worst health of its immediate child resources, using the priority order
Healthy > Suspended > Progressing > Missing > Degraded > Unknown
(ArgoCD: Resource Health).
So a single Degraded child drags the whole app to Degraded — your first job is to find
which resource, then why.
Diagnose it
argocd app get <app-name>
# HEALTH column per resource — find the one reporting Degraded.
argocd app resources <app-name> --output tree=detailed
# tree view shows child resources (pods under a Deployment, etc.)
kubectl get pods -n <namespace> -l app=<app>
kubectl describe deploy <deploy> -n <namespace> # read the Conditions block
If the app is Degraded but no resource looks Degraded, force a hard refresh — the controller can serve a stale cached status:
argocd app get <app-name> --hard-refresh
Causes, each end to end
1. Deployment exceeded its progress deadline (most common)
ArgoCD's built-in Deployment health check passes only when the observed generation
matches desired and updated replicas reach desired replicas
(Resource Health).
While replicas come up the resource is Progressing. If the Deployment can't make
progress within progressDeadlineSeconds (default 600s / 10 min), Kubernetes sets
a Progressing condition with status: False, reason: ProgressDeadlineExceeded, and
ArgoCD maps that to Degraded
(Kubernetes: Failed Deployment).
Diagnose — read the Deployment conditions:
kubectl get deploy <deploy> -n <namespace> \
-o jsonpath='{range .status.conditions[*]}{.type}={.status} {.reason}{"\n"}{end}'
# Progressing=False ProgressDeadlineExceeded -> this mechanism
ProgressDeadlineExceeded only tells you progress stalled — it doesn't say why. The
underlying pod problem is one of the cases below.
2. Pods crash-looping or OOMKilled
The pods start but die, so updated replicas never become ready.
Diagnose:
kubectl get pods -n <namespace> -l app=<app> # CrashLoopBackOff / Error
kubectl describe pod <pod> -n <namespace> # Last State: OOMKilled? exit code?
kubectl logs <pod> -n <namespace> --previous # crash output before restart
Fix: if OOMKilled, raise the container memory limit or fix the leak; if a startup
crash, fix the bad config/image/env that the logs name.
3. Image pull failure (ImagePullBackOff / ErrImagePull)
Pods can't pull the image — wrong tag, private registry without imagePullSecrets, or
registry auth expiry. Pods never start, so the Deployment hits its progress deadline
and goes Degraded.
Diagnose:
kubectl describe pod <pod> -n <namespace> # Events: Failed to pull image ... <reason>
Fix: correct the image tag/digest, attach a valid imagePullSecret, or refresh
registry credentials.
4. Readiness probe never passes
Pods run but the readiness probe keeps failing, so they never count as ready and the rollout stalls past the deadline.
Diagnose:
kubectl describe pod <pod> -n <namespace> # Events: Readiness probe failed: ...
Fix: correct the probe path/port, raise initialDelaySeconds/failureThreshold,
or fix the app endpoint the probe targets.
5. Pods can't be scheduled (quota, resources, PDB)
Pods stay Pending because the cluster can't place them: insufficient CPU/memory,
exhausted ResourceQuota/LimitRange, no node matching node-selectors/affinity, or a
restrictive PodDisruptionBudget blocking the rollout
(Kubernetes: Failed Deployment).
Diagnose:
kubectl describe pod <pod> -n <namespace> # Events: FailedScheduling / exceeded quota
kubectl get resourcequota -n <namespace>
Fix: lower requests, raise the quota, fix the affinity/selector, or relax the PDB so old pods can be evicted during the roll.
6. Argo Rollouts: aborted or Degraded (paused is not Degraded)
ArgoCD ships a bundled health check for argoproj.io/Rollout. It maps the Rollout's
status.phase: Degraded to Degraded, and status.abort: true (a failed analysis
run aborting the canary) to Degraded with message "Rollout was aborted". A
paused canary maps to Suspended, not Degraded — don't confuse a deliberate pause
with a failure
(ArgoCD Rollout health.lua,
Argo Rollouts FAQ).
Diagnose:
kubectl argo rollouts get rollout <name> -n <namespace>
# look for Degraded phase, or Status: ✖ Degraded / aborted analysis run
Fix: inspect the failing AnalysisRun, fix the new version, then promote or
deliberately abort/roll back — don't just re-sync.
7. Built-in checks for Service / Ingress / PVC
ArgoCD also marks these Degraded on their own conditions: a LoadBalancer Service or
Ingress whose status.loadBalancer.ingress is empty (no external IP/hostname
provisioned), or a PersistentVolumeClaim whose phase is not Bound
(Resource Health).
Diagnose & fix: kubectl describe svc|ingress|pvc <name> -n <namespace> — usually a
missing cloud LB controller / ingress controller, or no provisioner / storage class for
the PVC.
8. Failed Sync/PostSync hook
A Sync or PostSync hook resource (e.g. a migration Job) that fails reports Degraded,
and the app inherits it.
Diagnose:
argocd app get <app-name> # the hook resource shows Degraded
kubectl logs job/<hook-job> -n <namespace>
Fix: correct the hook job, then re-run the sync.
9. Custom CRD health check returns Degraded
For a CRD with a custom Lua health check (defined under
resource.customizations.health.<group>_<kind> in argocd-cm), the script reads the
resource's status and can set Degraded — for example when a Ready condition is
False
(Custom Health Checks).
Diagnose: read the CRD's own status.conditions (kubectl get <crd> <name> -o yaml)
and the operator that owns it. The Lua script just mirrors what that controller reports.
10. Lua health check erroring out → Unknown (not Degraded)
Worth knowing because it looks similar but isn't: if a custom Lua script throws (e.g.
it dereferences a nil status field), the resource goes Unknown, not Degraded —
and Unknown is the lowest priority, so it dominates app health too.
Diagnose:
kubectl logs -n argocd deployment/argocd-application-controller | grep -i lua
Fix: harden the script for nil values; this is a config bug in the health check, not a workload failure (Resource Health).
How Intellira diagnoses this
Intellira reads the per-resource health from the ArgoCD MCP server, follows the
Degraded resource into its pods via the Kubernetes connector, and reads the actual
Deployment conditions and pod events — so it can distinguish a ProgressDeadlineExceeded
caused by OOMKilled from an ImagePullBackOff or an aborted Rollout, and tie it to the
change that introduced it, with evidence.
Sources
- ArgoCD — Resource Health (statuses, built-in checks, app roll-up, custom Lua)
- ArgoCD — Custom Health Checks
- ArgoCD — bundled Rollout health.lua (aborted/Degraded/paused mapping)
- Kubernetes — Failed Deployment & progressDeadlineSeconds
- Argo Rollouts — FAQ (ArgoCD health integration)
By Intellira Engineering. AI-assisted draft, reviewed by the Intellira engineering team; claims cited inline; last verified 2026-06-02.