Diagnosing ArgoCD: sync and health failures

When an ArgoCD application is "not working," the fastest path to a fix is to stop guessing and read two status fields in the right order. ArgoCD tracks every application on two independent axes: a sync status (Synced / OutOfSync) that answers "does the cluster match Git?", and a health status (Healthy / Progressing / Degraded / Suspended / Missing / Unknown) that answers "are the resulting resources actually working?" These are orthogonal — an app can be Synced but Degraded, or OutOfSync but Healthy. Decide which axis is wrong first, and you have already narrowed three different problems down to one. This guide is the triage layer; each specific failure links down to its dedicated fix page.

Why a systematic approach beats guessing

Most ArgoCD debugging time is wasted because the operator conflates the two axes — they see "the app is red" and start editing manifests when the real problem is a crash-looping pod, or they restart pods when the real problem is drift in Git. The two axes are computed by different machinery and have different fixes:

Sync status is the output of a diff: ArgoCD compares the rendered manifests from your Git source against the live objects in the cluster. Any non-ignored field that differs marks the app OutOfSync (ArgoCD diffing docs).
Health status is the output of health checks run against each live resource — for example, a Deployment is Healthy only when its observed generation matches the desired generation and updated replicas equal desired replicas (ArgoCD health docs).
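The Deployment rule in the second point can be checked by hand. This is a minimal sketch, assuming kubectl access to the workload's namespace; the function names are hypothetical, and ArgoCD's built-in check looks at more conditions than these two comparisons.

```shell
# rollout_complete GENERATION OBSERVED_GENERATION DESIRED_REPLICAS UPDATED_REPLICAS
# Succeeds only when both equalities from the rule above hold.
rollout_complete() {
  [ "$1" = "$2" ] && [ "$3" = "$4" ]
}

# deployment_rollout_complete NAMESPACE NAME
# Reads the four fields from a live Deployment and applies the rule.
deployment_rollout_complete() {
  drc_path='{.metadata.generation} {.status.observedGeneration} {.spec.replicas} {.status.updatedReplicas}'
  drc_fields="$(kubectl get deployment "$2" -n "$1" -o jsonpath="$drc_path")" || return 2
  # Intentional word splitting: the space-separated values become $1..$4.
  set -- $drc_fields
  # A missing value falls back to a placeholder that can never match,
  # so a Deployment without a full status is reported as not complete.
  rollout_complete "${1:-}" "${2:-x}" "${3:-}" "${4:-y}"
}
```

For example, deployment_rollout_complete demo web (hypothetical namespace and name) succeeds only once the rollout has caught up with the spec.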

Because they are computed independently, the first diagnostic question is never "what's broken?" — it is "which axis is wrong?" The triage framework below answers that in two commands.

The triage framework

Step 1 — read both axes with one command

argocd app get <app-name>

argocd app get retrieves the full application detail, including both the sync status and the health status, plus a per-resource breakdown (argocd app get reference). If you suspect ArgoCD is showing a stale view, force it to recompute against the live cluster or, if the rendered manifests may be stale too, to re-render the source as well:

# Re-evaluate live state against the cached target manifests
argocd app get <app-name> --refresh

# Also bust the rendered-manifest cache (Helm/Kustomize re-render)
argocd app get <app-name> --hard-refresh

--refresh updates the application data without touching the cached target manifests; --hard-refresh refreshes both the application data and the target manifest cache (argocd app get reference). Reach for --hard-refresh whenever a Helm chart or Kustomize overlay changed but ArgoCD still shows the old diff.
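For scripting, the same two fields can be read straight from the Application resource. This is a minimal sketch, assuming kubectl access and that ArgoCD is installed in a namespace called argocd (an assumption; adjust to your install). The function name app_axes and the ARGOCD_NS override are hypothetical.

```shell
# app_axes APP
# Prints "<sync status> <health status>" for one application.
# ARGOCD_NS is a hypothetical override for the namespace holding Application resources.
app_axes() {
  kubectl get applications.argoproj.io "$1" \
    -n "${ARGOCD_NS:-argocd}" \
    -o jsonpath='{.status.sync.status} {.status.health.status}'
}
```

Calling app_axes with an application name prints a single line such as "Synced Degraded", which is easier to feed into a script than the full argocd app get output.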

Step 2 — classify by reading the two fields in order

Read sync status first, then health status. This ordering matters because an OutOfSync app may not have had the failing change applied yet, so its health is describing the old version of the workload.

Sync status	Health status	What it means	Where to start
`OutOfSync`	any	Cluster does not match Git, and ArgoCD has not (or could not) reconcile it	Is auto-sync on? If yes, a sync is failing — see Step 3
`Synced`	`Degraded`	Git was applied successfully, but the live resources are not working	Runtime problem, not a GitOps problem; see "Degraded health after a successful sync" below
`Synced`	`Progressing`	Applied and still rolling out; may settle on its own	Wait, then re-check; if it never settles, treat as `Degraded`
`Synced`	`Healthy`	Matches Git and working	Nothing to fix in ArgoCD

The single most useful instinct: Synced + Degraded means stop looking at Git. ArgoCD did its job — it applied the manifests — and the resulting Deployment, Service, or PVC is failing its own health check (ArgoCD health docs). Conversely, OutOfSync means the diff is non-empty, so the next command is a diff.
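The triage table can be written down as a small lookup. This is a sketch only: the function name classify_app and the wording of each hint are illustrative, and any combination the table does not cover falls through to a manual read.

```shell
# classify_app SYNC_STATUS HEALTH_STATUS
# Prints the starting point suggested by the triage table.
classify_app() {
  case "$1/$2" in
    OutOfSync/*)        echo "diff: run argocd app diff, then check for a failed sync operation" ;;
    Synced/Degraded)    echo "runtime: inspect the failing resource, not Git" ;;
    Synced/Progressing) echo "wait: re-check, treat as Degraded if it never settles" ;;
    Synced/Healthy)     echo "ok: nothing to fix in ArgoCD" ;;
    *)                  echo "unclassified: read the argocd app get output by hand" ;;
  esac
}
```

Sync status is matched first, mirroring the reading order above: an OutOfSync app is routed to the diff regardless of what its health says.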

Step 3 — see exactly what differs

argocd app diff <app-name>

argocd app diff renders the difference between the target (Git) and live state. With the default diff renderer, lines prefixed < come from live state and lines prefixed > from desired state; a unified-diff tool selected through the KUBECTL_EXTERNAL_DIFF environment variable shows them as - and + instead. The command returns exit code 1 when a diff is found, 0 when there is none, and 2 on error, which makes it safe to wire into CI gates (argocd app diff reference). Note that Kubernetes Secrets are excluded from this diff, so a Secret will never be the line you see.
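The exit codes are what make the command usable as a gate. This is a minimal sketch, assuming an authenticated argocd CLI on the runner; the function name diff_gate and the messages are hypothetical.

```shell
# diff_gate APP
# Returns 0 when live state matches Git, 1 on drift, 2 when the diff itself failed.
diff_gate() {
  # Capture the exit code without tripping "set -e" in a CI script.
  argocd app diff "$1" && dg_rc=0 || dg_rc=$?
  case "$dg_rc" in
    0) echo "in sync: no diff" ;;
    1) echo "drift: diff found" ;;
    *) echo "error: argocd app diff failed" >&2
       dg_rc=2 ;;
  esac
  return "$dg_rc"
}
```

Keeping drift (1) and failure (2) distinct matters in a pipeline: drift is a finding to review, while an error usually means the runner could not reach ArgoCD at all.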

How to read the per-resource health roll-up

Application-level health is the worst health of the application's immediate child resources, ranked from most to least healthy: Healthy, Suspended, Progressing, Missing, Degraded, Unknown. Critically, a resource's health is calculated only from information about that resource itself; it is not inherited from its children (ArgoCD health docs). So a Degraded Deployment will pull the whole app to Degraded, but you still have to inspect the resource tree under that Deployment (argocd app get <app> --output tree) to find the failing pod. Built-in checks worth memorising:

Deployment / ReplicaSet / StatefulSet / DaemonSet: healthy when observed generation equals desired generation and updated replicas equal desired replicas.
Service (LoadBalancer) and Ingress: healthy when status.loadBalancer.ingress is non-empty with at least one hostname or IP.
PersistentVolumeClaim: healthy when status.phase is Bound.

Source: ArgoCD health docs.
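The roll-up itself is a "take the worst" reduction. The sketch below assumes the priority order given in the ArgoCD health docs (Healthy, Suspended, Progressing, Missing, Degraded, Unknown, from most to least healthy). The function names are hypothetical, and the real controller applies extra rules that are not modelled here.

```shell
# health_rank STATUS
# Prints a number; higher means less healthy.
health_rank() {
  case "$1" in
    Healthy)     echo 0 ;;
    Suspended)   echo 1 ;;
    Progressing) echo 2 ;;
    Missing)     echo 3 ;;
    Degraded)    echo 4 ;;
    *)           echo 5 ;;  # Unknown, and anything unrecognised
  esac
}

# worst_health STATUS...
# Prints the least healthy of the given child statuses.
worst_health() {
  wh_worst="Healthy"
  for wh_status in "$@"; do
    if [ "$(health_rank "$wh_status")" -gt "$(health_rank "$wh_worst")" ]; then
      wh_worst="$wh_status"
    fi
  done
  echo "$wh_worst"
}
```

One Degraded child among many Healthy ones is enough to turn the whole application Degraded, which is why the per-resource breakdown is the next thing to read.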

The three common failures: recognise, then fix

The triage above lands you in one of three buckets. Each has a dedicated page with the full remediation — below is only how to recognise you are in that bucket.

OutOfSync that won't reconcile

Recognise it: argocd app get shows OutOfSync, and argocd app diff shows a persistent, often repeating diff — a field flips back every reconcile cycle. The classic cause is a controller or mutating webhook rewriting the object after apply, or template functions like Helm's randAlphaNum generating fresh data each render, both of which ArgoCD lists as standard drift sources (ArgoCD diffing docs). If selfHeal is enabled, you may also see ArgoCD repeatedly re-syncing the same field, because self-heal triggers when live state deviates from Git (automated sync docs).
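One way to confirm that a diff is persistent rather than transient is to take two snapshots some time apart. This is a read-only sketch, assuming an authenticated argocd CLI; the function name persistent_diff and the 60-second default are arbitrary choices. It only shows that the same diff is still there after a refresh. Whether something is rewriting the field depends on whether a sync ran in between, for example through self-heal.

```shell
# persistent_diff APP [WAIT_SECONDS]
# Takes two diff snapshots WAIT_SECONDS apart (default 60).
# Returns 0 when the same non-empty diff appears both times.
persistent_diff() {
  pd_app="$1"
  pd_wait="${2:-60}"
  pd_first="$(argocd app diff "$pd_app")" && pd_rc=0 || pd_rc=$?
  if [ "$pd_rc" -ne 1 ]; then
    echo "no diff to track (exit code $pd_rc)"
    return 1
  fi
  sleep "$pd_wait"
  # Refresh so the second snapshot reflects the current live state.
  argocd app get "$pd_app" --refresh >/dev/null || return 2
  pd_second="$(argocd app diff "$pd_app")" || :
  if [ "$pd_first" = "$pd_second" ]; then
    echo "persistent: identical diff after ${pd_wait}s"
    return 0
  fi
  echo "changed: the diff cleared or moved between snapshots"
  return 1
}
```

A diff that changes content on every snapshot points more towards render-time randomness than towards a controller rewriting a fixed field.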

Full fix (ignoreDifferences, managed-fields filters, self-heal tuning): OutOfSync troubleshooting.

Sync failed: one or more objects failed to apply

Recognise it: the app stays OutOfSync, but this is an operation failure, not just drift — argocd app get surfaces a failed sync operation with an error such as a schema validation error, an admission-webhook rejection, or a PreSync/Sync hook that failed. Sync phases are strict: if a PreSync hook fails the entire sync stops, and a Sync-phase failure marks the sync as failed (sync phases docs). Sync waves compound this — ArgoCD will not progress to a later wave until the current wave's resources are synced and healthy, so a stuck early wave blocks everything behind it (sync waves docs).
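To tell an operation failure from plain drift in a script, the result of the last sync operation can be read from the Application resource's status. This is a minimal sketch, assuming kubectl access and an ArgoCD install in the argocd namespace (an assumption); the function names and the ARGOCD_NS override are hypothetical.

```shell
# last_sync_operation APP
# Prints the phase of the most recent sync operation, then its message.
last_sync_operation() {
  kubectl get applications.argoproj.io "$1" \
    -n "${ARGOCD_NS:-argocd}" \
    -o jsonpath='{.status.operationState.phase}{"\n"}{.status.operationState.message}{"\n"}'
}

# sync_operation_failed APP
# Succeeds when the last operation ended in Failed or Error.
sync_operation_failed() {
  sof_phase="$(last_sync_operation "$1" | head -n 1)"
  case "$sof_phase" in
    Failed|Error) return 0 ;;
    *)            return 1 ;;
  esac
}
```

If sync_operation_failed succeeds, the message line printed by last_sync_operation is the place to start: it carries the validation, webhook, or hook error that stopped the sync.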

Full fix (reading the operation error, hooks, waves, server-side apply): sync-failed troubleshooting.

Degraded health after a successful sync

Recognise it: argocd app get shows Synced + Degraded. Git matched, the apply succeeded, and a child resource failed its health check — most often a Deployment whose rollout never completes (image pull failure, crash loop, failing readiness probe) so updated replicas never reach desired, or a Service/Ingress whose load balancer never provisions an address (ArgoCD health docs). A genuinely stuck Progressing that never settles is diagnosed the same way and converges here.
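Once the Degraded resource is known to be a Deployment, a first look from the Kubernetes side might go as follows. This is a sketch under assumptions: the function name drill_deployment is hypothetical, the 30-second timeout is arbitrary, and it lists every pod and warning in the namespace rather than filtering to one workload.

```shell
# drill_deployment NAMESPACE NAME
drill_deployment() {
  dd_ns="$1"
  dd_name="$2"
  # Has the rollout finished? Give up after 30s instead of blocking.
  kubectl rollout status "deployment/$dd_name" -n "$dd_ns" --timeout=30s || :
  # Pod states: look for ImagePullBackOff, CrashLoopBackOff, or pods that are not READY.
  kubectl get pods -n "$dd_ns" -o wide
  # Warning events in the namespace, sorted by last timestamp.
  kubectl get events -n "$dd_ns" --field-selector type=Warning --sort-by=.lastTimestamp
}
```

None of these commands touch Git or ArgoCD, which is the point: for Synced + Degraded the evidence lives in the cluster.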

Full fix (drilling into the failing resource, probes, events, rollout debugging): degraded troubleshooting.

For the full catalogue of ArgoCD failure pages, see the ArgoCD troubleshooting index.

Prevention and operational principles

Make selfHeal an explicit decision, not a default. With self-heal enabled, ArgoCD re-syncs when live state deviates from Git, after a default 5-second timeout (automated sync docs). That is excellent for stopping config drift, but it will fight any legitimate out-of-band change and can mask the fact that something keeps mutating your objects. If a field flaps under self-heal, fix the diff source — do not just let it re-apply forever.
Keep pruning deliberate. Automated pruning is disabled by default; ArgoCD will not delete resources that are no longer in Git unless you opt in with prune: true (automated sync docs). Turning it on without PruneLast or wave ordering is a real blast-radius risk on shared clusters.
Encode controller-owned fields once, at the right layer. Recurring false-positive drift (HPA reordering spec.metrics, controllers rewriting fields) should be handled with ignoreDifferences or managed-fields filters rather than repeated manual syncs (ArgoCD diffing docs). Use a per-app rule for one-offs and a system-level rule in argocd-cm when the same field drifts in every application on the ArgoCD instance.
Order risky rollouts with sync waves. Because ArgoCD blocks later waves until earlier ones are healthy (sync waves docs), putting CRDs and namespaces in early waves prevents the "resource type not found" class of sync failures.
Gate merges with argocd app diff --local. Running the diff against local manifests before you commit catches drift and accidental field changes at review time, exactly when they are cheapest to fix (argocd app diff reference).
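Several of these principles end up as a few lines of manifest. The sketch below prints an illustrative Application with explicit self-heal and prune choices and a per-app ignoreDifferences rule, plus a Namespace placed in an early sync wave. Every name, URL, path, and wave number is a placeholder, and /spec/replicas is just one example of a field that an autoscaler typically owns.

```shell
# render_app_policy: print a hypothetical Application with explicit sync choices.
render_app_policy() {
  cat <<'EOF'
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: demo-app                                # placeholder
  namespace: argocd                             # assumes ArgoCD runs in "argocd"
spec:
  project: default
  source:
    repoURL: https://example.com/org/repo.git   # placeholder
    targetRevision: HEAD
    path: deploy
  destination:
    server: https://kubernetes.default.svc
    namespace: demo
  syncPolicy:
    automated:
      selfHeal: true        # an explicit decision, not a default
      prune: true           # opt-in; pruning stays off unless set
    syncOptions:
      - PruneLast=true      # prune only after the other resources have synced
  ignoreDifferences:
    - group: apps
      kind: Deployment
      jsonPointers:
        - /spec/replicas    # example of a controller-owned field
EOF
}

# render_early_wave_namespace: print a Namespace that syncs before wave 0.
render_early_wave_namespace() {
  cat <<'EOF'
apiVersion: v1
kind: Namespace
metadata:
  name: demo                                    # placeholder
  annotations:
    argocd.argoproj.io/sync-wave: "-1"          # lower waves are applied first
EOF
}
```

Writing the output to a file and reviewing it in a pull request keeps each of these choices visible, instead of leaving them as settings changed by hand on a live application.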

The deeper operational point: ArgoCD is honestly reporting two different truths about your system, and almost every wasted debugging hour comes from acting on the wrong one. Read sync first, health second, and you turn "the app is red" into a specific, fixable failure in under a minute. When a single application's failure is actually a symptom of a cluster-wide or cross-system problem — a bad admission webhook, an image registry outage, a node-pressure cascade — that is where correlating ArgoCD state against Kubernetes events, CI history, and the originating commit pays off, and where automated, read-only investigation earns its keep.

Sources

By Intellira Engineering. AI-assisted draft, reviewed by the Intellira engineering team; claims cited inline; last verified 2026-06-02.