Jenkins · medium severity

BuildFailure

A Jenkins build that fails at a stage needs the right log, not the whole console. Find the failing stage, tell FAILURE from UNSTABLE, and trace the change.

Written by Intellira Engineering, Editorial team

What a build failure means

Jenkins reports several distinct end states, and they are not interchangeable. A red FAILURE means a step failed (for example, a shell command exited non-zero) and the resulting error was not caught. A yellow UNSTABLE means the run completed but recorded a quality signal, usually a failing test; by default, later stages keep executing even when the build is unstable. A grey ABORTED means a timeout or a manual cancel interrupted the run. Knowing which one you have tells you where to look before you read a single log line.
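The end state can be read without opening the UI, from the result field of <job>/<build>/api/json?tree=result (SUCCESS, UNSTABLE, FAILURE, NOT_BUILT, ABORTED, or null while the run is still going). The helper below is a hypothetical sketch that turns that value into the first place to look; the function name and messages are illustrative, not a Jenkins feature.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map a Jenkins run result to the first place to look.
# Input is the "result" field of <job>/<build>/api/json (null while running).
where_to_look() {
  case "$1" in
    FAILURE)   echo "FAILURE: read the failing stage's log" ;;
    UNSTABLE)  echo "UNSTABLE: read testReport for the recorded failures" ;;
    ABORTED)   echo "ABORTED: find the timeout or who cancelled the run" ;;
    NOT_BUILT) echo "NOT_BUILT: the run or stage was skipped" ;;
    SUCCESS)   echo "SUCCESS: nothing failed" ;;
    ""|null)   echo "no result yet: the run is still in progress" ;;
    *)         echo "unknown result: $1" >&2; return 1 ;;
  esac
}

where_to_look UNSTABLE
```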

The useful signal is which stage failed and that stage's log — not the whole console, and not the parent pipeline log when a sub-job is the real culprit.

Diagnose it

First identify what failed: a stage inside this pipeline, or a sub-job it triggered. A pipeline stage is read from the parent build; a sub-job (e.g. cd-step-preparation #5819) must be read from that sub-job's own build. Use the Pipeline REST API to jump straight to the failing node instead of scrolling the full console:

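# Assumes $JENKINS is the controller URL and $AUTH is "user:api-token".
# Stage-level status for a run (which stage is red / yellow):
curl -fsS -u "$AUTH" "$JENKINS/job/<job>/<build>/wfapi/describe"

# That node's log only (node id from the describe output above);
# the response is JSON, with the log in its "text" field:
curl -fsS -u "$AUTH" "$JENKINS/job/<job>/<build>/execution/node/<id>/wfapi/log"

# Whole console as a fallback:
curl -fsS -u "$AUTH" "$JENKINS/job/<job>/<build>/consoleText" | tail -n 100

# Test results when the run is UNSTABLE rather than FAILURE:
curl -fsS -u "$AUTH" "$JENKINS/job/<job>/<build>/testReport/api/json"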

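The wfapi/describe endpoint (Pipeline REST API plugin) lists each stage with its status and a link to that stage's own describe, which in turn lists the stage's step nodes with links to their per-node logs.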
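To pick the failing stage out of that response without reading it by eye, a jq filter is enough. This is a sketch that assumes jq is installed and that the responses use the field names documented for the Pipeline REST API: stages[] with id, name and status on the run, and stageFlowNodes[] on a stage's own describe. Note that this API spells the failure status FAILED, not FAILURE. Both function names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; both read wfapi JSON on stdin and need jq.

# Run-level <job>/<build>/wfapi/describe: print "id<TAB>name<TAB>status"
# for every stage that ran and did not succeed (FAILED, UNSTABLE, ...).
failing_stages() {
  jq -r '.stages[]
         | select(.status != "SUCCESS" and .status != "NOT_EXECUTED")
         | [.id, .name, .status] | @tsv'
}

# Stage-level <job>/<build>/execution/node/<stage-id>/wfapi/describe:
# print the step nodes inside that stage that did not succeed. Their ids
# are the ones to pass to execution/node/<id>/wfapi/log.
failing_steps() {
  jq -r '.stageFlowNodes[]
         | select(.status != "SUCCESS")
         | [.id, .name, .status] | @tsv'
}
```

Pipe the output of the first describe call above into failing_stages, then fetch that stage's describe and pipe it into failing_steps. The log text typically lives on the step nodes rather than on the stage node itself, so the step id is usually the one whose wfapi/log shows the actual error.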

Causes — diagnose and fix each

1. Test failure (UNSTABLE, not FAILURE)

A failing test recorded by the junit step marks the run UNSTABLE (yellow), which is distinct from FAILURE (red).

  • Diagnose: the run is yellow; the shell step that ran the tests still exited zero. Open <build>/testReport for the failed cases.
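  • Fix: fix the test or the code under test. If later stages must not run once the build is unstable, set options { skipStagesAfterUnstable() }. Put UNSTABLE handling in a post { unstable { ... } } block rather than post { failure { ... } }, because the failure condition does not fire for an UNSTABLE run.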
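When the run is UNSTABLE, the failed cases can be listed from the JUnit plugin's JSON view of the report, <job>/<build>/testReport/api/json. A minimal sketch, assuming jq is installed and the report exposes suites[].cases[] with className, name, status and errorDetails. A case counts as failed here when its status is FAILED or REGRESSION (failing now after passing in the previous build); the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read testReport/api/json on stdin (needs jq) and print
# one line per failed case as "Class.method: error details".
failed_cases() {
  jq -r '.suites[].cases[]
         | select(.status == "FAILED" or .status == "REGRESSION")
         | "\(.className).\(.name): \(.errorDetails // "no details")"'
}
```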

2. Compile / step error (FAILURE)

A script that exits non-zero causes the step to fail with an exception; this is the red, most common case.

  • Diagnose: the stage log shows the compiler or command error directly.
  • Fix: correct the code or command and re-run.

3. Swallowed failure via returnStatus / catchError

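A sh step with returnStatus: true returns the exit code instead of throwing, so nothing fails unless the script checks that code. catchError catches the error and continues with the next statement; by default it marks the build FAILURE, but with buildResult: 'UNSTABLE' (or when warnError is used instead) a real failure shows up only as yellow. Either pattern can hide a real failure.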

  • Diagnose: a step that clearly errored did not fail the stage; check the Jenkinsfile for returnStatus, catchError, or a swallowed exit code.
  • Fix: branch on the returned status and fail explicitly, or remove the catchError wrapper so the non-zero exit propagates.
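The same trap exists inside the shell script itself. Without pipefail, a pipeline's exit status is that of its last command, so make | tee build.log passes whenever tee succeeds, even if make failed; set -e does not catch this either, because the pipeline as a whole returned zero. The sketch below assumes bash (the sh step uses /bin/sh unless the script starts with a shebang line, and on some agents that shell may not support pipefail). The run_or_fail helper is hypothetical: it keeps the status and fails explicitly, the shell counterpart of branching on a returnStatus result.

```shell
#!/usr/bin/env bash
# Make a pipeline fail if ANY command in it fails, not just the last one.
set -o pipefail

# Hypothetical helper: run a command, keep its exit status, and fail
# explicitly instead of letting a non-zero status go unnoticed.
run_or_fail() {
  local status=0
  "$@" || status=$?
  if [ "$status" -ne 0 ]; then
    echo "FAILED (exit $status): $*" >&2
    return "$status"
  fi
}
```

In the Jenkinsfile itself, the equivalent is to check the value returned by sh(script: ..., returnStatus: true) and call the error step when it is non-zero.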

4. Out of memory — exit code 137 (node-level)

If the host or the agent's container runs out of memory, the kernel OOM killer can terminate the process; on Linux the step then exits with code 137 (128 + 9, the signal number of SIGKILL). This is usually a node/agent resource limit, not a code bug.

  • Diagnose: the log ends abruptly with 137 (or Killed), often mid-test or mid-build with no application error.
  • Fix: raise the agent/container memory limit or the JVM/tool heap, or reduce parallelism on that agent. Confirm by checking the agent's dmesg/OOM logs.
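Exit codes above 128 follow the shell convention of 128 plus the signal number, so 137 decodes to signal 9. A small sketch for decoding them; the function name is hypothetical, and since a program can also choose an exit status above 128 on its own, treat the result as a hint rather than proof.

```shell
#!/usr/bin/env bash
# Hypothetical helper: decode an exit status using the 128 + signal convention.
describe_exit() {
  local code=$1 sig name
  if [ "$code" -eq 0 ]; then
    echo "success"
  elif [ "$code" -gt 128 ] && [ "$code" -le 192 ]; then
    sig=$((code - 128))
    if name=$(kill -l "$sig" 2>/dev/null); then
      echo "killed by signal $sig (SIG$name)"
    else
      echo "killed by signal $sig"
    fi
  else
    echo "exited with status $code"
  fi
}

describe_exit 137
```

A SIGKILL can also come from a manual kill -9, so confirm OOM on the agent host, for example with dmesg -T | grep -iE 'out of memory|killed process', or on a Kubernetes agent by checking kubectl describe pod <pod> for a termination reason of OOMKilled.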

5. Agent disconnection (infrastructure, not code)

When the controller-to-agent channel drops, the running step fails with errors such as channel closed or Backing channel ... is disconnected. Pipeline sh steps are durable and can often survive a brief reconnect of the same agent, but if the agent does not come back (for example, a pod that was evicted or deleted), the in-flight step fails (durable-task / nodes-and-processes plugin).

  • Diagnose: the failure message is about the connection/agent, not your build command; ephemeral (e.g. Kubernetes) agents may have been evicted or terminated.
  • Fix: wrap the flaky stage in retry and timeout options so a fresh agent is allocated, and address the underlying node capacity/eviction.
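Separating infrastructure failures from build failures can be scripted as a first pass over consoleText. A heuristic sketch: the patterns are the messages quoted in this section plus the remoting exception names ChannelClosedException and ClosedChannelException. They are assumptions to extend for your setup, not a complete list, and the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical heuristic: succeed (exit 0) if the console log on stdin looks
# like a controller-to-agent connection failure rather than a build error.
is_agent_disconnect() {
  grep -qiE 'Backing channel .* is disconnected|channel closed|ChannelClosedException|ClosedChannelException'
}
```

For example, pipe the consoleText call from the Diagnose section into is_agent_disconnect and, if it succeeds, treat the run as an agent problem to retry on a fresh agent rather than a code change to investigate.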

6. Missing or changed dependency

A version bump or an unavailable artifact breaks a stage that worked before.

  • Diagnose: the stage log shows a resolve/download/version error; diff the lockfile or pinned versions against the last green build.
  • Fix: pin or restore the working version, or repair the registry/proxy.
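Diffing dependencies is easiest when both builds can produce a flat listing. A minimal sketch, assuming you have already reduced each lockfile (from the last green build and from the failing one) to lines of the form name version; how you produce that listing depends on your package manager, and the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: compare two "name version" listings and print what was
# added, removed, or changed between the last green build and the failing one.
diff_deps() {
  local green=$1 failing=$2
  awk '
    NF < 2 { next }                              # skip blank or malformed lines
    FILENAME == ARGV[1] { prev[$1] = $2; next }  # last green listing
    { cur[$1] = $2 }                             # failing build listing
    END {
      for (n in cur) {
        if (!(n in prev)) print "added", n, cur[n]
        if ((n in prev) && prev[n] != cur[n]) print "changed", n, prev[n], "->", cur[n]
      }
      for (n in prev) {
        if (!(n in cur)) print "removed", n, prev[n]
      }
    }
  ' "$green" "$failing" | LC_ALL=C sort -k2
}
```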

7. Environment / agent drift

A tool version or credential changed on the agent since the last green run.

  • Diagnose: the command exists but behaves differently, or auth now fails; the change is on the agent, not in the repo.
  • Fix: restore the expected tool version/credential, or label the job to a known-good agent.
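Agent drift is easier to prove if each build records the tool versions it actually ran with. A sketch, assuming each tool accepts --version (tools that don't will record their error text instead); the helper name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "tool: first line of --version" for each tool,
# or "tool: missing" if it is not on PATH, so runs can be diffed later.
tool_fingerprint() {
  local tool
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      printf '%s: %s\n' "$tool" "$("$tool" --version </dev/null 2>&1 | head -n 1)"
    else
      printf '%s: missing\n' "$tool"
    fi
  done
}
```

For example, write tool_fingerprint git java node to a file in each build and keep it with archiveArtifacts; a plain diff between the green run's file and the failing run's file then shows what changed on the agent.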

8. Flaky infrastructure

A transient network or registry error that a retry resolves.

  • Diagnose: the same run passes on re-run with no code change.
  • Fix: wrap the unreliable step in retry(n) and stabilize the dependency.
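In a Jenkinsfile the wrapper is the retry step, for example retry(3) { ... }. When only one command inside a longer script is flaky, the same idea can live in the script. A sketch of a hypothetical retry_cmd helper with exponential backoff; the attempt count and starting delay are parameters you choose.

```shell
#!/usr/bin/env bash
# Hypothetical shell-level analogue of retry(n): run a command up to
# $1 times, waiting $2 seconds (doubling each time) between attempts,
# and return the last exit status if every attempt fails.
retry_cmd() {
  local attempts=$1 delay=$2 n=1 status=0
  shift 2
  while :; do
    "$@" && return 0
    status=$?
    if [ "$n" -ge "$attempts" ]; then
      echo "giving up after $n attempts (last status $status): $*" >&2
      return "$status"
    fi
    echo "attempt $n failed (status $status), retrying in ${delay}s" >&2
    sleep "$delay"
    n=$((n + 1))
    delay=$((delay * 2))
  done
}
```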

Fix it

  1. Read the end state first: red = FAILURE, yellow = UNSTABLE, grey = ABORTED. That alone narrows the cause list above.
  2. Find the failing stage from the stage view (or wfapi/describe); open that stage's per-node log — for a sub-job, read the sub-job's build directly.
  3. Match the log to a cause above and apply its fix.
  4. Diff against the last green build: triggering commit, dependency, agent change.
  5. Re-run. If it only fails intermittently, treat it as flaky and wrap it in retry/timeout rather than re-running by hand.
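Step 4 starts from the commit the last green build was built from. A sketch that reads it from <job>/lastStableBuild/api/json (lastStableBuild is the permalink to the last SUCCESS build; lastSuccessfulBuild also counts UNSTABLE runs). It assumes jq and the Git plugin, which records the checked-out commit as a BuildData action with lastBuiltRevision.SHA1. Jobs that also load a shared library can carry several BuildData entries, hence the optional repository URL filter. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read <job>/lastStableBuild/api/json on stdin (needs jq)
# and print the commit recorded by the Git plugin. Pass a repository URL as
# $1 to pick the right entry when several repositories were checked out.
last_green_sha() {
  local repo=${1:-}
  jq -r --arg repo "$repo" '
    [ .actions[]?
      | select(._class == "hudson.plugins.git.util.BuildData")
      | select($repo == "" or any(.remoteUrls[]?; . == $repo))
      | .lastBuiltRevision.SHA1 ]
    | first // empty'
}
```

For example, capture sha=$(curl -fsS "$JENKINS/job/<job>/lastStableBuild/api/json" | last_green_sha <repo-url>), adding credentials as your controller requires; git log --oneline "$sha"..HEAD then lists the commits between the green build and the current checkout.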

How Intellira diagnoses this

Intellira reads the build info, the run's end state, the correct stage/sub-job logs and test results from the Jenkins MCP server, isolates the failing step, and ties it to the triggering commit — so you get "stage unit-tests is UNSTABLE on commit a1f9c2e, three assertions failed" or "stage build exited 137 on the agent" instead of a wall of console output.

Sources

By Intellira Engineering. AI-assisted draft, reviewed by the Intellira engineering team; claims cited inline; last verified 2026-06-02.

Frequently asked questions

How do I find which stage failed?
Open the pipeline run's stage view or Blue Ocean — the red stage is the failure. Then read that stage's log specifically, not the whole console. For sub-jobs, retrieve the sub-job's build log directly rather than the parent pipeline log.
The build failed but the code looks fine — what changed?
Compare against the last green build: the triggering commit, a dependency version, or an environment/agent change. A build that "suddenly" fails almost always follows a change somewhere in that set.
Why is my build yellow (UNSTABLE) instead of red (FAILURE)?
UNSTABLE usually means tests failed or a quality gate was breached while every shell step still exited zero. By default the pipeline keeps running through later stages. Use the testReport and the unstable post condition, not the failure one.

