How a tidy refactor caused a release loop

TL;DR

A cleanup PR removed an accidental guard in our release workflow’s git push logic. Three weeks later the workflow had pushed 119 self-generated bump commits. Fix: gate the release job on github.actor so pushes by the workflow’s own bot can’t start another release.

I opened a service repo and found a release pull request with 119 consecutive commits, all chore: Bump version 26.05.09-N -> N+1. No real changes between them. The PR had been open for three weeks and the version counter climbed on its own, several bumps a day, no human or scheduler involved.

How the runner threw the bump away

The release workflow ran on every push to staging. Its last job called a shared composite action with three steps:

bumpver update to bump the version and commit it locally
open a ticket in our issue tracker
create a release PR (staging to main), or edit the open one if it already exists

The composite action used to look like this:

- name: Create PR to ${{ inputs.target-ref }}
  shell: bash
  run: |
    STATUS=$(gh pr status --json state --jq '.currentBranch.state')
    TITLE="build: Release ${{ inputs.repository-name }}"
    if [ "$STATUS" == 'OPEN' ]; then
      gh pr edit --title "$TITLE" --body=""
    else
      git pull --rebase origin "$(git branch --show-current)"
      git push
      gh pr create --base ${{ inputs.target-ref }} --title "$TITLE" --body=""
    fi

bumpver had commit = true, push = false. The bump became a local commit in the runner’s clone, and git push only ran when creating a new PR. With a release PR already open, the runner made the bump commit and discarded it when the job ended.

That was the loop-breaker. Nobody had written it as a guard; it fell out of the code’s shape. Pull and push lived only in the else branch, so the OPEN branch happened never to call them.

The workflow authenticates as a GitHub App, not with GITHUB_TOKEN, so GitHub’s built-in “pushes made with GITHUB_TOKEN don’t trigger new workflow runs” rule didn’t apply. Any push from the action would start the next run.
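The discard is easy to reproduce outside CI. This is a minimal stand-in, not our pipeline: it assumes git is installed, uses a throwaway bare repository in place of GitHub, and replaces bumpver update with a plain git commit. The helper name unpushed_after_bump is hypothetical. It counts how many local commits the remote never received, with and without a push after the bump.

```bash
#!/usr/bin/env bash
# Reproduces "commit = true, push = false" on a throwaway repo.
# Assumptions: git is installed; a local bare repo stands in for GitHub;
# a plain `git commit` stands in for `bumpver update`.

# unpushed_after_bump PUSH -> prints how many local commits origin lacks.
# The body runs in a subshell so `cd`, `set -eu` and the trap stay local.
unpushed_after_bump() (
  set -eu
  push=$1
  tmp=$(mktemp -d)
  trap 'rm -rf "$tmp"' EXIT
  git init -q --bare "$tmp/remote.git"
  git init -q "$tmp/runner"
  cd "$tmp/runner"
  git symbolic-ref HEAD refs/heads/staging
  git remote add origin "$tmp/remote.git"
  ci() { git -c user.name=ci -c user.email=ci@example.invalid "$@"; }
  ci commit -q --allow-empty -m "initial"
  git push -q origin staging
  # Stand-in for the bumpver commit (commit = true).
  echo "26.05.09-1" > VERSION
  git add VERSION
  ci commit -q -m "chore: Bump version"
  # push = false: nothing reaches the remote unless the script pushes.
  if [ "$push" = true ]; then
    git push -q origin staging
  fi
  git rev-list --count origin/staging..HEAD
)

echo "no push: $(unpushed_after_bump false) commit(s) left only on the runner"
echo "push:    $(unpushed_after_bump true) commit(s) left only on the runner"
```

Without the explicit push the count is 1, and on an ephemeral runner that commit disappears with the workspace.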

Two clean diffs that compose into a loop

Someone noticed a real bug: the open PR’s diff showed the old version because the runner threw the bump away every time. They fixed it by pushing the bump in both branches:

   if [ "$STATUS" == 'OPEN' ]; then
+    git pull --rebase origin "$(git branch --show-current)"
+    git push
     gh pr edit --title "$TITLE" --body=""
   else
     git pull --rebase origin "$(git branch --show-current)"
     git push

A follow-up “tidy” hoisted the now-duplicate pull/push out of the if/else:

     STATUS=$(gh pr status --json state --jq '.currentBranch.state')
     TITLE="build: Release ${{ inputs.repository-name }}"
+    git pull --rebase origin "$(git branch --show-current)"
+    git push
     if [ "$STATUS" == 'OPEN' ]; then
-      git pull --rebase origin "$(git branch --show-current)"
-      git push
       gh pr edit --title "$TITLE" --body=""
     else
-      git pull --rebase origin "$(git branch --show-current)"
-      git push
       gh pr create --base ${{ inputs.target-ref }} --title "$TITLE" --body=""
     fi

Either diff reads fine in isolation. Composed, they sent every push to staging through a bumpver commit that pushed itself back to staging, which triggered the workflow again.

The release job waited on the build and test jobs, so each iteration cost a full CI run and the loop crawled at CI speed instead of API speed. Without that throttle the counter would have hit thousands instead of 119.
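A toy model of the trigger chain shows how the pushes compose. It is a sketch under simplifying assumptions, not the real workflow: every push to staging starts exactly one run, runs happen one at a time, build and test always pass, and git and gh are replaced by counters. run_chain and its arguments are hypothetical.

```bash
#!/usr/bin/env bash
# Toy model of the release trigger chain; no git or gh involved.
# run_chain MODE PR_OPEN MAX_RUNS
#   MODE=original : only the "create PR" branch pulls and pushes
#   MODE=hoisted  : pull/push run before the if/else (after both diffs)
#   PR_OPEN       : whether a release PR is already open (true/false)
# Prints how many workflow runs happen before the chain stops or hits MAX_RUNS.
run_chain() {
  local mode=$1 pr_open=$2 max_runs=$3
  local pending=1 runs=0   # one human push to staging starts the chain
  while (( pending > 0 && runs < max_runs )); do
    pending=$(( pending - 1 ))
    runs=$(( runs + 1 ))
    # Every run makes a bump commit; the only question is whether it is pushed.
    if [[ $mode == hoisted || $pr_open == false ]]; then
      pending=$(( pending + 1 ))   # the bot's push to staging starts another run
    fi
    pr_open=true                   # after any run, a release PR exists
  done
  echo "$runs"
}

echo "original, PR open: $(run_chain original true 1000) run(s)"
echo "hoisted,  PR open: $(run_chain hoisted true 1000) run(s)"
```

With a release PR already open, the original shape stops after one run; starting without a PR it stops after two, because only the run that creates the PR pushes. The hoisted shape never stops on its own and only ends at the artificial MAX_RUNS cap.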

The fix

I made the guard explicit. The bot pushes the bump as app-name[bot], so on the recursive run github.actor is the bot. Gate the release job on the actor:

release:
  if: |
    github.ref_name == 'staging' &&
    github.actor != 'app-name[bot]'

Bot-authored pushes still trigger the workflow, but the release job is skipped, nothing new gets pushed, and the loop ends.
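The same condition can also be checked inside a shell step, for example if other workflows call the shared composite action and shouldn’t have to remember the job-level if. This is a hedged sketch, not what we shipped: it reads the GITHUB_REF_NAME and GITHUB_ACTOR variables that the Actions runner sets, and should_release and BOT_LOGIN are hypothetical names. Note the quoted right-hand side: inside [[ ]] an unquoted app-name[bot] is a glob pattern, where [bot] matches a single b, o, or t.

```bash
#!/usr/bin/env bash
# Shell mirror of the job-level guard:
#   github.ref_name == 'staging' && github.actor != 'app-name[bot]'
# "app-name[bot]" is the placeholder bot login from the workflow above.
BOT_LOGIN='app-name[bot]'

# should_release REF_NAME ACTOR -> exit status 0 only for non-bot pushes to staging.
should_release() {
  local ref_name=$1 actor=$2
  # Quote the right-hand side: unquoted, [bot] would be a glob bracket expression.
  [[ $ref_name == staging && $actor != "$BOT_LOGIN" ]]
}

# In a workflow step, the runner provides GITHUB_REF_NAME and GITHUB_ACTOR.
if should_release "${GITHUB_REF_NAME:-}" "${GITHUB_ACTOR:-}"; then
  echo "release: running"
else
  echo "release: skipped (re-entry guard or non-staging ref)"
fi
```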

Lessons

The bug shipped because the loop-breaking property was invisible in the diff, undocumented, and discoverable only by tracing the full event chain. The “redundant” code was load-bearing. Two things help:

Comment the load-bearing line. A comment above the OPEN branch saying “do not push here: pushing re-triggers this workflow” would have stopped the refactor.
Make the guard explicit. A github.actor != ...[bot] check sits in the workflow file. The next person who wants to remove it has to do it on purpose.

If a workflow step pushes to the same branch that triggered it, write a re-entry guard someone else can read.
