TL;DR: Feed your entire git log + file lists into a single LLM call to generate a bash hash map of conventional commit messages, then apply it with git filter-branch in seconds. 143 commits rewritten in 6 seconds, one API call, ~$0.05.


Why bother?

Good commit messages are documentation you get for free — but only if they’re actually meaningful. My repo had months of auto: update 2026-03-01T14:00:02 from a dumb cron job, mixed with inconsistently worded agent-written messages. Running git log was useless. I couldn’t grep for feature additions, distinguish fixes from docs changes, or understand what happened on any given day without reading diffs.

Conventional commits (feat:, fix:, docs:, etc.) solve this by making history machine-filterable and human-scannable. But retroactively classifying 143 commits by hand? No thanks.

The problem

I had 143 commits in a personal workspace repo. About half of them looked like this:

auto: update 2026-03-01T14:00:02
auto: update 2026-03-01T15:00:18
auto: update 2026-03-01T16:00:02

These came from a cron job that ran git add -A && git commit every 30 minutes with a timestamp. Functional, but useless as history. The other half were written by an AI agent during live sessions — better, but inconsistent in style and missing conventional commit prefixes. (I use OpenClaw as my AI assistant, and it makes workspace changes that get auto-committed.)

I wanted the entire history rewritten to conventional commits, and I wanted it done in one shot.

The approach

  1. Extract commit metadata — hash, old message, and changed files for every commit
  2. Have an LLM write the rewrite map — given the old messages and file lists, generate proper conventional commit messages
  3. Apply the map with git filter-branch — a bash associative array keyed by commit hash prefix, executed in seconds

Step 1: Extract the metadata

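# Emit one block per commit, oldest first: short hash, old message, changed files.
git log --oneline --reverse | while read -r hash msg; do
  echo "COMMIT:$hash"
  echo "OLD_MSG:$msg"
  # --root makes the initial commit list its files too (it has no parent to diff against).
  echo "FILES:$(git diff-tree --root --no-commit-id --name-only -r "$hash" | tr '\n' ',')"
  echo "---"
done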

This gives you a block per commit:

COMMIT:a1b2c3d
OLD_MSG:Add user profile fields to database schema
FILES:db/migrations/003_profiles.sql,db/schema.md,
---
COMMIT:e4f5a6b
OLD_MSG:auto: update 2026-02-28T15:30:01
FILES:docs/setup.md,notes/2026-02-28.md,
---

The file list is key for the auto-commits. auto: update tells you nothing, but FILES:docs/setup.md,notes/2026-02-28.md tells you it was a documentation update.

Step 2: Generate the rewrite map

This is a single LLM call, not one per commit. Pipe the entire output from Step 1 into your CLI of choice:

Using Claude Code

git log --oneline --reverse | while read -r hash msg; do
  echo "COMMIT:$hash"
  echo "OLD_MSG:$msg"
  echo "FILES:$(git diff-tree --root --no-commit-id --name-only -r "$hash" | tr '\n' ',')"
  echo "---"
done | claude -p "For each commit, write a conventional commit message \
(one line, max 72 chars). Use prefixes: feat, fix, docs, refactor, chore. \
Use scopes where natural (e.g. feat(db):). For auto-update commits, infer \
the purpose from the file list and surrounding commits. Output ONLY a bash \
associative array: MAP[hash]=\"message\"" > /tmp/rewrite-map.sh

Using Codex

TEST_INPUT=$(git log --oneline --reverse | while read -r hash msg; do
  echo "COMMIT:$hash"
  echo "OLD_MSG:$msg"
  echo "FILES:$(git diff-tree --root --no-commit-id --name-only -r "$hash" | tr '\n' ',')"
  echo "---"
done)

cat <<EOF | codex exec - > /tmp/rewrite-map.sh
For each commit, write a conventional commit message (one line, max 72 chars).
Use prefixes: feat, fix, docs, refactor, chore.
Output ONLY a bash associative array: MAP[hash]="message".

$TEST_INPUT
EOF

Using Gemini CLI

git log --oneline --reverse | while read -r hash msg; do
  echo "COMMIT:$hash"
  echo "OLD_MSG:$msg"
  echo "FILES:$(git diff-tree --root --no-commit-id --name-only -r "$hash" | tr '\n' ',')"
  echo "---"
done | gemini -p "For each commit, write a conventional commit message \
(one line, max 72 chars). Use prefixes: feat, fix, docs, refactor, chore. \
Output ONLY a bash associative array: MAP[hash]=\"message\"." > /tmp/rewrite-map.sh

If your distro’s /usr/bin/env is too old for env -S (common on older Ubuntu images), run Gemini via Node directly:

git log --oneline --reverse | while read -r hash msg; do
  echo "COMMIT:$hash"
  echo "OLD_MSG:$msg"
  echo "FILES:$(git diff-tree --root --no-commit-id --name-only -r "$hash" | tr '\n' ',')"
  echo "---"
done | node ~/.nvm/versions/node/v22.22.0/lib/node_modules/@google/gemini-cli/dist/index.js \
  -p "For each commit, write conventional messages. Output only MAP[...] lines." \
  -o text > /tmp/rewrite-map.sh

Using any LLM

If you don’t have a CLI, paste the output into ChatGPT, Gemini, or any chat interface with the same prompt. Or call an API directly — the important thing is that all commits go in a single call.

Why one call matters

The LLM sees all commits at once, which lets it:

  • Recognise sequences — 20 consecutive commits touching notes/2026-03-01.md are clearly incremental daily notes during a long build, so it can label them progressively (“build starts”, “build overnight”, “build complete 🎉”)
  • Infer context from neighbours — an auto: update that only touches docs/setup.md right after a migration commit is probably documenting that migration
  • Stay consistent — scopes and terminology stay uniform across the whole history

The output is a ready-to-use bash map:

MAP[a1b2c3d]="feat(db): add user profile fields to schema"
MAP[e4f5a6b]="docs: update setup guide and daily notes"
MAP[c7d8e9f]="docs: daily notes — build complete 🎉"
MAP[f0a1b2c]="feat: initial project setup"

143 commits fit comfortably in a single context window. For larger repos, batch in chunks of 200-300 commits and concatenate the maps — you’ll lose some cross-batch context but it still works.
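For the larger-repo case, the batching could look something like the sketch below. It assumes the Step 1 output has been saved to a file and that every commit block ends with a line that is exactly `---`; the function name, the chunk size, and the output file naming are my own choices for illustration, not part of the original workflow.

```shell
#!/usr/bin/env bash
# Hypothetical helper: split saved Step 1 output into files holding at most
# CHUNK_SIZE commit blocks each. A block ends at a line that is exactly "---".
# Usage: split_commit_blocks INPUT_FILE CHUNK_SIZE OUTPUT_PREFIX
split_commit_blocks() {
  local input="$1" size="$2" prefix="$3"
  awk -v size="$size" -v prefix="$prefix" '
    BEGIN { chunk = 1; count = 0 }
    {
      # Every line goes to the file for the current chunk.
      out = sprintf("%s-%03d.txt", prefix, chunk)
      print > out
    }
    $0 == "---" {
      # A block just ended; start a new file once the chunk is full.
      count++
      if (count == size) { close(out); chunk++; count = 0 }
    }
  ' "$input"
}
```

Each resulting file (for example `chunk-001.txt`) can then be piped to the LLM with the same prompt, and the per-chunk maps concatenated into one file.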

I reviewed the output and hand-edited about 10% of the messages. The rest were good as-is.
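Part of that review can be mechanical. The sketch below is one possible pre-check, not something from the original run: it flags lines that are not `MAP[hash]="message"` entries (LLMs sometimes add prose or code fences), messages without one of the five prefixes from the prompt, messages over 72 characters, and messages containing characters that bash would expand when the map is executed. The function name and the exact rules are assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report every line of a generated map that should not be
# fed to bash as-is. Returns 0 if the file is clean, 1 otherwise.
validate_map() {
  local file="$1" status=0 n=0 line msg
  local entry_re='^MAP\[[0-9a-f]{7,40}\]="(.*)"$'
  local msg_re='^(feat|fix|docs|refactor|chore)(\([^)]+\))?: .+'
  # Characters that would be expanded or end the string inside double quotes.
  local unsafe_re='[$`"\\]'
  while IFS= read -r line || [[ -n "$line" ]]; do
    n=$((n + 1))
    [[ -z "$line" ]] && continue               # blank lines are harmless
    if [[ ! "$line" =~ $entry_re ]]; then
      echo "line $n: not a MAP entry: $line"
      status=1
      continue
    fi
    msg="${BASH_REMATCH[1]}"
    if [[ "$msg" =~ $unsafe_re ]]; then
      echo "line $n: contains \$, backtick, quote, or backslash: $msg"
      status=1
    elif [[ ! "$msg" =~ $msg_re ]]; then
      echo "line $n: missing conventional prefix: $msg"
      status=1
    elif (( ${#msg} > 72 )); then              # counts characters in a UTF-8 locale
      echo "line $n: longer than 72 chars: $msg"
      status=1
    fi
  done < "$file"
  return "$status"
}
```

Running `validate_map /tmp/rewrite-map.sh` before Step 3 leaves only the judgment calls (is the message actually accurate?) for manual review.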

Step 3: Apply with git filter-branch

Wrap the map in a bash script, saved here as /tmp/rewrite-commits.sh. filter-branch runs it once per commit, passing the old message on stdin and the full commit hash in $GIT_COMMIT:

#!/bin/bash
declare -A MAP
MAP[f0a1b2c]="feat: initial project setup"
MAP[a1b2c3d]="feat(db): add user profile fields to schema"
MAP[e4f5a6b]="docs: update setup guide and daily notes"
# ... 140 more entries ...

OLD_MSG=$(cat)
SHORT="${GIT_COMMIT:0:7}"

if [[ -n "${MAP[$SHORT]+x}" ]]; then
    echo "${MAP[$SHORT]}"
else
    echo "$OLD_MSG"
fi
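Because the filter only reads stdin and one environment variable, it can be exercised by hand before touching real history. The sketch below builds a throwaway copy of the script with a single made-up map entry and simulates two filter-branch invocations; the hashes and messages are placeholders.

```shell
#!/usr/bin/env bash
# Build a tiny stand-in for /tmp/rewrite-commits.sh (hypothetical one-entry map).
script=$(mktemp)
cat > "$script" <<'EOF'
declare -A MAP
MAP[a1b2c3d]="feat(db): add user profile fields to schema"
OLD_MSG=$(cat)
SHORT="${GIT_COMMIT:0:7}"
if [[ -n "${MAP[$SHORT]+x}" ]]; then
    echo "${MAP[$SHORT]}"
else
    echo "$OLD_MSG"
fi
EOF

# Simulate filter-branch: old message on stdin, full hash in GIT_COMMIT.
mapped=$(echo "Add user profile fields" |
  GIT_COMMIT=a1b2c3d000000000000000000000000000000000 bash "$script")
unmapped=$(echo "Merge branch 'x'" |
  GIT_COMMIT=ffffffff00000000000000000000000000000000 bash "$script")

echo "mapped:   $mapped"     # the entry from the map
echo "unmapped: $unmapped"   # the old message, passed through untouched
```

The same two-line check works against the real script: pick one hash that is in the map and one that is not.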

Run it:

git filter-branch -f --msg-filter 'bash /tmp/rewrite-commits.sh' -- --all

143 commits rewritten in 6 seconds. Then force push:

git push --force origin main
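A message-only rewrite should leave every file untouched, which is cheap to confirm before pushing. filter-branch keeps the pre-rewrite refs under refs/original/, so the old and new tips can be compared directly. The helper names below are hypothetical, and the branch name main is taken from the push command above.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for checking a message-only rewrite before force pushing.

# Succeed only if both revisions exist and point at identical file trees.
same_tree() {
  local a b
  a=$(git rev-parse --verify -q "$1^{tree}") || return 1
  b=$(git rev-parse --verify -q "$2^{tree}") || return 1
  [ "$a" = "$b" ]
}

# Succeed only if both revisions exist and reach the same number of commits.
same_commit_count() {
  local a b
  a=$(git rev-list --count "$1" 2>/dev/null) || return 1
  b=$(git rev-list --count "$2" 2>/dev/null) || return 1
  [ "$a" = "$b" ]
}

# Example, run inside the repo after filter-branch:
#   same_tree refs/original/refs/heads/main main &&
#     same_commit_count refs/original/refs/heads/main main &&
#     git push --force-with-lease origin main
```

Using --force-with-lease instead of --force is a small extra safety net: the push is refused if the remote branch moved since the last fetch.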

Before and after

Before:

9f864d1 Update config and add group settings
94e1619 auto: update 2026-03-02T13:00:01
065e6f0 auto: update 2026-03-02T11:30:01
c797696 auto: update 2026-03-01T18:00:02
88e574e Add new entries to data model

After:

f312949 feat: add group-level config overrides
6b2f5e8 docs: daily notes — search and upgrade notes
a0640b7 docs: update daily notes for 2026-03-02
c797696 docs: daily notes — Node 22 build complete 🎉
88e574e feat(data): add new entries to model

Preventing bad commits going forward

The root cause was a dumb cron job. I replaced it with one that asks an LLM for the commit message:

#!/bin/bash
cd /path/to/workspace || exit 1
git add -A
if git diff --cached --quiet; then
  exit 0
fi

DIFF=$(git diff --cached --stat)
FILES=$(git diff --cached --name-only | paste -sd, - | sed 's/,/, /g')

# Option 1: Claude Code CLI
# printf is used because plain echo in bash does not expand \n.
MSG=$(printf 'Files: %s\n\nStats:\n%s\n' "$FILES" "$DIFF" | \
  claude -p "Write a one-line conventional commit message (max 72 chars) for these changes. Output ONLY the message, nothing else." 2>/dev/null)

# Option 2: Codex CLI
# MSG=$(cat <<EOF | codex exec - 2>/dev/null
# Write a one-line conventional commit message (max 72 chars).
# Output only the message.
#
# Files: ${FILES}
# Stats:
# ${DIFF}
# EOF
# )

# Option 3: Gemini CLI
# MSG=$(echo -e "Files: ${FILES}\n\nStats:\n${DIFF}" | \
#   gemini -p "Write a one-line conventional commit message (max 72 chars). Output only the message." -o text 2>/dev/null)

# Option 4: Gemini API (free tier, no CLI needed)
# The request body is built with json.dumps so newlines and quotes in the diff stat stay valid JSON.
# PROMPT=$(printf 'Write a one-line conventional commit message (max 72 chars) for these changes:\n\nFiles: %s\n\nStats:\n%s' "$FILES" "$DIFF")
# BODY=$(PROMPT="$PROMPT" python3 -c 'import json,os; print(json.dumps({"contents":[{"parts":[{"text":os.environ["PROMPT"]}]}]}))')
# MSG=$(curl -s --max-time 15 \
#   "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent?key=$GEMINI_KEY" \
#   -H "Content-Type: application/json" \
#   -d "$BODY" \
#   | python3 -c "import json,sys; print(json.load(sys.stdin)['candidates'][0]['content']['parts'][0]['text'].strip())" 2>/dev/null)

# Fallback if LLM call failed or message too long
if [ -z "$MSG" ] || [ ${#MSG} -gt 100 ]; then
  MSG="chore: update ${FILES}"
fi

git commit -m "$MSG"
git push origin main 2>/dev/null || true

This runs on a cron every 30 minutes. No changes = no commit. Changes = LLM writes the message. Pick whichever option matches your setup.
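The fallback in the script only catches empty or overlong output. An LLM can also return a preamble, a quoted string, or several lines, so a slightly stricter filter may be worth having. The sketch below is one way to do that; the function name is hypothetical and the accepted prefixes are the five used in the prompts earlier.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reduce raw LLM output to a single commit subject line.
# Prints the first line that looks like a conventional commit (72 chars or
# fewer) and returns 0, or prints nothing and returns 1.
clean_commit_msg() {
  local raw="$1" line
  local re='^(feat|fix|docs|refactor|chore)(\([^)]+\))?: .+'
  while IFS= read -r line; do
    # Trim leading/trailing whitespace, backticks, and quotes.
    line=$(printf '%s' "$line" |
      sed -e "s/^[[:space:]\`\"']*//" -e "s/[[:space:]\`\"']*\$//")
    if [[ "$line" =~ $re ]] && (( ${#line} <= 72 )); then
      printf '%s\n' "$line"
      return 0
    fi
  done <<< "$raw"
  return 1
}

# In the cron script this could replace the length check:
#   MSG=$(clean_commit_msg "$MSG") || MSG="chore: update ${FILES}"
```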

Cost breakdown

The whole rewrite used:

| Step | API calls | Model | Cost |
|---|---|---|---|
| Generate rewrite map | 1 | LLM (single prompt with all 143 commits) | ~$0.02-0.10 depending on model |
| git filter-branch | 0 | Pure bash, no API calls | Free |
| Going forward (per commit) | 1 | Gemini 2.5 Flash | Free (free tier) |

The key insight: the expensive part (understanding what each commit did) is a single batch call. The rewrite itself is pure local bash — filter-branch invokes the script 143 times, but each invocation is just a hash table lookup, no API calls.

Why this works well

  • File paths are surprisingly informative. The LLM can infer intent from directory structure alone — migrations/ means schema changes, notes/ means documentation, config/ means configuration.
  • One call beats 143 calls. Batching all commits into a single prompt gives the LLM cross-commit context. It sees that 20 consecutive daily notes commits happened during a build, or that a config change followed a migration. Individual calls per commit would miss these patterns, and each one would pay the prompt and round-trip overhead again.
  • git filter-branch is fast enough here. With only a --msg-filter and 143 commits it finished in seconds: each invocation re-reads the map and does one associative-array lookup. The bottleneck is the LLM call to generate the map, not the rewrite itself. Expect it to slow down on much larger histories.
  • Conventional commits make git log useful. You can now git log --oneline --grep="^feat(data)" and get exactly the data model changes.
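As a small illustration of that last point, the sketch below tallies commit subjects by type. It is not from the original post; the function name is made up, and anything without a lowercase `type:` or `type(scope):` prefix is counted as "other".

```shell
#!/usr/bin/env bash
# Hypothetical helper: count commit subjects by conventional-commit type,
# most frequent first. Reads one subject per line on stdin.
count_commit_types() {
  awk '
    match($0, /^[a-z]+(\([^)]*\))?!?: /) {
      type = $0
      sub(/[(:!].*/, "", type)    # keep only the word before "(", "!" or ":"
      n[type]++
      next
    }
    { n["other"]++ }
    END { for (t in n) printf "%d %s\n", n[t], t }
  ' | sort -k1,1nr -k2
}

# Example, run inside the repo:
#   git log --format=%s | count_commit_types
```

Run before the rewrite, this mostly reports "auto" and "other"; run afterwards, it gives a quick picture of how the history splits across feat, fix, docs, and so on.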

Caveats

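  • Force push rewrites history. Changing a message changes that commit's hash and the hash of every commit after it, so anyone who has pulled the old history will diverge. Don't do this on shared branches with other contributors.
  • Hash lookup assumes unique 7-char prefixes. git log --oneline lengthens its abbreviations automatically when 7 characters would be ambiguous or the repo is large, and ${GIT_COMMIT:0:7} would then miss those keys. If in doubt, generate the map from git log --format='%H %s' and look up the full $GIT_COMMIT instead.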
  • Merge commits pass through unchanged if not in the map. Add them explicitly if needed.
  • Review before applying. The LLM gets ~90% right, but you’ll want to sanity-check the map. Easier to fix a text file than to re-rewrite history.
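For the prefix caveat, one option that keeps the short keys is to try successively longer prefixes of the full hash until one is found in the map. This is a sketch under the assumption that every key is an abbreviation of at least 7 characters, as printed by git log --oneline; the function name is hypothetical.

```shell
#!/usr/bin/env bash
declare -A MAP

# Hypothetical helper: print the mapped message for FULL_HASH by trying key
# lengths 7, 8, 9, ... up to the full hash. Returns 1 if no key matches.
lookup_message() {
  local full="$1" len key
  for (( len = 7; len <= ${#full}; len++ )); do
    key="${full:0:len}"
    if [[ -n "${MAP[$key]+x}" ]]; then
      printf '%s\n' "${MAP[$key]}"
      return 0
    fi
  done
  return 1
}

# In the filter script, the lookup would then become something like:
#   lookup_message "$GIT_COMMIT" || echo "$OLD_MSG"
```

When two commits share their first 7 characters, git prints both with longer abbreviations, so neither has a 7-character key and the loop finds each one at its own length.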

The whole thing — extracting metadata, generating the map, applying it, and force pushing — took about 15 minutes. Most of that was reviewing the LLM’s suggestions. The actual rewrite was 6 seconds.