Rewriting Git History with an LLM for Conventional Commits

TL;DR: Feed your entire git log + file lists into a single LLM call to generate a bash hash map of conventional commit messages, then apply it with git filter-branch in seconds. 143 commits rewritten in 6 seconds, one API call, ~$0.05.

Why bother?

Good commit messages are documentation you get for free — but only if they’re actually meaningful. My repo had months of auto: update 2026-03-01T14:00:02 from a dumb cron job, mixed with inconsistently worded agent-written messages. Running git log was useless. I couldn’t grep for feature additions, distinguish fixes from docs changes, or understand what happened on any given day without reading diffs.

Conventional commits (feat:, fix:, docs:, etc.) solve this by making history machine-filterable and human-scannable. But retroactively classifying 143 commits by hand? No thanks.

The problem

I had 143 commits in a personal workspace repo. About half of them looked like this:

auto: update 2026-03-01T14:00:02
auto: update 2026-03-01T15:00:18
auto: update 2026-03-01T16:00:02

These came from a cron job that ran git add -A && git commit every 30 minutes with a timestamp. Functional, but useless as history. The other half were written by an AI agent during live sessions — better, but inconsistent in style and missing conventional commit prefixes. (I use OpenClaw as my AI assistant, and it makes workspace changes that get auto-committed.)

I wanted the entire history rewritten to conventional commits, and I wanted it done in one shot.

The approach

1. Extract commit metadata: hash, old message, and changed files for every commit
2. Have an LLM write the rewrite map: given the old messages and file lists, generate proper conventional commit messages
3. Apply the map with git filter-branch: a bash associative array keyed by commit hash prefix, executed in seconds

Step 1: Extract the metadata

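# Walk the history oldest-first: one block per commit with hash, subject, and files.
# -r keeps backslashes in subjects intact; --root lists files for the first commit too.
git log --oneline --reverse | while read -r hash msg; do
  echo "COMMIT:$hash"
  echo "OLD_MSG:$msg"
  echo "FILES:$(git diff-tree --root --no-commit-id --name-only -r "$hash" | tr '\n' ',')"
  echo "---"
done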

This gives you a block per commit:

COMMIT:a1b2c3d
OLD_MSG:Add user profile fields to database schema
FILES:db/migrations/003_profiles.sql,db/schema.md,
---
COMMIT:e4f5a6b
OLD_MSG:auto: update 2026-02-28T15:30:01
FILES:docs/setup.md,notes/2026-02-28.md,
---

The file list is key for the auto-commits. auto: update tells you nothing, but FILES:docs/setup.md,notes/2026-02-28.md tells you it was a documentation update.

Step 2: Generate the rewrite map

This is a single LLM call, not one per commit. Pipe the entire output from Step 1 into your CLI of choice:

Using Claude Code

git log --oneline --reverse | while read -r hash msg; do
  echo "COMMIT:$hash"
  echo "OLD_MSG:$msg"
  echo "FILES:$(git diff-tree --root --no-commit-id --name-only -r "$hash" | tr '\n' ',')"
  echo "---"
done | claude -p "For each commit, write a conventional commit message \
(one line, max 72 chars). Use prefixes: feat, fix, docs, refactor, chore. \
Use scopes where natural (e.g. feat(db):). For auto-update commits, infer \
the purpose from the file list and surrounding commits. Output ONLY a bash \
associative array: MAP[hash]=\"message\"" > /tmp/rewrite-map.sh

Using Codex

TEST_INPUT=$(git log --oneline --reverse | while read -r hash msg; do
  echo "COMMIT:$hash"
  echo "OLD_MSG:$msg"
  echo "FILES:$(git diff-tree --root --no-commit-id --name-only -r "$hash" | tr '\n' ',')"
  echo "---"
done)

cat <<EOF | codex exec - > /tmp/rewrite-map.sh
For each commit, write a conventional commit message (one line, max 72 chars).
Use prefixes: feat, fix, docs, refactor, chore.
Output ONLY a bash associative array: MAP[hash]="message".

$TEST_INPUT
EOF

Using Gemini CLI

git log --oneline --reverse | while read -r hash msg; do
  echo "COMMIT:$hash"
  echo "OLD_MSG:$msg"
  echo "FILES:$(git diff-tree --root --no-commit-id --name-only -r "$hash" | tr '\n' ',')"
  echo "---"
done | gemini -p "For each commit, write a conventional commit message \
(one line, max 72 chars). Use prefixes: feat, fix, docs, refactor, chore. \
Output ONLY a bash associative array: MAP[hash]=\"message\"." > /tmp/rewrite-map.sh

If your distro’s /usr/bin/env is too old for env -S (common on older Ubuntu images), run Gemini via Node directly:

git log --oneline --reverse | while read -r hash msg; do
  echo "COMMIT:$hash"
  echo "OLD_MSG:$msg"
  echo "FILES:$(git diff-tree --root --no-commit-id --name-only -r "$hash" | tr '\n' ',')"
  echo "---"
done | node ~/.nvm/versions/node/v22.22.0/lib/node_modules/@google/gemini-cli/dist/index.js \
  -p "For each commit, write conventional messages. Output only MAP[...] lines." \
  -o text > /tmp/rewrite-map.sh

Using any LLM

If you don’t have a CLI, paste the output into ChatGPT, Gemini, or any chat interface with the same prompt. Or call an API directly — the important thing is that all commits go in a single call.

Why one call matters

The LLM sees all commits at once, which lets it:

Recognise sequences — 20 consecutive commits touching notes/2026-03-01.md are clearly incremental daily notes during a long build, so it can label them progressively (“build starts”, “build overnight”, “build complete 🎉”)
Infer context from neighbours — an auto: update that only touches docs/setup.md right after a migration commit is probably documenting that migration
Stay consistent — scopes and terminology stay uniform across the whole history

The output is a ready-to-use bash map:

MAP[a1b2c3d]="feat(db): add user profile fields to schema"
MAP[e4f5a6b]="docs: update setup guide and daily notes"
MAP[c7d8e9f]="docs: daily notes — build complete 🎉"
MAP[f0a1b2c]="feat: initial project setup"

143 commits fit comfortably in a single context window. For larger repos, batch in chunks of 200-300 commits and concatenate the maps — you’ll lose some cross-batch context but it still works.
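For the larger-repo case, the batching might look something like the sketch below. It assumes the Step 1 output has been saved to a file and that every record ends with a `---` line, as in the format above. `chunk_records` and the `chunk-` file prefix are hypothetical names chosen for illustration.

```shell
#!/usr/bin/env bash
# Split Step 1 output (records terminated by a "---" line) into files
# holding at most N records each, so each file can go to the LLM separately.
# chunk_records is a hypothetical helper name. It reads the records on stdin.
chunk_records() {
  local per_chunk="$1" prefix="$2"
  awk -v n="$per_chunk" -v prefix="$prefix" '
    BEGIN { file = sprintf("%s%03d.txt", prefix, 0) }
    { print > file }
    # After each record separator, count it and roll over to a new file when full.
    $0 == "---" {
      count++
      if (count % n == 0) {
        close(file)
        file = sprintf("%s%03d.txt", prefix, count / n)
      }
    }
  '
}
```

Something like `chunk_records 250 /tmp/chunk- < /tmp/commits.txt` would then produce `/tmp/chunk-000.txt`, `/tmp/chunk-001.txt`, and so on. Each file gets the same prompt, and the resulting maps are concatenated.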

I reviewed the output and hand-edited about 10% of the messages. The rest were good as-is.
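Part of that review can be mechanical. The sketch below keeps only the `MAP[...]` lines (LLM output sometimes includes commentary or markdown fences) and flags entries that lack one of the prefixes from the prompt or run past 72 characters. `check_map` is a hypothetical helper name, and the accepted prefix list is an assumption copied from the prompt above.

```shell
#!/usr/bin/env bash
# Print every map line that does NOT look like
#   MAP[<hex hash>]="<type>(<optional scope>): <text>"
# or whose message is longer than 72 characters.
# check_map is a hypothetical helper; the type list mirrors the prompt.
check_map() {
  grep -E '^MAP\[' "$1" | while IFS= read -r line; do
    # Strip everything up to the opening quote, then the closing quote.
    msg=${line#*=\"}
    msg=${msg%\"}
    if ! printf '%s\n' "$line" |
        grep -Eq '^MAP\[[0-9a-f]{7,40}\]="(feat|fix|docs|refactor|chore)(\([^)]+\))?: .+"$'; then
      printf 'BAD FORMAT: %s\n' "$line"
    elif [ "${#msg}" -gt 72 ]; then
      # Length is counted in characters of the current locale.
      printf 'TOO LONG (%d): %s\n' "${#msg}" "$line"
    fi
  done
}
```

Running `check_map /tmp/rewrite-map.sh` prints nothing when every entry passes. This only catches formatting problems; whether a message describes the commit accurately still needs a human read.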

Step 3: Apply with git filter-branch

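Wrap the map in a bash script, saved here as /tmp/rewrite-commits.sh, that looks up $GIT_COMMIT (the full hash of the commit being rewritten, exported by filter-branch) and receives the old message on stdin: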

#!/bin/bash
declare -A MAP
MAP[f0a1b2c]="feat: initial project setup"
MAP[a1b2c3d]="feat(db): add user profile fields to schema"
MAP[e4f5a6b]="docs: update setup guide and daily notes"
# ... 140 more entries ...

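# filter-branch pipes the original message in on stdin and exports the
# full commit hash as $GIT_COMMIT; the map is keyed by its first 7 characters.
OLD_MSG=$(cat)
SHORT="${GIT_COMMIT:0:7}"

# Use the mapped message when there is one, otherwise keep the original.
if [[ -n "${MAP[$SHORT]+x}" ]]; then
    printf '%s\n' "${MAP[$SHORT]}"
else
    printf '%s\n' "$OLD_MSG"
fi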
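Before pointing this at real history, the lookup can be exercised by hand. The sketch below mirrors the script's logic with the lookup wrapped in a function. `lookup_msg` is a hypothetical name, and the long hashes are made up for illustration.

```shell
#!/usr/bin/env bash
# Same lookup as the filter script, wrapped in a function so it can be
# tried by hand. Requires bash 4+ for associative arrays.
declare -A MAP
MAP[a1b2c3d]="feat(db): add user profile fields to schema"

# lookup_msg <full-hash>: reads the old message on stdin and prints the new
# one, or the old one unchanged when the hash prefix is not in the map.
lookup_msg() {
  local old_msg short
  old_msg=$(cat)
  short=${1:0:7}
  if [[ -n "${MAP[$short]+x}" ]]; then
    printf '%s\n' "${MAP[$short]}"
  else
    printf '%s\n' "$old_msg"
  fi
}

# Made-up hashes, for illustration only.
echo "auto: update" | lookup_msg a1b2c3d4e5f60718293a4b5c6d7e8f9012345678   # mapped
echo "Merge branch 'x'" | lookup_msg 0000000aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa   # falls through
```

The real script can be checked the same way by setting the variable that filter-branch would export: `echo "old message" | GIT_COMMIT=<full hash> bash /tmp/rewrite-commits.sh`.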

Run it:

git filter-branch -f --msg-filter 'bash /tmp/rewrite-commits.sh' -- --all
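Because only messages change, the rewritten branch tip should point at exactly the same file tree as before. filter-branch keeps the pre-rewrite refs under refs/original/, so this can be checked before pushing anything. A minimal sketch, assuming the branch is called main; `same_tree` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# same_tree <rev-a> <rev-b>: succeed only if both revisions point at an
# identical tree, i.e. the checked-out file contents would be the same.
# Returns 2 if either revision cannot be resolved.
same_tree() {
  local a b
  a=$(git rev-parse --verify --quiet "$1^{tree}") || return 2
  b=$(git rev-parse --verify --quiet "$2^{tree}") || return 2
  [ "$a" = "$b" ]
}

# Usage after the rewrite, run inside the repo:
#   same_tree refs/original/refs/heads/main main && echo "content unchanged"
```

This compares only the branch tips. A stricter check would walk both histories and compare the tree of every commit pair.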

143 commits rewritten in 6 seconds. Then force push:

git push --force origin main

Before and after

Before:

9f864d1 Update config and add group settings
94e1619 auto: update 2026-03-02T13:00:01
065e6f0 auto: update 2026-03-02T11:30:01
c797696 auto: update 2026-03-01T18:00:02
88e574e Add new entries to data model

After:

f312949 feat: add group-level config overrides
6b2f5e8 docs: daily notes — search and upgrade notes
a0640b7 docs: update daily notes for 2026-03-02
c797696 docs: daily notes — Node 22 build complete 🎉
88e574e feat(data): add new entries to model

Preventing bad commits going forward

The root cause was a dumb cron job. I replaced it with one that asks an LLM for the commit message:

#!/bin/bash
cd /path/to/workspace || exit 1
git add -A
if git diff --cached --quiet; then
  exit 0
fi

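DIFF=$(git diff --cached --stat)
# Comma-separated list of staged files, without the trailing comma
FILES=$(git diff --cached --name-only | tr '\n' ',' | sed 's/,$//')

# Option 1: Claude Code CLI
# printf expands \n itself; plain echo in bash would pass the backslashes through literally
MSG=$(printf 'Files: %s\n\nStats:\n%s\n' "$FILES" "$DIFF" | \
  claude -p "Write a one-line conventional commit message (max 72 chars) for these changes. Output ONLY the message, nothing else." 2>/dev/null)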

# Option 2: Codex CLI
# MSG=$(cat <<EOF | codex exec - 2>/dev/null
# Write a one-line conventional commit message (max 72 chars).
# Output only the message.
#
# Files: ${FILES}
# Stats:
# ${DIFF}
# EOF
# )

# Option 3: Gemini CLI
# MSG=$(echo -e "Files: ${FILES}\n\nStats:\n${DIFF}" | \
#   gemini -p "Write a one-line conventional commit message (max 72 chars). Output only the message." -o text 2>/dev/null)

# Option 4: Gemini API (free tier, no CLI needed)
# Build the JSON body with python3 so newlines and quotes in the diff are escaped properly.
# PAYLOAD=$(printf 'Write a one-line conventional commit message (max 72 chars) for these changes:\n\nFiles: %s\n\nStats:\n%s\n' "$FILES" "$DIFF" \
#   | python3 -c 'import json,sys; print(json.dumps({"contents":[{"parts":[{"text":sys.stdin.read()}]}]}))')
# MSG=$(curl -s --max-time 15 \
#   "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent?key=$GEMINI_KEY" \
#   -H "Content-Type: application/json" \
#   -d "$PAYLOAD" \
#   | python3 -c "import json,sys; print(json.load(sys.stdin)['candidates'][0]['content']['parts'][0]['text'].strip())" 2>/dev/null)

# Fallback if LLM call failed or message too long
if [ -z "$MSG" ] || [ ${#MSG} -gt 100 ]; then
  MSG="chore: update ${FILES}"
fi

git commit -m "$MSG"
git push origin main 2>/dev/null || true

This runs on a cron every 30 minutes. No changes = no commit. Changes = LLM writes the message. Pick whichever option matches your setup.
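LLM CLIs sometimes return more than the bare message: a leading blank line, wrapping quotes, or a markdown fence. A small cleanup step between the LLM call and the fallback check can absorb that. This is a sketch of one way to do it; `clean_msg` is a hypothetical helper, not part of any of the CLIs above.

```shell
#!/usr/bin/env bash
# clean_msg: read raw LLM output on stdin and print a single tidy line.
# Skips markdown fence lines, takes the first non-blank line, then strips
# surrounding whitespace and one wrapping quote or backtick on each side.
clean_msg() {
  sed -n -e '/^```/d' -e '/[^[:space:]]/{p;q;}' |
    sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' \
        -e 's/^["`'\'']//' -e 's/["`'\'']$//'
}
```

In the cron script this would slot in as `MSG=$(printf '%s\n' "$MSG" | clean_msg)` just before the fallback test, so an empty result still triggers the `chore: update` fallback.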

Cost breakdown

The whole rewrite used:

Step	API calls	Model	Cost
Generate rewrite map	1	LLM (single prompt with all 143 commits)	~$0.02-0.10 depending on model
`git filter-branch`	0	Pure bash, no API calls	Free
Going forward (per commit)	1	Gemini 2.5 Flash	Free (free tier)

The key insight: the expensive part (understanding what each commit did) is a single batch call. The rewrite itself is pure local bash — filter-branch invokes the script 143 times, but each invocation is just a hash table lookup, no API calls.

Why this works well

File paths are surprisingly informative. The LLM can infer intent from directory structure alone — migrations/ means schema changes, notes/ means documentation, config/ means configuration.
One call beats 143 calls. Batching all commits into a single prompt gives the LLM cross-commit context. It sees that 20 consecutive daily notes commits happened during a build, or that a config change followed a migration. Individual calls per commit would miss these patterns and would mean 143 requests instead of one, each repeating the same instructions.
git filter-branch is fast. The bash hash lookup is O(1) per commit. The bottleneck is the LLM call to generate the map, not the rewrite itself.
Conventional commits make git log useful. You can now git log --oneline --grep="^feat(data)" and get exactly the data model changes.
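Once subjects follow the convention, summaries become one-liners too. As an illustration, the sketch below tallies commit types from subject lines on stdin; `count_types` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# count_types: read commit subjects on stdin and print "<count> <type>"
# per conventional-commit type, most frequent first. Subjects that do not
# start with "<type>:" or "<type>(<scope>):" are tallied as "other".
count_types() {
  sed -E -e 's/^([a-z]+)(\([^)]*\))?!?: .*$/\1/' -e t -e 's/.*/other/' |
    sort | uniq -c | sort -rn | awk '{ print $1, $2 }'
}
```

Fed with `git log --format=%s | count_types`, this gives a quick picture of how much of the history is features, fixes, or docs.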

Caveats

Force push rewrites history. Every rewritten commit gets a new hash, so existing clones have to be re-cloned or reset. Don’t do this on shared branches with other contributors.
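Hash lookup requires 7-char prefix uniqueness. At 143 commits a clash is vanishingly unlikely (roughly 1 in 26,000 by the birthday bound), but the odds grow with the square of the commit count and reach about 17% at 10k commits. git log --oneline also prints longer abbreviations once a repo holds enough objects, and those would no longer match a 7-char lookup. Use longer prefixes, or full hashes, on anything large.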
Merge commits pass through unchanged if not in the map. Add them explicitly if needed.
Review before applying. The LLM gets ~90% right, but you’ll want to sanity-check the map. Easier to fix a text file than to re-rewrite history.
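The prefix-uniqueness caveat is cheap to check directly rather than trust. A minimal sketch, with `dup_prefixes` as a hypothetical helper name:

```shell
#!/usr/bin/env bash
# dup_prefixes <length>: read full commit hashes on stdin and print every
# prefix of the given length that is shared by more than one hash.
# No output means the prefixes are unique and safe to use as map keys.
dup_prefixes() {
  cut -c "1-$1" | sort | uniq -d
}
```

Running `git rev-list --all | dup_prefixes 7` inside the repo should print nothing. If it does print a prefix, regenerate the map with a longer one and widen the `${GIT_COMMIT:0:7}` slice in the filter script to match.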

The whole thing — extracting metadata, generating the map, applying it, and force pushing — took about 15 minutes. Most of that was reviewing the LLM’s suggestions. The actual rewrite was 6 seconds.