Remove large files from git history

Identify Space-Consuming Objects

List the largest blobs in history (sorted ascending by size, so the 50 largest come last; each line shows object name, size in bytes, and path):

git rev-list --objects --all \
| git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' \
| sed -n 's/^blob //p' | sort -n -k2 | tail -n 50
  • Check currently tracked paths that should be ignored:
git ls-files core/output
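
To see which directories account for most of the space, the blob listing above can be summed per top-level directory. This is only a sketch: sum_by_top_dir is a hypothetical helper name, and it assumes input lines of the form "<sha> <size> <path>" as produced by the pipeline above.

```shell
# Hypothetical helper: reads "<sha> <size> <path>" lines on stdin and prints
# the total bytes per top-level path component, largest last.
sum_by_top_dir() {
  awk '{
         path = $0
         sub(/^[^ ]+ [^ ]+ /, "", path)   # drop sha and size, keep the path (may contain spaces)
         split(path, parts, "/")
         total[parts[1]] += $2
       }
       END { for (d in total) print total[d], d }' | sort -n
}

# Only run the pipeline when inside a git repository.
if git rev-parse --git-dir >/dev/null 2>&1; then
  git rev-list --objects --all \
  | git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' \
  | sed -n 's/^blob //p' | sum_by_top_dir
fi
```

A directory that dominates the totals (core/output in this walkthrough) is the natural candidate for removal.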

Remove from Index and Commit (Keeping Files in the Working Directory)

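  • First create a backup branch as a precaution (it only guards this step; git filter-repo later rewrites every local branch, this one included, so keep a separate clone as well if you need the original history):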
git branch backup/remove-big-files
  • Remove from index (keep local files intact) and commit:
git rm -r --cached --ignore-unmatch core/output
git commit -m "chore: remove generated outputs and large images from index (prepare history purge)"
  • Also add these paths to .gitignore to prevent them from being tracked again in the future.
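
One possible way to add the path to .gitignore without creating duplicate entries is sketched below. ignore_path is a hypothetical helper name, and the sketch assumes it is run from the repository root and that any existing .gitignore ends with a newline.

```shell
# Hypothetical helper: append a pattern to ./.gitignore unless an identical
# line is already present.
ignore_path() {
  pattern=$1
  touch .gitignore
  grep -qxF -- "$pattern" .gitignore || printf '%s\n' "$pattern" >> .gitignore
}

# Usage, from the repository root:
#   ignore_path 'core/output/'
#   git add .gitignore
#   git commit -m "chore: ignore generated outputs"
```

Running git check-ignore -v on a file under core/output afterwards shows which rule matches it.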

Permanently Remove from History

  • Recommended tool: git filter-repo (faster and safer than git filter-branch; it is a separate tool that must be installed):
# Note: This rewrites history across all branches. Ensure you have a backup and notify collaborators.
git filter-repo --force --invert-paths --path core/output
  • git-filter-repo will display instructions upon success and may remove the origin remote (you can re-add it as needed).
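
Before cleaning up, it can help to confirm that no commit on any ref still touches the removed path. The following is a minimal sketch; check_purged is a hypothetical helper name, and it relies only on git log with --all and a path filter.

```shell
# Hypothetical helper: prints "clean" when no commit reachable from any ref
# touches the given path; otherwise lists the offending commits and returns 1.
check_purged() {
  hits=$(git log --all --oneline -- "$1")
  if [ -z "$hits" ]; then
    echo "clean"
  else
    printf '%s\n' "$hits"
    return 1
  fi
}

# Usage, inside the rewritten repository:
#   check_purged core/output
```

If commits are still listed, the path given to git filter-repo probably did not match what was tracked.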

Clean Up Local Garbage

git reflog expire --expire=now --all
git gc --prune=now --aggressive
git count-objects -vH

Proceed to the next step only after confirming that the reported object counts (count, in-pack) and sizes (size, size-pack) have dropped significantly.
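
To put a number on the reduction, the size-pack value can be compared before and after the rewrite. This sketch assumes you saved the output of git count-objects -v (without -H, so size-pack is in KiB) before running git filter-repo; size_pack_kib and shrink_percent are hypothetical helper names.

```shell
# Extract the size-pack value (KiB) from `git count-objects -v` output on stdin.
size_pack_kib() {
  awk -F': ' '$1 == "size-pack" { print $2 }'
}

# Print the reduction from the first value to the second as a whole percentage.
# Assumes the first value is greater than zero.
shrink_percent() {
  before=$1
  after=$2
  echo $(( (before - after) * 100 / before ))
}

# Usage:
#   before=$(git count-objects -v | size_pack_kib)   # run before the rewrite
#   after=$(git count-objects -v | size_pack_kib)    # run after git gc
#   shrink_percent "$before" "$after"
```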

Force Push to Remote (Make Sure a Backup Exists First)

  • Since history has been rewritten, a force push is required. All collaborators must resynchronize or re-clone:
# If the remote was removed, re-add origin first
git remote add origin git@your.git.remote:repo.git
git push origin --force --all
git push origin --force --tags
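
For collaborators who prefer to resynchronize an existing clone rather than re-clone, one possible approach is sketched below. resync_branch is a hypothetical helper name; it assumes the remote is called origin and that the branch name passed in exists there. It discards local commits and uncommitted changes on the current branch, so anything worth keeping should be saved elsewhere first.

```shell
# Hypothetical helper: make the current branch match the rewritten remote branch.
# WARNING: discards local commits and uncommitted changes.
resync_branch() {
  git fetch origin &&
  git reset --hard "origin/$1"
}

# Usage, with the branch checked out in the collaborator's clone:
#   resync_branch main    # "main" is an assumed branch name
```

Re-cloning remains the simpler option when a collaborator has no local work to preserve.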