Summary
First fix your local history. You have several options that vary in ease of use depending on how gnarly your history is between HEAD and the commit with the accidental rip.
git reset --soft
git rebase --interactive
git commit-tree
git filter-repo
git filter-branch (I tend to avoid this one)
If you pushed the history with the rip, you may need to fix history on a shared repository (deleting and re-pushing a branch or git push --force), and your collaborators will have to realign their work with the rewritten history.
You may also find “Removing sensitive data from a repository” from GitHub to be a helpful resource.
The Setup
I will illustrate possible fixes using a concrete example history that simulates a simple, representative sequence of commits:
- add index.html
- add site.css and oops.iso
- add site.js and delete oops.iso
To recreate the exact SHA-1 hashes from this example in your setup, first set a couple of environment variables. If you’re using bash
export GIT_AUTHOR_DATE="Mon Oct 29 10:15:31 2018 +0900"
export GIT_COMMITTER_DATE="${GIT_AUTHOR_DATE}"
If you’re running in the Windows command shell
set GIT_AUTHOR_DATE=Mon Oct 29 10:15:31 2018 +0900
set GIT_COMMITTER_DATE=%GIT_AUTHOR_DATE%
Then run the code below. To get back to the same starting point after experimenting, delete the repository, and rerun the code.
#! /usr/bin/env perl
use strict;
use warnings;
use Fcntl;
sub touch { sysopen FH, $_, O_WRONLY|O_CREAT and close FH or die "$0: touch $_: $!" for @_; 1 }
my $repo = 'website-project';
mkdir $repo or die "$0: mkdir: $!";
chdir $repo or die "$0: chdir: $!";
system(q/git init --initial-branch=main --quiet/) == 0 or die "git init failed";
system(q/git config user.name 'Git User'/) == 0 or die "user.name failed";
system(q/git config user.email '[email protected]'/) == 0 or die "user.email failed";
# for browsing history - http://blog.kfish.org/2010/04/git-lola.html
system "git config alias.lol 'log --graph --decorate --pretty=oneline --abbrev-commit'";
system "git config alias.lola 'log --graph --decorate --pretty=oneline --abbrev-commit --all'";
my($index,$oops,$css,$js) = qw/ index.html oops.iso site.css site.js /;
touch $index or die "touch: $!";
system("git add .") == 0 or die "A: add failed\n";
system("git commit -m A") == 0 or die "A: commit failed\n";
touch $oops, $css or die "touch: $!";
system("git add .") == 0 or die "B: add failed\n";
system("git commit -m B") == 0 or die "B: commit failed\n";
unlink $oops or die "C: unlink: $!"; touch $js or die "C: touch: $!";
system("git add .") == 0 or die "C: add failed\n";
system("git commit -a -m C") == 0 or die "C: commit failed\n";
system("git lol --name-status --no-renames");
The output shows that the repository’s structure is
* 1982cb8 (HEAD -> main) C
| D oops.iso
| A site.js
* 6e90708 B
| A oops.iso
| A site.css
* d29f991 A
A index.html
Notes
- The --no-renames option to git lol is there to disable rename detection so that git doesn’t see deleting one empty file and adding another as a rename. You won’t need it most of the time.
- When you’re done messing around with this example repository, remember to unset the GIT_AUTHOR_DATE and GIT_COMMITTER_DATE environment variables or just exit the shell that you were using to follow along.
- Consider preventing future accidental pickup of DVD rips by updating your .gitignore.
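As a rough sketch of that last note, the snippet below ignores disk images in a throwaway repository and checks that git add . no longer picks them up. The *.iso pattern and the temporary directory are my assumptions for illustration; adjust the pattern to whatever tends to sneak into your commits.

```shell
set -eu
repo=$(mktemp -d)          # throwaway directory, assumed for illustration
cd "$repo"
git init -q .

# Ignore disk images everywhere in the repository (the pattern is an assumption).
printf '%s\n' '*.iso' > .gitignore
: > oops.iso
: > site.css

# git check-ignore exits 0 for ignored paths and 1 for paths that are not ignored.
if git check-ignore -q oops.iso; then iso_ignored=yes; else iso_ignored=no; fi
if git check-ignore -q site.css; then css_ignored=yes; else css_ignored=no; fi

# A blanket add now skips the rip; list what actually landed in the index.
git add .
staged=$(git ls-files | tr '\n' ' ')
echo "oops.iso ignored: $iso_ignored, site.css ignored: $css_ignored"
echo "staged: $staged"
```

Running git check-ignore -v on a path also reports which pattern matched, which helps when a .gitignore grows large.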
The Easy Case
If you haven’t yet published your history, then you can fix it and be done. Several approaches will do what you want.
git reset --soft
To keep everything (file contents and commit messages) except the rip, first move HEAD back to the commit immediately before the one with the DVD rip and pretend you did it correctly the first time.
git reset --soft d29f991
The exact invocation will depend on your local history. In this particular case, you could soft reset to HEAD~2, but blindly parroting this will produce confusing results when your history has a different shape.
After that, add the files you want to keep. The soft reset left your working tree and index untouched, so they still reflect the state at C, where oops.iso has already been deleted.
git add site.css site.js
You may be able to get away with git add ., particularly if you updated your .gitignore. A blanket git add . is probably what got you into trouble in the first place, so just in case, run git status first and then
git commit -q -C ORIG_HEAD
The soft reset keeps a “bookmark” at ORIG_HEAD, so -C ORIG_HEAD uses its commit message.
Running git lol --name-status --no-renames from here gives
* a19013d (HEAD -> main) C
| A site.css
| A site.js
* d29f991 A
A index.html
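If you are not sure which commit to hand to git reset --soft, one way to find it is to ask git which commit added the file and take that commit's parent. The sketch below rebuilds an A, B, C history like the example in a temporary repository; the file contents and the identity settings are placeholders I made up.

```shell
set -eu
work=$(mktemp -d)
cd "$work"
git init -q .
git symbolic-ref HEAD refs/heads/main
git config user.name 'Git User'
git config user.email 'git.user@example.invalid'   # placeholder identity

echo '<html></html>' > index.html; git add .; git commit -q -m A
echo 'body {}' > site.css; echo 'pretend rip' > oops.iso
git add .; git commit -q -m B
git rm -q oops.iso; echo '// js' > site.js
git add .; git commit -q -m C

# Commits that added oops.iso, newest first; the last line is the earliest addition.
added=$(git log --diff-filter=A --format=%H -- oops.iso | tail -n 1)

# The parent of that commit is the soft-reset target.
# (This fails if the file was added in the root commit, which has no parent.)
target=$(git rev-parse "$added^")

echo "rip added in: $(git log -1 --format=%s "$added")"
echo "reset target: $(git log -1 --format=%s "$target")"
```

Here the target comes out as commit A, the same commit as HEAD~2, but the lookup keeps working when more commits sit on top of the rip.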
git rebase --interactive
To accomplish the same as above but guiding git along, use interactive rebase.
git rebase --interactive d29f991
You will then see an editor with
pick 6e90708 B
pick 1982cb8 C
# Rebase d29f991..1982cb8 onto d29f991 (2 commands)
#
# Commands:
# p, pick <commit> = use commit
# r, reword <commit> = use commit, but edit the commit message
# e, edit <commit> = use commit, but stop for amending
# s, squash <commit> = use commit, but meld into previous commit
# f, fixup [-C | -c] <commit> = like "squash" but keep only the previous
# commit's log message, unless -C is used, in which case
# keep only this commit's message; -c is same as -C but
# opens the editor
# x, exec <command> = run command (the rest of the line) using shell
# b, break = stop here (continue rebase later with 'git rebase --continue')
# d, drop <commit> = remove commit
# l, label <label> = label current HEAD with a name
# t, reset <label> = reset HEAD to a label
# m, merge [-C <commit> | -c <commit>] <label> [# <oneline>]
# . create a merge commit using the original merge commit's
# . message (or the oneline, if no original merge commit was
# . specified); use -c <commit> to reword the commit message
Change pick to squash on the C line. Remember: with interactive rebase, you always “squash upward,” never downward.
As the helpful comments below indicate, you can change the command for the B line to reword and edit the commit message right there if it’s simple. Otherwise, save and quit the editor to get another editor for the commit message of the result of squashing B and C.
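If you would rather not edit the todo list by hand, the same squash can be scripted. This is only a sketch: it assumes GNU sed (for -i), a todo list whose second line is the C commit, and a throwaway repository with made-up file contents. GIT_SEQUENCE_EDITOR and GIT_EDITOR are the environment variables git consults for the todo list and the commit message, respectively.

```shell
set -eu
work=$(mktemp -d)
cd "$work"
git init -q .
git symbolic-ref HEAD refs/heads/main
git config user.name 'Git User'
git config user.email 'git.user@example.invalid'   # placeholder identity

echo '<html></html>' > index.html; git add .; git commit -q -m A
echo 'body {}' > site.css; echo 'pretend rip' > oops.iso
git add .; git commit -q -m B
git rm -q oops.iso; echo '// js' > site.js
git add .; git commit -q -m C

base=$(git rev-parse HEAD~2)   # commit A, just before the rip

# Rewrite line 2 of the todo list (the C commit) from pick to squash.
# GIT_EDITOR=true accepts the combined commit message without prompting.
GIT_SEQUENCE_EDITOR="sed -i -e '2s/^pick/squash/'" \
GIT_EDITOR=true \
  git rebase -q -i "$base"

git log --format='%h %s' --name-status --no-renames
```

Because the message editor is skipped, the squashed commit keeps both messages, with B as its subject. Reword it afterward with git commit --amend if you want something tidier.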
git commit-tree
You might be tempted to do it with git rebase --onto, but this is not the equivalent of a squash. In particular, if the commit in which you accidentally added the rip also contains other work that you do want to keep, the rebase will replay only the commits after it, so site.css would not come along for the ride.
Impress your friends at parties by performing a squash with git plumbing.
git reset --soft d29f991
git merge --ff-only \
$(git commit-tree 1982cb8^{tree} -p d29f991 \
-F <(git log --format=%s -n 1 1982cb8))
Afterward, the history is identical to the others.
* a19013d (HEAD -> main) C
| A site.css
| A site.js
* d29f991 A
A index.html
In English, the commands above create a new commit whose tree is identical to what you got after deleting the rip (1982cb8^{tree} in this case) but whose parent is d29f991, and then fast-forward your current branch to that new commit.
Note that in actual usage, you will likely want a pretty format of %B for the whole body of the commit message rather than just %s for its subject.
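Here is a sketch of the %B variant that feeds the whole message to git commit-tree on standard input instead of using process substitution, so it also works in shells without <(...). Note one swap: I move the branch with git reset --soft to the new commit rather than the reset-then-fast-forward pair above. The repository, file contents, and the second message paragraph are invented for the demonstration.

```shell
set -eu
work=$(mktemp -d)
cd "$work"
git init -q .
git symbolic-ref HEAD refs/heads/main
git config user.name 'Git User'
git config user.email 'git.user@example.invalid'   # placeholder identity

echo '<html></html>' > index.html; git add .; git commit -q -m A
echo 'body {}' > site.css; echo 'pretend rip' > oops.iso
git add .; git commit -q -m B
git rm -q oops.iso; echo '// js' > site.js
git add .; git commit -q -m C -m 'Add site.js and drop the rip.'

old=$(git rev-parse HEAD)      # commit C, whose tree we want to keep
base=$(git rev-parse HEAD~2)   # commit A, the new parent

# commit-tree reads the log message from stdin when neither -m nor -F is given.
new=$(git log --format=%B -n 1 "$old" |
      git commit-tree "$old^{tree}" -p "$base")

# Point the branch at the new commit; index and working tree already match its tree.
git reset -q --soft "$new"

git log --format='%h %s' --name-status --no-renames
```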
git filter-repo
The command below removes oops.iso anywhere it shows up in your history.
Create a fresh clone of your repository and cd into its root. The illustration repository won’t look like a fresh clone, so when following along with it, append the --force option to the command below.
git filter-repo --invert-paths --path oops.iso
The resulting history is
* f6c1006 (HEAD -> main) C
| A site.js
* f2498a6 B
| A site.css
* d29f991 A
A index.html
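Before filtering a real repository, it can help to confirm which paths are actually bloating the history so you know what to pass to --path. The pipeline below is a common plumbing recipe, sketched against a temporary repository with a 100000-byte stand-in for the rip (the sizes and contents are assumptions). It assumes paths without spaces, since awk splits on whitespace.

```shell
set -eu
work=$(mktemp -d)
cd "$work"
git init -q .
git symbolic-ref HEAD refs/heads/main
git config user.name 'Git User'
git config user.email 'git.user@example.invalid'   # placeholder identity

echo '<html></html>' > index.html; git add .; git commit -q -m A
echo 'body {}' > site.css
head -c 100000 /dev/zero > oops.iso                # stand-in for the DVD rip
git add .; git commit -q -m B
git rm -q oops.iso; echo '// js' > site.js
git add .; git commit -q -m C

# List every object reachable from any ref with its path, look up type and size,
# keep blobs only, and sort by size, largest first.
largest=$(git rev-list --objects --all |
  git cat-file --batch-check='%(objecttype) %(objectsize) %(rest)' |
  awk '$1 == "blob" { print $2, $3 }' |
  sort -rn | head -n 1)

echo "largest blob in history: $largest"
```

The rip still shows up even though commit C deleted it, because commit B keeps it reachable.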
The Hard Case
If you did run git push, then you can do one of the above, but you also need to rewrite history on the shared repository.
You will need to either run git push with the --force option to overwrite the branch on your remote or delete the branch and push it again. Either of these options may require assistance from your remote repository’s owner or administrator.
This is unfortunately highly disruptive to your collaborators. See “Recovering From Upstream Rebase” in the git rebase documentation for the necessary steps that everyone else will have to do after repairing history.
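To see what the forced push looks like end to end, here is a sketch against a local bare repository standing in for your remote. It uses --force-with-lease, a more cautious variant of --force that refuses to overwrite the branch if someone else pushed since you last fetched. That swap is my suggestion rather than a requirement, and the paths and names are made up.

```shell
set -eu
work=$(mktemp -d)
cd "$work"
git init -q --bare remote.git                      # stand-in for the shared repository
git init -q local
cd local
git symbolic-ref HEAD refs/heads/main
git config user.name 'Git User'
git config user.email 'git.user@example.invalid'   # placeholder identity

echo '<html></html>' > index.html; git add .; git commit -q -m A
echo 'body {}' > site.css; echo 'pretend rip' > oops.iso
git add .; git commit -q -m B
git rm -q oops.iso; echo '// js' > site.js
git add .; git commit -q -m C

git remote add origin "$work/remote.git"
git push -q origin main                            # oops: the rip is now published

# Repair local history with the soft-reset approach.
git reset -q --soft "$(git rev-parse HEAD~2)"
git commit -q -C ORIG_HEAD

# A plain push is refused because the rewritten branch is not a fast-forward.
if git push -q origin main 2>/dev/null; then plain=accepted; else plain=rejected; fi

# Overwrite the remote branch, but only if it still points where we last saw it.
git push -q --force-with-lease origin main
remote_tip=$(git --git-dir="$work/remote.git" rev-parse refs/heads/main)
echo "plain push: $plain, remote now at: $remote_tip"
```

A hosted remote may have branch protection that rejects both forms, which is where the administrator comes in.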
git filter-branch (Don’t use this!)
This legacy command is kept around for historical reasons, but it’s slow and tricky to use correctly. Go this route as a last resort only.
I had a similar problem with bulky binary test data from a Subversion import and wrote about removing data from a git repository.
Executing the following command
git filter-branch --prune-empty -d /dev/shm/scratch \
--index-filter "git rm --cached -f --ignore-unmatch oops.iso" \
--tag-name-filter cat -- --all
will produce output of
WARNING: git-filter-branch has a glut of gotchas generating mangled history
rewrites. Hit Ctrl-C before proceeding to abort, then use an
alternative filtering tool such as 'git filter-repo'
(https://github.com/newren/git-filter-repo/) instead. See the
filter-branch manual page for more details; to squelch this warning,
set FILTER_BRANCH_SQUELCH_WARNING=1.
Proceeding with filter-branch...
Rewrite 6e907087c76e33fdabe329da7e0faebde165f2c2 (2/3) (0 seconds passed, remaining 0 predicted) rm 'oops.iso'
Rewrite 1982cb83f26aa3a66f8d9aa61d2ad08a61d3afd8 (3/3) (0 seconds passed, remaining 0 predicted)
Ref 'refs/heads/main' was rewritten
The meanings of the various options are:
--prune-empty removes commits that become empty (i.e., do not change the tree) as a result of the filter operation. In the typical case, this option produces a cleaner history.
-d names a temporary directory that does not yet exist to use for building the filtered history. If you are running on a modern Linux distribution, specifying a tree in /dev/shm will result in faster execution.
--index-filter is the main event and runs against the index at each step in the history. You want to remove oops.iso wherever it is found, but it isn’t present in all commits. The command git rm --cached -f --ignore-unmatch oops.iso deletes the DVD-rip when it is present and does not fail otherwise.
--tag-name-filter describes how to rewrite tag names. A filter of cat is the identity operation. Your repository, like the sample above, may not have any tags, but I included this option for full generality.
-- specifies the end of options to git filter-branch
--all following -- is shorthand for all refs. Your repository, like the sample above, may have only one ref (main), but I included this option for full generality.
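The --ignore-unmatch part matters because filter-branch aborts the whole rewrite if the index filter exits with a nonzero status. A small sketch of the difference, run in a throwaway repository where oops.iso was never added (the setup is assumed for illustration):

```shell
set -eu
work=$(mktemp -d)
cd "$work"
git init -q .
git config user.name 'Git User'
git config user.email 'git.user@example.invalid'   # placeholder identity
echo '<html></html>' > index.html; git add .; git commit -q -m A

# Without --ignore-unmatch, git rm fails when the path is not in the index.
if git rm -q --cached oops.iso 2>/dev/null; then strict=ok; else strict=failed; fi

# With it, a missing path is not an error, so the filter can run on every commit.
if git rm -q --cached -f --ignore-unmatch oops.iso; then tolerant=ok; else tolerant=failed; fi

echo "strict: $strict, tolerant: $tolerant"
```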
After some churning, the history is now:
* f6c1006 (HEAD -> main) C
| A site.js
* f2498a6 B
| A site.css
| * 1982cb8 (refs/original/refs/heads/main) C
| | D oops.iso
| | A site.js
| * 6e90708 B
|/
| A oops.iso
| A site.css
* d29f991 A
A index.html
Notice that the new B commit adds only site.css and that the new C commit only adds site.js. The branch labeled refs/original/refs/heads/main contains your original commits in case you made a mistake. To remove it, follow the steps in “Checklist for Shrinking a Repository.”
$ git update-ref -d refs/original/refs/heads/main
$ git reflog expire --expire=now --all
$ git gc --prune=now
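To convince yourself that the expire-and-prune pair really discards the data, you can track the blob ID of the unwanted file and ask git whether the object still exists. This sketch fakes the situation with a deleted branch in a temporary repository rather than a filter-branch backup ref; the branch name and file contents are my own placeholders.

```shell
set -eu
work=$(mktemp -d)
cd "$work"
git init -q .
git symbolic-ref HEAD refs/heads/main
git config user.name 'Git User'
git config user.email 'git.user@example.invalid'   # placeholder identity
echo '<html></html>' > index.html; git add .; git commit -q -m A

# Commit the stand-in rip on a side branch, then throw the branch away.
git checkout -q -b rip
echo 'pretend this is a DVD rip' > oops.iso; git add .; git commit -q -m B
blob=$(git rev-parse HEAD:oops.iso)
git checkout -q main
git branch -q -D rip

# The object is unreachable from any branch but still on disk (and in HEAD's reflog).
if git cat-file -e "$blob" 2>/dev/null; then before=present; else before=gone; fi

git reflog expire --expire=now --all
git gc -q --prune=now

if git cat-file -e "$blob" 2>/dev/null; then after=present; else after=gone; fi
echo "before: $before, after: $after"
```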
For a simpler alternative, clone the repository to discard the unwanted bits.
$ cd ~/src
$ mv repo repo.old
$ git clone file:///home/user/src/repo.old repo
Using a file:///... clone URL copies objects rather than creating hardlinks only.
Now your history is:
* f6c1006 (HEAD -> main) C
| A site.js
* f2498a6 B
| A site.css
* d29f991 A
A index.html
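The reason the file:// clone works as a cleanup is that it goes through git's normal transfer machinery, which sends only objects reachable from the refs. The sketch below plants an unreachable blob in a temporary repo.old to stand in for leftover rip data and checks whether it survives the clone; the directory names mirror the example above, and everything else is assumed.

```shell
set -eu
work=$(mktemp -d)
cd "$work"
git init -q repo.old
git -C repo.old symbolic-ref HEAD refs/heads/main
git -C repo.old config user.name 'Git User'
git -C repo.old config user.email 'git.user@example.invalid'   # placeholder identity
echo '<html></html>' > repo.old/index.html
git -C repo.old add .
git -C repo.old commit -q -m A

# Write a blob that no commit references, like data left behind after a rewrite.
blob=$(printf 'leftover rip data\n' | git -C repo.old hash-object -w --stdin)

# Clone through the file:// transport instead of a plain path.
git clone -q "file://$work/repo.old" repo

if git -C repo.old cat-file -e "$blob" 2>/dev/null; then in_old=present; else in_old=gone; fi
if git -C repo cat-file -e "$blob" 2>/dev/null; then in_new=present; else in_new=gone; fi
echo "repo.old: $in_old, repo: $in_new"
```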
Prefer git filter-repo. You should no longer use git filter-branch, as it is very slow and often difficult to use; git filter-repo is around 100 times faster.