
I have files with escape \ characters in their names, stored in a Git repository on Debian 10 Linux.

Problem: these files cannot be checked out with git checkout on Windows, because their filenames contain characters that are incompatible with Windows.

Example:

git log --all --name-only -m --pretty= '*\\*'
"systemd/system/default.target.wants/snap-git\\x2dfilter\\x2drepo-7.mount"
"systemd/system/multi-user.target.wants/snap-git\\x2dfilter\\x2drepo-7.mount"
"systemd/system/snap-git\\x2dfilter\\x2drepo-7.mount"

I get the following Git errors when checking out on Windows:

C:\Git\bin\git.exe reset --hard "5ef1cac3a03304c35b455edf32bd1bb78060c5b9" --
error: invalid path 'systemd/system/default.target.wants/snap-git\x2dfilter\x2drepo-7.mount'
fatal: Could not reset index file to revision '5ef1cac3a03304c35b455edf32bd1bb78060c5b9'.
Done

Steps to reproduce the problem:

# Clone repository, to be executed on a safe repo:
git clone --no-local /source/repo/path/ /target/path/to/repo/clone/
# Cloning into '/target/path/to/repo/clone'...
# remote: Enumerating objects: 9534, done.
# remote: Counting objects: 100% (9534/9534), done.
# remote: Compressing objects: 100% (4776/4776), done.
# remote: Total 9534 (delta 4215), reused 8043 (delta 3136), pack-reused 0
# Receiving objects: 100% (9534/9534), 7.41 MiB | 16.78 MiB/s, done.
# Resolving deltas: 100% (4215/4215), done.

cd /target/path/to/repo/clone/

# List the files with escape \ from repo history into a list file:
git log --all --name-only -m --pretty= '*\\*' | sort -u >/opt/git_repo_files_w_escape.txt

# Remove the files with escape \ from repo history:
git filter-repo --invert-paths --paths-from-file /opt/git_repo_files_w_escape.txt
Parsed 592 commits
New history written in 0.25 seconds; now repacking/cleaning...
Repacking your repo and cleaning out old unneeded objects
HEAD is now at 71128f3 .gitignore: ADD snap-git to be ignored
Enumerating objects: 9354, done.
Counting objects: 100% (9354/9354), done.
Delta compression using up to 8 threads
Compressing objects: 100% (3694/3694), done.
Writing objects: 100% (9354/9354), done.
Total 9354 (delta 4085), reused 9354 (delta 4085), pack-reused 0
Completely finished after 0.55 seconds.


# List files with escape \ to check result:
git log --format="reference" --name-status --diff-filter=A '*\\*'
# "systemd/system/default.target.wants/snap-git\\x2dfilter\\x2drepo-7.mount"
# "systemd/system/multi-user.target.wants/snap-git\\x2dfilter\\x2drepo-7.mount"
# "systemd/system/snap-git\\x2dfilter\\x2drepo-7.mount"

#  Unfortunately, filter-repo ran, but the log still lists filenames with escape \ :-(

Question:

1) How can I remove from the Git repo history all files whose path contains at least one escape \ character?

(Reason: files whose names contain characters incompatible with Windows cannot be checked out there.)

UPDATE1:

I tried replacing the \\x2d string with - in the input file list as suggested, but removing the files from Git history was still unsuccessful:

# List the files with escape \ from repo history into a list file:
git log --all --name-only -m --pretty= '*\\*' | sort -u >/opt/git_repo_files_w_escape.txt

# Replace the \\x2d string with - in git_repo_files_w_escape.txt:
sed -i 's/\\\\x2d/-/g' /opt/git_repo_files_w_escape.txt

# Remove the listed files from repo history:
git filter-repo --invert-paths --paths-from-file /opt/git_repo_files_w_escape.txt
Parsed 592 commits
New history written in 0.25 seconds; now repacking/cleaning...
Repacking your repo and cleaning out old unneeded objects
HEAD is now at 71128f3 .gitignore: ADD snap-git to be ignored
Enumerating objects: 9354, done.
Counting objects: 100% (9354/9354), done.
Delta compression using up to 8 threads
Compressing objects: 100% (3694/3694), done.
Writing objects: 100% (9354/9354), done.
Total 9354 (delta 4085), reused 9354 (delta 4085), pack-reused 0
Completely finished after 0.55 seconds.


# List files with escape \ to check result:
git log --format="reference" --name-status --diff-filter=A '*\\*'
# "systemd/system/default.target.wants/snap-git\\x2dfilter\\x2drepo-7.mount"
# "systemd/system/multi-user.target.wants/snap-git\\x2dfilter\\x2drepo-7.mount"
# "systemd/system/snap-git\\x2dfilter\\x2drepo-7.mount"

#  Unfortunately log still lists filenames with \\x2d :-(

UPDATE2:

I tried replacing \\x2d in git_repo_files_w_escape.txt with \\\\x2d or \x2d, but neither removed the files with \\x2d in their filename from Git history.

UPDATE3:

I'm looking for a working solution based on git filter-repo.

Any more ideas?

  • Colon is not backslash so what are we even talking about here? Commented Jan 17, 2023 at 17:33
  • And otherwise isn't this the same as your stackoverflow.com/questions/75112545/… ? Commented Jan 17, 2023 at 17:34
  • Also backslash of itself is not escape character. It's just a backslash. Commented Jan 17, 2023 at 17:36
  • But that doesn't make the escape backslash a character in the resulting path. It's just a way of talking to bash. Commented Jan 17, 2023 at 17:43
  • Only if one doesn't understand string escaping, perhaps. Otherwise they are identical. Commented Jan 17, 2023 at 17:44

3 Answers


You fed bad input into filter-repo, based on a common but incorrect assumption about how git log works.

Look at your own output:

$ git log --format="reference" --name-status --diff-filter=A '*\\*'
"systemd/system/default.target.wants/snap-git\\x2dfilter\\x2drepo-7.mount"
"systemd/system/multi-user.target.wants/snap-git\\x2dfilter\\x2drepo-7.mount"
"systemd/system/snap-git\\x2dfilter\\x2drepo-7.mount"

Let's look at the first line as an example. If you were to store that in a file, which you pass to --paths-from-file, then git-filter-repo is going to be looking for a file named "systemd/system/default.target.wants/snap-git\\x2dfilter\\x2drepo-7.mount" to remove. You have no such file in your repository. Instead you have one named systemd/system/default.target.wants/snap-git\x2dfilter\x2drepo-7.mount. (Note that I have removed both " characters and two of the \ characters.)
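To make the difference between the printed name and the real name concrete, here is a minimal Python sketch that decodes a C-style quoted path as git prints it. The function name `unquote_git_path` is hypothetical, and the sketch assumes well-formed input that uses only the common single-character escapes and three-digit octal escapes.

```python
def unquote_git_path(text: str) -> bytes:
    """Decode one path as printed by git when it C-quotes special characters.

    Assumption: input is well formed (no dangling backslash at the end).
    """
    if not (len(text) >= 2 and text.startswith('"') and text.endswith('"')):
        return text.encode("utf-8")  # unquoted names are printed as-is

    # Single-character escapes and the byte values they stand for.
    simple = {"a": 7, "b": 8, "t": 9, "n": 10, "v": 11, "f": 12, "r": 13,
              '"': 34, "\\": 92}
    raw = text[1:-1].encode("utf-8")
    out = bytearray()
    i = 0
    while i < len(raw):
        if raw[i] != 0x5C:  # not a backslash: copy the byte unchanged
            out.append(raw[i])
            i += 1
            continue
        nxt = chr(raw[i + 1])
        if nxt in simple:
            out.append(simple[nxt])
            i += 2
        else:
            # Three-digit octal escape such as \303.
            out.append(int(raw[i + 1:i + 4], 8))
            i += 4
    return bytes(out)


quoted = '"systemd/system/snap-git\\\\x2dfilter\\\\x2drepo-7.mount"'
print(unquote_git_path(quoted).decode())
```

The printed result contains a single backslash before each `x2d` and no surrounding quotes, which is the name git-filter-repo has to be given.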

The problem here is that you assumed git log would list filenames as-is, which it won't do whenever there are special characters. You can often get around this by setting core.quotepath=false (this particularly helps when you have non-ASCII characters), but even that setting is ignored when you have backslashes.

Here's something that might work better for you for generating the list of filenames to exclude:

git log -z --all --name-only -m --pretty= '*\\*' | tr '\0' '\n' | sort -u >/opt/git_repo_files_w_escape.txt

but it assumes you do not have filenames with newline characters. (If you do have files with newline characters, though, then --paths-from-file won't work for you.)
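The same idea can be sketched in Python, which avoids the newline limitation by keeping the names as bytes. The helper names `split_nul_paths` and `list_backslash_paths` are hypothetical; the git arguments mirror the command above, with `--` added before the pathspec.

```python
import subprocess


def split_nul_paths(raw: bytes) -> list:
    """Split NUL-separated `git log -z` output into unique, sorted paths.

    Empty entries (for example from consecutive NULs) are dropped.
    """
    return sorted({entry for entry in raw.split(b"\x00") if entry})


def list_backslash_paths(repo: str) -> list:
    """Run git in `repo` and return every historical path containing a backslash.

    Assumption: `git` is on PATH and `repo` is a Git working tree or bare repo.
    """
    result = subprocess.run(
        ["git", "-C", repo, "log", "-z", "--all", "--name-only", "-m",
         "--pretty=", "--", r"*\\*"],
        check=True,
        capture_output=True,
    )
    return split_nul_paths(result.stdout)
```

Because the paths stay as raw bytes, no unquoting step is needed before comparing them with the names stored in the repository.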

Even simpler would be to skip creating a list of files with bad names and just programmatically remove them by pattern:

git filter-repo --filename-callback 'return None if b"\\" in filename else filename'
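The callback body is ordinary Python that receives the filename as bytes; returning None drops the file and returning a name keeps it under that name. A minimal standalone sketch of that logic follows. The function names are hypothetical, and the rename variant is an assumption that is not part of the answer above.

```python
from typing import Optional


def drop_if_backslash(filename: bytes) -> Optional[bytes]:
    """Same logic as the callback body: None removes the file from history."""
    return None if b"\\" in filename else filename


def rename_x2d(filename: bytes) -> bytes:
    """Hypothetical variant: keep the file but turn each literal \\x2d into '-'."""
    return filename.replace(b"\\x2d", b"-")


name = b"systemd/system/snap-git\\x2dfilter\\x2drepo-7.mount"
print(drop_if_backslash(name))
print(rename_x2d(name))
```

If renaming rather than deleting is the goal, the body of `rename_x2d` could probably be passed to --filename-callback in the same way, but check first that no renamed path collides with a file that already exists in the history.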

1 Comment

@newren Thank you very much for pointing me to the right solution! Your solution works perfectly; it removed all files with a backslash in the filename. You are right, it is not a bug; the git log output was just not in the right format to use as input for git filter-repo.

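FWIW, this worked on a Linux system; it allowed me to rewrite the HEAD commit without having the files checked out on disk:

# -z makes git print raw, unquoted names separated by NUL (needs bash and GNU grep)
git ls-files -z | grep -z -a -e '\\' | while IFS= read -r -d '' f; do
    # replace each literal \x2d sequence in the name with a dash
    new=$(printf '%s\n' "$f" | sed -e 's|\\x2d|-|g')
    mkdir -p "$(dirname "$new")"
    git show "@:$f" > "$new"
    git rm --cached "$f"
    git add "$new"
done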

git status
git commit --amend

The same commands should work in Git Bash for Windows.

4 Comments

Thank you for your answer. But this answer rewrites only the HEAD, not the whole repo history. I'm looking for a working solution based on git filter-repo.
@klor: if you take the command as is, yes. It also provides a base for writing a set of commands that renames files containing '\\' in their names, which could give you a way to turn it into a script which you can invoke with git filter-branch for example. Unfortunately I don't have enough time to research a complete solution to your issue.
perhaps fiddling with something like regex:(.*)(\\\\x2d)(.*)=>\1-\3 (try it on a smaller repo to check the effects)
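Before trying a pattern like that on a repository, its effect can be checked with Python's `re` module. The sketch below assumes the regex is applied with `re.sub` semantics on byte strings; with greedy `(.*)` groups only the last `\x2d` in a name is replaced, whereas a pattern that matches just the sequence itself replaces every occurrence.

```python
import re

# Real on-disk name: a single backslash before each x2d.
name = b"snap-git\\x2dfilter\\x2drepo-7.mount"

# Greedy groups around the sequence: one match spans the whole name,
# so only the last \x2d is rewritten.
greedy = re.sub(rb"(.*)(\\x2d)(.*)", rb"\1-\3", name)

# Matching only the sequence itself rewrites every occurrence.
every = re.sub(rb"\\x2d", b"-", name)

print(greedy)
print(every)
```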

Assuming you have many files that you want to fix scattered in the hierarchy, a solution with git filter-repo looks tedious. You can instead use a combination of git fast-export and git fast-import to modify file names in the whole history.

git fast-export --no-data --all > exported

Now delete the file entries containing a backslash:

grep -v '^[DM] .*\\' exported > fixed

Instead of removing the files, you can also modify the file names. For example, to replace the backslash by a dash -, you could try this:

sed -e '/^[DM] /s,\\,-,g' < exported > fixed

You may now investigate the difference between the two files to ensure that no commit messages were modified:

diff -u exported fixed | less

Now attempt to import the modified history:

git fast-import < fixed

This will stop with an error that tells you that the branches will not be modified because the old branch heads are not subsets of the new heads. If there are no other errors, you can now force the modification:

git fast-import --force < fixed
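A line-based grep or sed cannot tell a file-change line from a commit message line that happens to start with "M " or "D ". A minimal Python sketch of a more careful filter is shown below; the function names are hypothetical, and it assumes the stream uses the exact byte count form `data <count>`, copying those payloads untouched.

```python
import io


def drop_backslash_paths(stream_in, stream_out):
    """Copy a fast-export stream, dropping file changes whose path has a backslash."""
    while True:
        line = stream_in.readline()
        if not line:
            break  # end of stream
        if line.startswith(b"data "):
            # Assumption: "data <count>" form; copy the payload byte-for-byte
            # so commit messages are never inspected or modified.
            stream_out.write(line)
            stream_out.write(stream_in.read(int(line[5:].strip())))
            continue
        if line[:2] in (b"M ", b"D ") and b"\\" in line:
            continue  # skip filemodify/filedelete commands for backslash paths
        stream_out.write(line)


def filter_bytes(data: bytes) -> bytes:
    """Convenience wrapper for in-memory streams."""
    out = io.BytesIO()
    drop_backslash_paths(io.BytesIO(data), out)
    return out.getvalue()
```

For real use, the function could be called with the exported and fixed files opened in binary mode; inspecting the diff between them is still worthwhile.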

1 Comment

Nah, not tedious at all with filter-repo. But, more importantly, suggesting programmatic edits of fast-export output should be accompanied by big warnings. The --no-data option avoids the worst problems, but you should really emphasize how important that option is, so that folks who adapt your solution to other problems don't drop it and then corrupt their repo. Also, even with --no-data, there's a risk that you will be removing lines from commit messages and corrupting the stream. filter-repo was written in part because editing fast-export streams programmatically can be risky.
