Some background: we maintain a submission system that lets students submit source files to a git repository. There are two ways to do this: the more advanced students simply use git directly, while beginners use a web interface that uploads files to their repository.

The web interface is pretty basic and currently only supports adding files. We would also like to let students delete files, but the delete has to happen on the bare repository, without cloning. Cloning is too expensive and takes too much space, given that the submission system interacts with hundreds of repositories.

We've figured out how to add files directly to the tree without cloning, but not how to delete a file in a bare repo. I tried the following:

rm objects/70/574e5c0d5f1fb820f66fd3fd3a3c0c4ed398bb # blob id of file to be removed
git write-tree # copying output
echo "removing file" | git commit-tree <copied id from previous command> -p <previous HEAD> # copying output
git update-ref refs/heads/master <copied id from previous command>

Technically this works, but it removes all the files from the repo, which isn't what we want. Given git's internals, I'm not sure how to remove a single blob from the tree and update the bare repo while keeping the other files.

Any ideas?

1 Answer

I think I have found a solution. I don't particularly like it, but it works.

  1. Using git log, get the sha1 id of the current HEAD.
  2. Run git read-tree --empty so the index starts empty and only the files we want to keep get added back.
  3. Run git ls-tree -r HEAD to list every entry in the current tree.
  4. For each entry returned above except the one you want to remove, run git update-index --add --cacheinfo <mode from ls-tree> <sha1 from ls-tree> <name from ls-tree>.
  5. Run git write-tree and save the tree id it prints.
  6. Run echo 'removing <file>' | git commit-tree <value from previous command> -p <sha1 of current master HEAD> and save the commit id it prints.
  7. Run git update-ref refs/heads/master <value from previous command>.

If anyone knows of a better way to accomplish this, I'm all ears. I'll attach a Python script (using GitPython) that does the above shortly.

Edit: Python (w/ GitPython) added

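def repo_delete(repo, path: str):
    """Delete the file at <path> from a bare repository by committing a tree without it."""
    g = repo.git
    # Resolve master explicitly; repo.heads[0] is not guaranteed to be master.
    head_sha = g.rev_parse("refs/heads/master")
    listing = g.ls_tree("-r", head_sha)
    g.read_tree("--empty")  # start from an empty index
    for line in listing.splitlines():
        # Each line is "<mode> <type> <sha>\t<path>". Split on the tab first
        # so that paths containing spaces stay intact.
        meta, name = line.split("\t", 1)
        mode, _obj_type, sha = meta.split(" ")
        if name != path:
            print(f"adding {name}")
            g.update_index("--add", "--cacheinfo", mode, sha, name)
    tree_sha = g.write_tree()
    # GitPython passes arguments without a shell, so the message needs no extra quotes.
    new_head_sha = g.commit_tree(tree_sha, "-m", f"removing {path}", "-p", head_sha)
    g.update_ref("refs/heads/master", new_head_sha)
    print("done")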
1 Comment

That's working way too hard. Just read the tree you want into the index, use git update-index (with --cacheinfo or --index-info) to set the mode of the file to remove to 0, and then use git write-tree to write the new index to a tree object. The rest is similar.
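
For reference, the approach in this comment could be sketched roughly as follows. This is a minimal sketch, not a tested drop-in: it calls git through subprocess rather than GitPython, and the names git, remove_path, ZERO_SHA1 and the branch default are hypothetical. It uses the --index-info form, since that is where the git documentation describes mode 0 as meaning "remove this entry". It also assumes a SHA-1 repository, a committer identity that is already configured (via git config or the GIT_AUTHOR_*/GIT_COMMITTER_* variables), and a scratch index file set through GIT_INDEX_FILE so that concurrent web requests do not share one index.

```python
import os
import subprocess
import tempfile

ZERO_SHA1 = "0" * 40  # placeholder object id; assumes a SHA-1 repository


def git(git_dir, *args, stdin=None, env=None):
    """Run one git command against git_dir and return its stripped stdout."""
    result = subprocess.run(
        ["git", "--git-dir", git_dir, *args],
        input=stdin, env=env, check=True, capture_output=True, text=True,
    )
    return result.stdout.strip()


def remove_path(git_dir, path, branch="master"):
    """Commit the removal of `path` on `branch` in a bare repository."""
    ref = f"refs/heads/{branch}"
    head = git(git_dir, "rev-parse", "--verify", ref)
    with tempfile.TemporaryDirectory() as tmp:
        # Scratch index, so the repository's own index file is left alone.
        env = dict(os.environ, GIT_INDEX_FILE=os.path.join(tmp, "index"))
        git(git_dir, "read-tree", head, env=env)
        # Mode 0 tells --index-info to drop the entry; -z makes the
        # record NUL-terminated, so unusual path names need no quoting.
        git(git_dir, "update-index", "-z", "--index-info",
            stdin=f"0 {ZERO_SHA1}\t{path}\0", env=env)
        tree = git(git_dir, "write-tree", env=env)
    commit = git(git_dir, "commit-tree", tree, "-p", head,
                 "-m", f"removing {path}")
    # Passing the old value makes update-ref fail if the branch moved meanwhile.
    git(git_dir, "update-ref", ref, commit, head)
    return commit
```

Calling remove_path("/srv/repos/student.git", "hw1/main.c") (a made-up path) would add one commit on master whose tree lacks that file. Note that the sketch does not check that the path existed beforehand; if it did not, the new commit simply has the same tree as its parent.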
