3

I would like to switch to either the patience or the histogram algorithm in Git, but I'm wondering whether there are any side effects if a repo isn't consistent in which algorithm it uses. If I switch, will anything break when I deal with commits that were added before the switch? Will it be a problem if other developers don't use the same algorithm?

I can't think of a specific scenario where there would be a conflict, but it seems like a pretty fundamental change, so I'd like to look before I leap nonetheless.

  • 1
    IIRC the diffing algorithm doesn't affect how git stores the files or even any of the automatic conflict resolution. It's basically just how git visualizes a difference to the user. Commented Aug 15, 2020 at 20:07
  • 2
    There are several things in the diff-config man docs (git-scm.com/docs/diff-config) that say "Note that this affects only 'git diff' Porcelain", but diff.algorithm does not have that note. So I'm fairly confident that this changes more than just the visualization to the user. But of course I'm prepared to be corrected if I'm wrong 😅. Commented Aug 15, 2020 at 20:15
  • @iconoclast I'd suggest asking on the Blender mailing list whether that's an omission, or otherwise getting it clarified. Commented Aug 15, 2020 at 20:20
  • 1
    Why the Blender mailing list? Just because there are super-smart Git users writing Blender?? Wouldn't the Git mailing list be a much better place? Commented Aug 15, 2020 at 20:35
  • The merge strategies are responsible for invoking the internal diff code in the first place, and the only ones that actually do so (as built into Git) invoke it without letting you change the algorithm. If you write your own merge strategy, you can make it do whatever you like, but writing a merge strategy is a major undertaking. Commented Aug 16, 2020 at 3:18

3 Answers

4

The diff algorithm takes effect from the moment you set it, so it affects whatever diff operations you run from then on, including diffs of commits made before the switch. Changing it has no inherent negative effects: every diff algorithm produces an equivalent diff, in the sense that applying any of them to the old file yields the same new file. What differs is how easy the output is for people to read. Patience and histogram are usually better, but not always.
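To make "equivalent but different" concrete, here is a toy Python sketch. It is not Git's Myers, patience, or histogram code; it is one plain LCS (longest common subsequence) diff run once from the top of the file and once from the bottom, and the names diff_forward, diff_backward, side and render are made up for illustration. Both edit scripts are minimal and both rebuild the new file exactly, yet they disagree about which closing brace is the "unchanged" one:

```python
"""Toy illustration: two valid line diffs for the same change.

Assumption: a plain LCS diff with two walking directions, NOT Git's
Myers, patience, or histogram implementation.
"""


def lcs_suffix_table(a, b):
    """dp[i][j] = length of the longest common subsequence of a[i:] and b[j:]."""
    n, m = len(a), len(b)
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                dp[i][j] = dp[i + 1][j + 1] + 1
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])
    return dp


def diff_forward(a, b):
    """Walk from the top, matching equal lines as early as possible."""
    dp = lcs_suffix_table(a, b)
    i = j = 0
    script = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            script.append(("=", a[i]))
            i += 1
            j += 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            script.append(("-", a[i]))
            i += 1
        else:
            script.append(("+", b[j]))
            j += 1
    script.extend(("-", line) for line in a[i:])
    script.extend(("+", line) for line in b[j:])
    return script


def diff_backward(a, b):
    """Same walk run from the bottom: match equal lines as late as possible."""
    return list(reversed(diff_forward(a[::-1], b[::-1])))


def side(script, keep):
    """Rebuild the old file (keep='=-') or the new file (keep='=+')."""
    return [line for op, line in script if op in keep]


def render(script):
    """Print the script with diff-style ' ', '-', '+' prefixes."""
    marks = {"=": " ", "-": "-", "+": "+"}
    return "\n".join(marks[op] + line for op, line in script)


old = ["int f() {", "    return 1;", "}"]
new = ["int f() {", "    return 1;", "}", "", "int g() {", "    return 2;", "}"]

for name, diff in (("forward", diff_forward), ("backward", diff_backward)):
    script = diff(old, new)
    # Both scripts are valid: they reproduce the old and the new file exactly.
    assert side(script, "=-") == old and side(script, "=+") == new
    print(f"--- {name}\n{render(script)}")
```

The top-down run inserts the new function as one block after the old closing brace; the bottom-up run produces the familiar hunk that starts with +} and reuses the old } as the new function's closing brace. Either one applies cleanly and yields the same file.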

The only time you might have a problem is if you're storing diffs in some system or repository (such as files generated by git format-patch), which isn't very common but is used in some Linux distribution packaging workflows. In such a case, if different people use different diff algorithms, you'll see a lot of diff noise as the patches are regenerated between users, even though the diffs are logically equivalent.

If you have such a case, it's better to just force some fixed diff algorithm with your tooling, which is what I've done in the past. That would look like having your tool run git -c diff.algorithm=myers format-patch.
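A minimal sketch of that pinning, assuming a Python-based packaging tool; format_patch_argv, regenerate_patches and the choice of myers are illustrative, not part of any existing tool. git -c name=value overrides the configuration for that single command, so each user's own diff.algorithm setting is ignored while the patches are generated:

```python
"""Hypothetical helper for a packaging tool that regenerates patch files.

It pins Git's diff algorithm for format-patch so the generated hunks do not
depend on each user's own diff.algorithm setting.
"""
import subprocess

# Assumption: the team agrees on "myers", Git's default algorithm.
PINNED_ALGORITHM = "myers"


def format_patch_argv(rev_range, outdir, algorithm=PINNED_ALGORITHM):
    """Build the git command line; -c overrides config for this run only."""
    return [
        "git", "-c", f"diff.algorithm={algorithm}",
        "format-patch", "-o", outdir, rev_range,
    ]


def regenerate_patches(rev_range, outdir):
    """Run format-patch; check=True raises CalledProcessError on failure."""
    subprocess.run(format_patch_argv(rev_range, outdir), check=True)
```

A call such as regenerate_patches("upstream/main..HEAD", "patches") then produces hunks that no longer vary with the diff.algorithm preference of whoever runs it.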

Beyond that case, there's really no harm in changing the diff algorithm if you find you like something other than the default better.


2

No, it will not break anything. Diffs are always computed after the fact, on demand, from the stored history. You can change the diff algorithm either permanently via config or temporarily via option flags on the command line.

Git does not store diffs; all history is stored as (full) snapshots of tree objects. A tree always points to full files ("blobs" in Git terminology) or subdirectories (represented by other tree objects).
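To illustrate the snapshot model, here is a minimal Python sketch of how Git names and stores a single file's content as a blob, assuming the default SHA-1 object format. The object ID is a hash of a short header plus the entire file, and a loose object on disk is that same byte string zlib-compressed; no diff, and therefore no diff algorithm, is involved. (Packfiles add delta compression on top of this, as the comments below discuss.) The function names are made up for illustration:

```python
"""Minimal sketch of how Git names and stores a file's full content (a blob).

Assumption: SHA-1 repository format (the default), not SHA-256.
"""
import hashlib
import zlib


def blob_bytes(content: bytes) -> bytes:
    # Object header: type, space, size in bytes, NUL, then the full content.
    return b"blob " + str(len(content)).encode() + b"\0" + content


def blob_id(content: bytes) -> str:
    # Same value `git hash-object` prints for this content.
    return hashlib.sha1(blob_bytes(content)).hexdigest()


def loose_object(content: bytes) -> bytes:
    # Roughly what Git writes under .git/objects/ for an unpacked blob.
    # Git may use a different zlib level, so compressed bytes can differ.
    return zlib.compress(blob_bytes(content))


data = b"test content\n"
print(blob_id(data))  # → d670460b4b4aece5915caf5c68d12f560a9fe3e4
stored = loose_object(data)
# Decompressing the stored object gives back the complete file, not a diff.
assert zlib.decompress(stored).split(b"\0", 1)[1] == data
```

For example, blob_id(b"") gives e69de29bb2d1d6434b8b29ae775ad8c2e48c5391, Git's well-known empty-blob ID.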

Comments

"...all history is stored as (full) snapshots..." No, it isn't. It's true that the stored representation doesn't depend on the diff algorithm used to present text patches, but it absolutely does use deltas.
@MarkAdelsberger: deltas appear only in packfiles, which exist below the object level. Philosophically this is similar to wondering if a file is compressed, when it's stored on ZFS with ZFS-level compression turned on. In one sense, it is compressed, because ZFS compressed each block. But when you open and read the file, you can't tell that it's compressed, especially if it was just moved to a different dataset in which compression is not enabled.
@MarkAdelsberger Git's object model stores full snapshots only. Each commit references a single root tree, and this root tree then references all files and all subdirectories in full. Pack files use clever compression algorithms to be more space-efficient, but this delta compression neither uses (human-readable) "diffs" nor is affected by the configured diff algorithm. […]
[…] This happens on a different level, comparable to the different layers of the OSI model. An HTTP request or response is a single entity, but on lower levels it might be fragmented. Not something you have to think or care about when talking HTTP, because the underlying layers will handle this transparently.
@knittl diff != delta compression, but delta compression is a type of diff. And neither one of them is a "(full) snapshot". You think it's beside the point because you only care whether your statement is "correct enough" to explain the behavior asked about; that is where I differ, because I care that these "correct enough if you squint at them hard enough" statements lead people to believe that Git is too complicated to understand when they try to reason with them and get incorrect results.
1

Looking at how both the histogram and patience diffs have evolved, there is no side effect on past commits.

The algorithm only affects the output of the git diff command itself (or diff-based operations such as git log -p).
For instance, before Git 2.1, git diff --histogram performed too many memory allocations.
