Can switching diff algorithm in Git cause any problems?

Question

I would like to switch to either the patience or histogram algorithm in Git, but I'm wondering whether there are side effects if the algorithm isn't used consistently within a given repo. If I switch, will that cause anything to break when I deal with commits that were added prior to the algorithm switch? Will it be a problem if other developers don't use the same algorithm?

I can't think of a specific scenario where there would be a conflict, but it seems like a pretty fundamental change, so I'd like to look before I leap nonetheless.

IIRC the diffing algorithm doesn't affect how git stores the files or even any of the automatic conflict resolution. It's basically just how git visualizes a difference to the user.
– Joachim Sauer, Commented Aug 15, 2020 at 20:07
There are several things in the diff-config man docs (git-scm.com/docs/diff-config) that say "Note that this affects only 'git diff' Porcelain", but diff.algorithm does not have that note. So I'm fairly confident that this changes more than just the visualization to the user. But of course I'm prepared to be corrected if I'm wrong 😅.
– iconoclast, Commented Aug 15, 2020 at 20:15
@iconoclast I'd suggest asking on the Blender mailing list if that is an omission or not, and otherwise clarify it.
– Acorn, Commented Aug 15, 2020 at 20:20
Why the Blender mailing list? Just because there are super-smart Git users writing Blender? Wouldn't the Git mailing list be a much better place?
– iconoclast, Commented Aug 15, 2020 at 20:35
The merge strategies are responsible for invoking the internal diff code in the first place, and the only ones that actually do so (as built into Git) invoke it without letting you change the algorithm. If you write your own merge strategy, you can make it do whatever you like, but writing a merge strategy is a major undertaking.
– torek, Commented Aug 16, 2020 at 3:18

bk2204 · Accepted Answer · 2020-08-16 14:52:23Z

The diff algorithm takes effect from the moment you set it, so it applies to whatever diff operations you run from then on. Changing it has no inherent negative effects: every diff algorithm produces an equivalent diff, and the only question is how easy the result is for people to read. Patience and histogram are usually better, but not always.
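As a rough illustration of what "equivalent" means here (a sketch added for clarity, not part of the original answer): the script below creates a throwaway repository, writes one patch per algorithm for the same change, and checks that applying each patch reproduces the same file. The repository, file names, and contents are all made up for the example.

```shell
#!/bin/sh
set -eu

# Throwaway repository; every name and file content here is hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q
printf 'a\nb\nc\nd\n' > file.txt
git add file.txt
git -c user.name=Example -c user.email=example@example.invalid \
    commit -q -m 'old version'

# Edit the file and keep an untracked copy of the intended result.
printf 'a\nc\nb\nd\ne\n' > file.txt
cp file.txt expected.txt

# Write one patch per algorithm for the same change.
for algo in myers patience histogram; do
    git diff --diff-algorithm="$algo" -- file.txt > "$algo.patch"
done

# Apply each patch to the committed version and compare with the expected file.
for algo in myers patience histogram; do
    git checkout -q -- file.txt
    git apply "$algo.patch"
    if cmp -s file.txt expected.txt; then
        echo "$algo: applies to the same result"
    fi
done
```

The hunks in the three patch files may be laid out differently, but each one transforms the old file into the same new file.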

The only time you might have a problem is if you're storing diffs in some system or repository (such as files generated by git format-patch), which isn't very common but is used in some Linux distribution packaging workflows. In such a case, if different people use different diff algorithms, you'll see a lot of diff noise as the patches are regenerated between users, even though the diffs are logically equivalent.

If you have such a case, it's better to just force some fixed diff algorithm with your tooling, which is what I've done in the past. That would look like having your tool run git -c diff.algorithm=myers format-patch.
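A minimal sketch of that idea, assuming a throwaway repository with made-up file names and commit messages: two developers with different personal `diff.algorithm` settings run the same pinned command and get byte-identical patch files.

```shell
#!/bin/sh
set -eu

# Throwaway repository; names, identities, and contents are hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name 'Example'
git config user.email 'example@example.invalid'

printf 'a\nb\nc\nd\n' > file.txt
git add file.txt
git commit -q -m 'first version'
printf 'a\nc\nb\nd\ne\n' > file.txt
git commit -q -a -m 'second version'

# "Developer A" prefers histogram; the tooling still pins myers.
git config diff.algorithm histogram
git -c diff.algorithm=myers format-patch -q -1 -o patches-a HEAD

# "Developer B" prefers patience; the pinned value on the command line wins again.
git config diff.algorithm patience
git -c diff.algorithm=myers format-patch -q -1 -o patches-b HEAD

# Both runs should yield the same bytes, so regenerating patches adds no noise.
cmp patches-a/0001-second-version.patch patches-b/0001-second-version.patch \
    && echo 'patches are identical'
```

Passing the setting with `-c` on the command line takes precedence over whatever each developer has in their own config files, which is why the tool's output stays stable.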

Beyond that case, there's really no harm in changing the diff algorithm if you find you like something other than the default better.

knittl · Accepted Answer · 2020-08-15 21:37:59Z

No, it will not break anything. The diffs are always calculated after the fact. You can either change the diff algorithm permanently via config or temporarily via option flags on the command line.
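For illustration, the two approaches might look like the following sketch. It runs in a throwaway repository with hypothetical file contents; `diff.algorithm`, `--diff-algorithm`, `--patience`, and `--histogram` are the actual Git names.

```shell
#!/bin/sh
set -eu
GIT_PAGER=cat
export GIT_PAGER

# Throwaway repository with one commit and one uncommitted change (hypothetical content).
repo=$(mktemp -d)
cd "$repo"
git init -q
printf 'a\nb\nc\n' > file.txt
git add file.txt
git -c user.name=Example -c user.email=example@example.invalid \
    commit -q -m 'initial'
printf 'a\nc\nb\n' > file.txt

# Permanent: store the preference in this repository's config.
# (Adding --global would store it for all of the current user's repositories.)
git config diff.algorithm histogram
git config --get diff.algorithm

# Temporary: a command-line option overrides the config for one invocation.
git diff --diff-algorithm=patience
git diff --patience
git diff --histogram

# The same option works for other diff-producing commands.
git log -p --diff-algorithm=myers
```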

Git does not store diffs; all history is stored as (full) snapshots of tree objects. A tree always points to full files ("blobs" in Git terminology) or subdirectories (represented by other tree objects).
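One way to see this at the object level is sketched below, again in a throwaway repository with made-up contents. It inspects the tree and blob behind the latest commit and checks that the stored blob is unaffected by the configured diff algorithm.

```shell
#!/bin/sh
set -eu

# Throwaway repository; names, identities, and contents are hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name 'Example'
git config user.email 'example@example.invalid'

printf 'a\nb\nc\n' > file.txt
git add file.txt
git commit -q -m 'first version'
printf 'a\nb\nc\nd\n' > file.txt
git commit -q -a -m 'second version'

# A commit points at a tree; the tree lists blobs (files) and subtrees.
git cat-file -p 'HEAD^{tree}'

# The blob behind the second commit holds the complete file, not a diff.
git cat-file -p HEAD:file.txt

# The blob ID is derived from the file content, so changing the
# diff algorithm cannot alter what the commit refers to.
before=$(git rev-parse HEAD:file.txt)
git config diff.algorithm patience
after=$(git rev-parse HEAD:file.txt)
[ "$before" = "$after" ] && echo 'stored blob unchanged'
```

This looks only at the object model. How packfiles compress those objects on disk is a separate layer, which is what the comments below discuss.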

8 Comments

Mark Adelsberger Over a year ago

"...all history is stored as (full) snapshots..." No, it isn't. It's true that the stored representation doesn't depend on the diff algorithm used to present text patches, but it absolutely does use deltas.

torek Over a year ago

@MarkAdelsberger: deltas appear only in packfiles, which exist below the object level. Philosophically this is similar to wondering if a file is compressed, when it's stored on ZFS with ZFS-level compression turned on. In one sense, it is compressed, because ZFS compressed each block. But when you open and read the file, you can't tell that it's compressed, especially if it was just moved to a different dataset in which compression is not enabled.

knittl Over a year ago

@MarkAdelsberger Git's object model stores full snapshots only. Each commit references one single root tree, and this root tree then references all files and all subdirectories in full. Pack files use clever compression algorithms to be more space efficient, but this delta compression neither uses (human-readable) "diff"s nor is affected by the configured diff algorithm. […]

knittl Over a year ago

[…] This happens on a different level, comparable to the different layers of the OSI model. An HTTP request or response is a single entity, but on lower levels it might be fragmented. Not something you have to think or care about when talking HTTP, because the underlying layers will handle this transparently.

Mark Adelsberger Over a year ago

@knittl diff != delta compression, but delta compression is a type of diff. And neither one of them is a "(full) snapshot". You think it's beside the point because you only care if your statement is "correct enough" to explain the behavior asked about; and that is where I differ, because I care that these "correct enough if you squint at them hard enough" statements lead people to believe that git is too complicated to understand when they try to reason using them and get incorrect results.

VonC · Accepted Answer · 2020-08-15 21:40:47Z

Looking at the evolution of both the histogram and patience diffs, there is no side effect for past commits.

There are effects only for the git diff command itself (or diff-based operations like log -p).
For instance, a git diff --histogram done before Git 2.1 would trigger too many memory allocations.
