
Using Python, I'd like to output the difference between two strings as a unified diff (-u) while, optionally, ignoring blank lines (-B) and spaces (-w).

Since the strings are generated internally, I'd prefer not to deal with the complexity of writing one or both strings to a file, running GNU diff, fixing up the output, and finally cleaning up.

While difflib.unified_diff generates unified diffs, it doesn't seem to let me tweak how spaces and blank lines are handled. I've looked at its implementation and I suspect the only solution is to copy and hack that function's body.

Is there anything better?

For the moment I'm stripping the spaces and blank lines beforehand, using something like:

import difflib
import re
import sys

l = "line 1\nline 2\nline 3\n"
r = "\nline 1\n\nline 2\nline3\n"
strip_spaces = True
strip_blank_lines = True

if strip_spaces:
    # Approximate -w: delete every run of spaces and tabs
    l = re.sub(r"[ \t]+", r"", l)
    r = re.sub(r"[ \t]+", r"", r)
if strip_blank_lines:
    # Approximate -B: collapse runs of newlines, then drop a leading newline
    l = re.sub(r"^\n", r"", re.sub(r"\n+", r"\n", l))
    r = re.sub(r"^\n", r"", re.sub(r"\n+", r"\n", r))
# run diff on the normalized text
diff = difflib.unified_diff(l.splitlines(keepends=True), r.splitlines(keepends=True))
sys.stdout.writelines(diff)

This, of course, produces a diff of something other than the original input. For instance, if the above text is passed to GNU diff 3.3 run as "diff -u -w", "line 3" is displayed as part of the context, whereas the code above displays "line3".
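One possible way to keep GNU's behavior of displaying the original text while still calling the stock difflib.unified_diff is to feed it lines that compare by a normalized key. This is a sketch under a stated assumption: it relies on unified_diff comparing lines only through SequenceMatcher (that is, via == and hash) and printing them by string concatenation, which is how CPython's implementation works but is not a documented guarantee. KeyedLine, ignore_all_space and keyed_lines are hypothetical helpers, and the -w normalization is only an approximation of GNU's.

```python
import difflib
import re
import sys


class KeyedLine(str):
    """Hypothetical helper: a str that prints as its original text but
    compares and hashes by a normalized key."""

    def __new__(cls, text, key):
        obj = super().__new__(cls, text)
        obj.key = key
        return obj

    def __eq__(self, other):
        if isinstance(other, KeyedLine):
            return self.key == other.key
        return NotImplemented

    def __ne__(self, other):
        # str defines its own __ne__, so it must be overridden as well.
        result = self.__eq__(other)
        return result if result is NotImplemented else not result

    def __hash__(self):
        # Defining __eq__ disables the inherited hash; hash the key instead.
        return hash(self.key)


def ignore_all_space(line):
    # Rough approximation of -w: drop every whitespace character.
    return re.sub(r"\s+", "", line)


def keyed_lines(text, key=ignore_all_space):
    return [KeyedLine(line, key(line)) for line in text.splitlines(keepends=True)]


l = "line 1\nline 2\nline 3\n"
r = "\nline 1\n\nline 2\nline3\n"
# Assumption: unified_diff compares lines only via SequenceMatcher (==/hash)
# and prints them by concatenation, so the original text is what appears.
diff = list(difflib.unified_diff(keyed_lines(l), keyed_lines(r), "l", "r"))
sys.stdout.writelines(diff)
```

Here the context line is printed as " line 3", taken from the first input. This only covers -w: -B changes which hunks get printed, not which lines compare equal, so it cannot be expressed through the comparison alone.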

  • "which, of course, results in diffs for something other than the original input." Sure, but that's what diff does, right? OTOH, I'm sure that diff replaces whitespace with a single blank rather than no blanks... Commented Jul 31, 2015 at 20:17
  • GNU diff 3.3 describes -w thus: "The '--ignore-all-space' ('-w') option is stronger still. It ignores differences even if one line has white space where the other line has none. ..." Commented Aug 1, 2015 at 0:51
  • @patrick No, diff uses the original input when displaying the context (and that includes things like correct line numbers), not something mangled beyond belief. Commented Aug 1, 2015 at 1:02
  • Ah, display. I was thinking about the compare. I suppose you could keep track of line numbers out of diff and display the original, but at that point, you're right -- it probably makes more sense to fix difflib if it doesn't do that. Commented Aug 1, 2015 at 1:47
  • And you're right, I was thinking of -b, not -w Commented Aug 1, 2015 at 1:59
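The -b versus -w distinction in these comments comes down to which normalization key each line is compared by. As a rough sketch (these are approximations written from the GNU manual's descriptions, not GNU's code, and both function names are made up), -b ignores trailing white space and treats any other run of white space as a single space, while -w ignores white space entirely:

```python
import re


def key_ignore_space_change(line):
    # Approximates GNU diff -b: trailing whitespace is ignored and every other
    # run of whitespace counts as a single space.
    return re.sub(r"\s+", " ", line.rstrip())


def key_ignore_all_space(line):
    # Approximates GNU diff -w: all whitespace is ignored, so "line 3" and
    # "line3" compare equal.
    return re.sub(r"\s+", "", line)


print(key_ignore_space_change("line  3 \n") == key_ignore_space_change("line 3\n"))  # True
print(key_ignore_space_change("line 3\n") == key_ignore_space_change("line3\n"))  # False
print(key_ignore_all_space("line 3\n") == key_ignore_all_space("line3\n"))  # True
```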

1 Answer


Make your own SequenceMatcher, copy the body of unified_diff, and replace SequenceMatcher with your own matcher.
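A sketch of what that could look like, assuming Python 3. unified_diff_ignoring is a hypothetical name, and instead of subclassing SequenceMatcher it runs the stock matcher on normalized copies of the lines, then reuses the opcode indices to print the original lines, so context text and line numbers stay intact. The -B handling is an approximation: it drops any hunk whose changed lines are all blank (whitespace-only lines count as blank), which may not group changes exactly the way GNU diff does.

```python
import difflib
import re
import sys


def _format_range(start, stop):
    # Same "start,length" convention difflib uses in unified hunk headers.
    beginning = start + 1
    length = stop - start
    if length == 1:
        return str(beginning)
    if not length:
        beginning -= 1
    return "{},{}".format(beginning, length)


def _all_blank(lines):
    # Whitespace-only lines are treated as blank (an assumption).
    return all(not line.strip() for line in lines)


def unified_diff_ignoring(a, b, fromfile="", tofile="", n=3,
                          ignore_all_space=False, ignore_blank_lines=False):
    """Hypothetical variant of difflib.unified_diff.

    Lines are matched on a normalized key, but the original lines of
    a and b are what gets printed, so context keeps its real text.
    """
    def key(line):
        # Approximates -w by removing every whitespace character.
        return re.sub(r"\s+", "", line) if ignore_all_space else line

    matcher = difflib.SequenceMatcher(None, [key(x) for x in a],
                                      [key(x) for x in b])
    started = False
    for group in matcher.get_grouped_opcodes(n):
        changes = [op for op in group if op[0] != "equal"]
        # Approximates -B: skip a hunk if every changed line is blank.
        if ignore_blank_lines and all(
                _all_blank(a[i1:i2]) and _all_blank(b[j1:j2])
                for _, i1, i2, j1, j2 in changes):
            continue
        if not started:
            started = True
            yield "--- {}\n".format(fromfile)
            yield "+++ {}\n".format(tofile)
        first, last = group[0], group[-1]
        yield "@@ -{} +{} @@\n".format(_format_range(first[1], last[2]),
                                       _format_range(first[3], last[4]))
        for tag, i1, i2, j1, j2 in group:
            if tag == "equal":
                for line in a[i1:i2]:
                    yield " " + line
                continue
            # "replace", "delete" and "insert" all reduce to this:
            # removed lines from a, then added lines from b.
            for line in a[i1:i2]:
                yield "-" + line
            for line in b[j1:j2]:
                yield "+" + line


l = "line 1\nline 2\nline 3\n"
r = "\nline 1\n\nline 2\nline3\n"
a, b = l.splitlines(keepends=True), r.splitlines(keepends=True)
sys.stdout.writelines(unified_diff_ignoring(a, b, "l", "r", ignore_all_space=True))
```

With ignore_all_space=True this prints "line 3" as context, taken from the first input; adding ignore_blank_lines=True makes this sketch print nothing, since only blank lines then differ. With both options off, it should produce the same output as difflib.unified_diff.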
