How to understand/use the Python difflib output?

Question

I am trying to make a comprehensible diff that compares the command-line output of two programs. I used difflib and came up with this code:

from difflib import Differ
from pprint import pprint
import sys

def readable_whitespace(line):
    return line.replace("\n", "\\n")

# Two strings are expected as input
def print_diff(text1, text2):
    d = Differ()
    text1 = text1.splitlines(True)
    text2 = text2.splitlines(True)

    text1 = [readable_whitespace(line) for line in text1]
    text1 = [readable_whitespace(line) for line in text2]

    result = list(d.compare(text1, text2))
    sys.stdout.writelines(result)
    sys.stdout.write("\n")

Some requirements I have:

(Obvious) When there is a difference, it should be clear which output each line comes from.
Newlines are replaced with \n because they matter in my case and must be clearly visible when they cause a conflict.

I made a simple test for my diff function:

A = "AAABAAA\n"
A += "BBB\n"
B = "AAAAAAA\n"
B += "\n"
B += "BBB"
print_diff(A,B)

For your convenience, here is the test merged with the function so that you can run it as a file: http://pastebin.com/BvQw9naa

I have no idea what this output is trying to tell me:

- AAAAAAA\n?        ^^
+ AAAAAAA
?        ^
- \n+
  BBB

Notice those two ^ symbols on the first line? What are they pointing to? Also, I intentionally put a trailing newline into one test string. I don't think the diff noticed that.

How can I make the output comprehensible, or learn to understand it?

ekhumoro · Accepted Answer · 2016-10-10 17:14:38Z

The main problem with your example is how you are handling endline characters. If you completely replace them in the input, the output will no longer line up correctly, and so won't make any sense. To fix that, the readable_whitespace function should look something like this:

def readable_whitespace(line):
    end = len(line.rstrip('\r\n'))
    return line[:end] + repr(line[end:])[1:-1] + '\n'

This handles \n, \r and \r\n line endings, and ensures that the lines are displayed correctly when printed.
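As a quick illustration of what that function does to individual lines (a minimal sketch; the sample strings are made up for this example and are not from the question), each line keeps its content, has its original ending escaped, and gets one real newline appended for display:

```python
def readable_whitespace(line):
    # Length of the line without its trailing CR/LF characters
    end = len(line.rstrip('\r\n'))
    # Escape the original ending with repr() (quotes stripped),
    # then append a real newline so the line prints on its own row
    return line[:end] + repr(line[end:])[1:-1] + '\n'

# Hypothetical sample lines with different endings
samples = ["unix\n", "windows\r\n", "old mac\r", "no ending"]
for s in samples:
    # repr() makes both the escaped and the real newline visible
    print(repr(s), "->", repr(readable_whitespace(s)))
```

A line with no ending at all, such as the last sample, only receives the display newline, which is how a missing trailing newline becomes visible in the diff.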

The other minor problem is due to a typo:

text1 = [readable_whitespace(line) for line in text1]
text1 = [readable_whitespace(line) for line in text2]
# --^ oops!

Once these fixes are made, the output will look like this:

- AAABAAA\n
?    ^
+ AAAAAAA\n
?    ^
+ \n
- BBB\n
?    --
+ BBB

which should hopefully now make sense to you.
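For reference, here is one way the two fixes might be combined into a single runnable script. This is only a sketch: `diff_lines` is a helper name introduced here for illustration and does not appear in the question.

```python
from difflib import Differ
import sys

def readable_whitespace(line):
    # Escape the line ending, then add a real newline for display
    end = len(line.rstrip('\r\n'))
    return line[:end] + repr(line[end:])[1:-1] + '\n'

def diff_lines(text1, text2):
    # Hypothetical helper: returns the diff as a list of lines
    lines1 = [readable_whitespace(line) for line in text1.splitlines(True)]
    lines2 = [readable_whitespace(line) for line in text2.splitlines(True)]  # typo fixed
    return list(Differ().compare(lines1, lines2))

# Same test data as in the question
A = "AAABAAA\n" + "BBB\n"
B = "AAAAAAA\n" + "\n" + "BBB"

result = diff_lines(A, B)
sys.stdout.writelines(result)
```

When reading the result, lines starting with "- " come only from the first input, lines starting with "+ " come only from the second, and lines starting with two spaces are common to both. Lines starting with "? " are not part of either input; they are hints whose ^, - and + markers point at the characters that differ in the line printed directly above them.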


2 Comments

Tomáš Zato Over a year ago

Wait... are you sure it's safe to assume that all newlines can be replaced and then put at the end of lines?

ekhumoro Over a year ago

@TomášZato. That's not what's happening. The actual endline characters in the input are being escaped so they can be compared properly and made visible in the output. A newline is then added purely for display purposes.
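A small check of that point, as a sketch under the same assumptions as the answer's function (`diff_lines` is a hypothetical helper name, and the inputs are invented for this example): because the escaped ending is part of the compared text, two lines that differ only in their line ending still show up as different, while the appended newline is identical on every line and so never affects the comparison.

```python
from difflib import Differ

def readable_whitespace(line):
    # Escape the line ending, then add a real newline for display
    end = len(line.rstrip('\r\n'))
    return line[:end] + repr(line[end:])[1:-1] + '\n'

def diff_lines(text1, text2):
    # Hypothetical helper: returns the diff as a list of lines
    lines1 = [readable_whitespace(line) for line in text1.splitlines(True)]
    lines2 = [readable_whitespace(line) for line in text2.splitlines(True)]
    return list(Differ().compare(lines1, lines2))

# Identical endings: the line is reported as common to both inputs
same = diff_lines("BBB\n", "BBB\n")

# LF versus CRLF: the escaped endings differ, so the lines differ
different = diff_lines("BBB\n", "BBB\r\n")

print("".join(same), end="")
print("".join(different), end="")
```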
