I am trying to make a comprehensible diff that compares the command-line output of two programs. I used difflib and came up with this code:

from difflib import Differ
from pprint import pprint
import sys

def readable_whitespace(line):
    return line.replace("\n", "\\n")

# Two strings are expected as input
def print_diff(text1, text2):
    d = Differ()
    text1 = text1.splitlines(True)
    text2 = text2.splitlines(True)

    text1 = [readable_whitespace(line) for line in text1]
    text1 = [readable_whitespace(line) for line in text2]

    result = list(d.compare(text1, text2))
    sys.stdout.writelines(result)
    sys.stdout.write("\n")

Some requirements I have:

  • (Obviously) it should be clear which output each differing line comes from
  • Newlines are shown as \n because they matter in my case and must be clearly visible when they cause a difference

I made a simple test for my diff function:

A = "AAABAAA\n"
A += "BBB\n"
B = "AAAAAAA\n"
B += "\n"
B += "BBB"
print_diff(A,B)

For your convenience, here is the test merged with the function so that you can execute it as a file: http://pastebin.com/BvQw9naa

I have no idea what this output is trying to tell me:

- AAAAAAA\n?        ^^
+ AAAAAAA
?        ^
- \n+
  BBB

Notice those two ^ symbols on the first line? What are they pointing to? Also, I intentionally put a trailing newline into one test string. I don't think the diff noticed that.

How can I make the output comprehensible, or learn to understand it?

1 Answer

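The main problem with your example is how you are handling endline characters. Differ expects every input line to end with a newline, and it writes its "?" guide lines directly underneath the lines they annotate. If you completely replace the newlines in the input, the output lines run together and no longer line up, so the result won't make any sense. To fix that, the readable_whitespace function should look something like this: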

def readable_whitespace(line):
    end = len(line.rstrip('\r\n'))
    return line[:end] + repr(line[end:])[1:-1] + '\n'

This handles the \n, \r\n and \r endline sequences, and ensures that the lines are displayed correctly when printed.
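To see what the function does to individual lines, here is a minimal check. It assumes the lines come from splitlines(True), as in the question; the sample strings are made up for illustration.

```python
import sys

def readable_whitespace(line):
    # Length of the line without its trailing endline characters
    end = len(line.rstrip('\r\n'))
    # Keep the content, append the escaped endline (repr without the quotes),
    # then add a real newline purely for display
    return line[:end] + repr(line[end:])[1:-1] + '\n'

# Illustrative samples: three endline styles and a line with no terminator
samples = ["abc\n", "abc\r\n", "abc\r", "abc"]

for raw in samples:
    sys.stdout.write(readable_whitespace(raw))

# Prints:
# abc\n
# abc\r\n
# abc\r
# abc
```

A line with no terminator comes out with no escape sequence at all, which is what lets the diff distinguish "BBB\n" from "BBB".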

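The other, minor problem is a typo: the second list comprehension assigns to text1 again, so text1 is overwritten with the escaped lines of text2 and text2 itself is never escaped: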

text1 = [readable_whitespace(line) for line in text1]
text1 = [readable_whitespace(line) for line in text2]
# --^ oops!    

Once these fixes are made, the output will look like this:

- AAABAAA\n
?    ^
+ AAAAAAA\n
?    ^
+ \n
- BBB\n
?    --
+ BBB

which should hopefully now make sense to you.
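For reference, here is a sketch of the whole script with both fixes applied. The helper name diff_lines is not in the original code; it is introduced here only so that the diff can be inspected as a list before printing.

```python
import sys
from difflib import Differ

def readable_whitespace(line):
    # Escape the trailing endline characters, then add a display newline
    end = len(line.rstrip('\r\n'))
    return line[:end] + repr(line[end:])[1:-1] + '\n'

def diff_lines(text1, text2):
    # Hypothetical helper: returns the Differ output as a list of lines
    lines1 = [readable_whitespace(line) for line in text1.splitlines(True)]
    lines2 = [readable_whitespace(line) for line in text2.splitlines(True)]
    return list(Differ().compare(lines1, lines2))

# Two strings are expected as input
def print_diff(text1, text2):
    # Every line already ends with a newline, so no extra write is needed
    sys.stdout.writelines(diff_lines(text1, text2))

A = "AAABAAA\n"
A += "BBB\n"
B = "AAAAAAA\n"
B += "\n"
B += "BBB"
print_diff(A, B)
```

When reading the result, note that each line starts with a two-character code: "- " marks a line found only in the first input, "+ " a line found only in the second, two spaces a line common to both, and "? " a guide line that is present in neither input. In a guide line, ^ sits under a character that was changed, - under a character that was removed, and + under a character that was added.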

2 Comments

Wait... are you sure it's safe to assume that all newlines can be replaced and then put at the end of lines?
@TomášZato. That's not what's happening. The actual endline characters in the input are being escaped so they can be compared properly and made visible in the output. A newline is then added purely for display purposes.