
I have two text files. I want to compare them and return a list of every line number at which they differ. Right now I think my code returns the lines that are different, but how do I return the line numbers instead?

def diff(filename1, filename2):
    with open('./exercise-files/text_a.txt', 'r') as filename1:
        with open('./exercise-files/text_b.txt', 'r') as filename2:
            difference = set(filename1).difference(filename2)

    difference.discard('\n')

    with open('diff.txt', 'w') as file_out:
        for line in difference:
            file_out.write(line)

Testing on:

diff('./exercise-files/text_a.txt', './exercise-files/text_b.txt') == [3, 4, 6]
diff('./exercise-files/text_a.txt', './exercise-files/text_a.txt') == []
  • You may want to define a wrapper class W, containing line and lineno, with custom __eq__ and __hash__ methods that depend only on line. Then build a list of instances of W for each file and compute difference = set(ws1).difference(ws2). Commented Sep 28, 2020 at 17:57
  • What should be the expected output if filename contains duplicate lines? Commented Sep 28, 2020 at 17:58
  • Check out the built-in enumerate function, which gives both an index and the item of the iterable. Commented Sep 28, 2020 at 18:06
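A minimal sketch of the wrapper-class idea from the first comment. W and diff_by_content are hypothetical names chosen for illustration. Note that this compares by content, not by position: a line of the first file is reported only if its text appears nowhere in the second file, so reordered lines are not flagged. Also, if a missing line occurs more than once in the first file, the set keeps only one of those occurrences, so only one of its line numbers is reported (which ties in with the question about duplicate lines).

```python
from typing import Iterable, List


class W:
    """Hypothetical wrapper pairing a line with its line number.

    Equality and hashing look only at the line text, so a set of W
    objects treats equal lines as the same element regardless of lineno.
    """

    def __init__(self, line: str, lineno: int) -> None:
        self.line = line
        self.lineno = lineno

    def __eq__(self, other: object) -> bool:
        if not isinstance(other, W):
            return NotImplemented
        return self.line == other.line

    def __hash__(self) -> int:
        return hash(self.line)


def diff_by_content(lines1: Iterable[str], lines2: Iterable[str]) -> List[int]:
    """Return 1-based line numbers of lines in lines1 whose text does
    not occur anywhere in lines2 (content-based, not position-based)."""
    ws1 = [W(line, n) for n, line in enumerate(lines1, 1)]
    ws2 = [W(line, n) for n, line in enumerate(lines2, 1)]
    difference = set(ws1).difference(ws2)
    return sorted(w.lineno for w in difference)
```

Because open file objects iterate line by line, the same function accepts two open files as well as two lists of strings.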

2 Answers

difference = [
    line_number + 1 for line_number, (line1, line2)
    in enumerate(zip(filename1, filename2))
    if line1 != line2
]

zip takes two (or more) iterables and returns an iterator of tuples, where each tuple contains the corresponding items from each iterable; it stops as soon as the shortest one is exhausted. enumerate takes this iterator and returns an iterator of tuples, where the first element is the index (starting at 0) and the second is the value from the original iterator. It's straightforward from there.
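To see what zip and enumerate actually yield, here is a small sketch that uses io.StringIO as a stand-in for the two open files (the sample contents are made up for illustration):

```python
import io

# Two in-memory "files"; io.StringIO iterates line by line like an open file.
file1 = io.StringIO("apple\nbanana\ncherry\n")
file2 = io.StringIO("apple\nblueberry\ncherry\nextra\n")

pairs = list(zip(file1, file2))
# [('apple\n', 'apple\n'), ('banana\n', 'blueberry\n'), ('cherry\n', 'cherry\n')]
# zip stopped after three pairs: the fourth line of file2 has no partner.

numbered = list(enumerate(pairs))
# [(0, ('apple\n', 'apple\n')), (1, ('banana\n', 'blueberry\n')), (2, ('cherry\n', 'cherry\n'))]

difference = [
    line_number + 1 for line_number, (line1, line2) in numbered
    if line1 != line2
]
print(difference)  # [2]
```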


3 Comments

Your answer is clearly the best approach, but could I suggest starting the numbering from 1, because that accords with what is usually meant by line numbers.
enumerate takes an optional start argument. To clean up (albeit marginally): enumerate(zip(filename1, filename2), 1) removes the need for the + 1.
@AirSquid Today I learned, thanks. I still prefer my method for clarity though.
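Folding the start-argument suggestion from these comments into a complete function gives something like the sketch below. It assumes the files can be read with the default text encoding (the question doesn't say), and, like any zip-based version, it ignores lines beyond the end of the shorter file.

```python
from typing import List


def diff(filename1: str, filename2: str) -> List[int]:
    """Return 1-based numbers of the lines that differ between two files.

    enumerate(..., 1) starts counting at 1, so no "+ 1" is needed.
    zip stops at the shorter file, so surplus lines are not reported.
    """
    with open(filename1) as file1, open(filename2) as file2:
        return [
            line_number
            for line_number, (line1, line2) in enumerate(zip(file1, file2), 1)
            if line1 != line2
        ]
```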

Here is an example that ignores any surplus lines if one file has more lines than the other. The key is to use enumerate when iterating, so you get the line number as well as the contents. next can be used to get the matching line from the second file, whose iterator is not driven directly by the for loop.

def diff(filename1, filename2):
    difference_line_numbers = []
    with open(filename1, "r") as file1, open(filename2, "r") as file2:
        for line_number, contents1 in enumerate(file1, 1):
            try:
                contents2 = next(file2)
            except StopIteration:
                break
            if contents1 != contents2:
                difference_line_numbers.append(line_number)
    return difference_line_numbers
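The question doesn't say what should happen when one file is longer than the other. If surplus lines should count as differences, one option (not taken from either answer) is to swap zip for itertools.zip_longest, which pads the shorter file with None; since None never equals a line of text, every unmatched line is reported. A minimal sketch:

```python
from itertools import zip_longest
from typing import List


def diff(filename1: str, filename2: str) -> List[int]:
    """Return 1-based numbers of lines that differ, counting surplus
    lines in the longer file as differences.

    zip_longest pads the shorter file with None, and None never equals
    a str line, so every unmatched line is reported.
    """
    with open(filename1) as file1, open(filename2) as file2:
        return [
            line_number
            for line_number, (line1, line2)
            in enumerate(zip_longest(file1, file2), 1)
            if line1 != line2
        ]
```

With this version, comparing a four-line file against a three-line file reports line 4 in addition to any mismatched lines.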

1 Comment

I'm in two minds whether to delete this answer or not. I'll leave it because it does actually work and might perhaps be instructive, but having seen Thomas's answer, the use of zip is so much more tidy and straightforward.
