Comparing 2 text files in Python

Question

I have 2 text files. I want to compare the 2 text files and return a list that has every line number that is different. Right now, I think my code returns the lines that are different, but how do I return the line number instead?

def diff(filename1, filename2):
    with open('./exercise-files/text_a.txt', 'r') as filename1:
        with open('./exercise-files/text_b.txt', 'r') as filename2:
            difference = set(filename1).difference(filename2)

    difference.discard('\n')

    with open('diff.txt', 'w') as file_out:
        for line in difference:
            file_out.write(line)

Testing on:

diff('./exercise-files/text_a.txt', './exercise-files/text_b.txt') == [3, 4, 6]
diff('./exercise-files/text_a.txt', './exercise-files/text_a.txt') == []

You may want to define a wrapper class W, containing line and lineno, with custom def __eq__ and def __hash__ respecting line. Then build a list of instances of W, and compute difference = set(ws1).difference(ws2).
– pts, Commented Sep 28, 2020 at 17:57
What should be the expected output if filename contains duplicate lines?
– pts, Commented Sep 28, 2020 at 17:58
Check out the built-in enumerate function, which gives both an index and the item of the iterable.
– n1c9, Commented Sep 28, 2020 at 18:06
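
For illustration, the wrapper-class idea from the first comment might look roughly like the sketch below. The function name diff_by_content and the choice to take lists of lines rather than file names are assumptions made for this example, not part of the comment. Note that a set difference compares content regardless of position and collapses duplicate lines into one entry, so it answers a slightly different question than a line-by-line comparison.

```python
class W:
    """Pairs a line with its 1-based line number.

    Equality and hashing look only at the line text, as the comment suggests.
    """

    def __init__(self, line, lineno):
        self.line = line
        self.lineno = lineno

    def __eq__(self, other):
        return isinstance(other, W) and self.line == other.line

    def __hash__(self):
        return hash(self.line)


def diff_by_content(lines1, lines2):
    """Return sorted numbers of lines in lines1 whose text is nowhere in lines2.

    diff_by_content is a hypothetical name used only for this sketch.
    """
    ws1 = [W(line, lineno) for lineno, line in enumerate(lines1, 1)]
    ws2 = [W(line, lineno) for lineno, line in enumerate(lines2, 1)]
    difference = set(ws1).difference(ws2)
    return sorted(w.lineno for w in difference)
```

With this sketch, two inputs that hold the same lines in a different order produce an empty result, which may or may not be what the question wants.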

Thomas · Accepted Answer · 2020-09-28 18:09:42Z

1

# filename1 and filename2 are the open file objects from the question's with blocks
difference = [
    line_number + 1
    for line_number, (line1, line2) in enumerate(zip(filename1, filename2))
    if line1 != line2
]

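zip takes two (or more) iterables and returns an iterator of tuples, where each tuple contains the corresponding entries of each iterable. enumerate takes this iterator and returns an iterator of tuples, where the first element is the index and the second is the value from the original iterator. From there, the comprehension keeps the index (plus one, for 1-based line numbers) of every pair whose lines differ. Note that zip stops at the shorter input, so surplus lines in the longer file are ignored.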
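
As a rough sketch of how the comprehension could sit inside the question's diff function: this assumes the two arguments are file paths, and the names file1 and file2 are illustrative rather than taken from the answer.

```python
def diff(filename1, filename2):
    """Return the 1-based numbers of the lines that differ between two files.

    Assumes filename1 and filename2 are paths; lines beyond the end of the
    shorter file are not reported, because zip stops at the shorter input.
    """
    with open(filename1, "r") as file1, open(filename2, "r") as file2:
        return [
            line_number + 1
            for line_number, (line1, line2) in enumerate(zip(file1, file2))
            if line1 != line2
        ]
```

Calling diff with the same path twice returns an empty list, which matches the second test case in the question.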

edited Sep 28, 2020 at 18:09

answered Sep 28, 2020 at 17:54

Thomas


3 Comments

alani Over a year ago

Your answer is clearly the best approach, but could I suggest starting the numbering from 1, because that accords with what is usually meant by line numbers.

AirSquid Over a year ago

enumerate takes an optional start argument. To clean up (albeit marginally), enumerate(zip(filename1, filename2), 1) removes the need for the +1.

Thomas Over a year ago

@AirSquid Today I learned, thanks. I still prefer my method for clarity though.

alani · Accepted Answer · 2020-09-28 17:56:06Z

0

Here is an example which will ignore any surplus lines if one file has more lines than the other. The key is to use enumerate when iterating to get the line number as well as the contents. next can be used to get a line from the file iterator which is not used directly by the for loop.

def diff(filename1, filename2):
    difference_line_numbers = []
    with open(filename1, "r") as file1, open(filename2, "r") as file2:
        for line_number, contents1 in enumerate(file1, 1):
            try:
                contents2 = next(file2)
            except StopIteration:
                break
            if contents1 != contents2:
                difference_line_numbers.append(line_number)
    return difference_line_numbers

answered Sep 28, 2020 at 17:56

alani


1 Comment

alani Over a year ago

I'm in two minds whether to delete this answer or not. I'll leave it because it does actually work and might perhaps be instructive, but having seen Thomas's answer, the use of zip is so much more tidy and straightforward.
