
I was using Python's difflib to create comprehensive diff logs between rather long files. Everything ran smoothly until I ran into diffs that never seemed to finish. After some digging, it turned out that difflib becomes extremely slow on long runs of similar-but-not-identical lines. Here is a (somewhat minimal) example:

import sys
import random
import difflib

def make_file(fname, dlines):
    # Write a short header, then `dlines` numeric lines whose random digits
    # make the two files similar, but almost never identical, line by line.
    with open(fname, 'w') as f:
        f.write("This is a small file with a long sequence of different lines\n")
        f.write("Some of the starting lines could differ {}\n".format(random.random()))
        f.write("...\n")
        f.write("...\n")
        f.write("...\n")
        f.write("...\n")
        for i in range(dlines):
            f.write("{}\t{}\t{}\t{}\n".format(i, i+random.random()/100, i+random.random()/10000, i+random.random()/1000000))

make_file("a.txt", 125)
make_file("b.txt", 125)

with open("a.txt") as ff:
    fromlines = ff.readlines()
with open("b.txt") as tf:
    tolines = tf.readlines()

# ndiff also tries to pair up similar changed lines to add intraline "?" hints
diff = difflib.ndiff(fromlines, tolines)

sys.stdout.writelines(diff)
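To see what "semi-matching" means in practice, the sketch below scores two lines built the same way as in make_file() using difflib.SequenceMatcher.ratio(). As far as I can tell from CPython's source, ndiff runs this kind of similarity scoring over pairs of changed lines (to decide which ones get "?" hints), so a long run of lines that all differ slightly means a large number of pairwise comparisons. The make_line() helper and the fixed seed are hypothetical additions for reproducibility, not part of the original script.

```python
import difflib
import random

def make_line(i, rng):
    # Same format as the numeric lines written by make_file() above
    return "{}\t{}\t{}\t{}\n".format(
        i, i + rng.random() / 100, i + rng.random() / 10000, i + rng.random() / 1000000)

rng = random.Random(0)  # fixed seed so the run is reproducible
line_a = make_line(42, rng)
line_b = make_line(42, rng)

# Similarity score in [0, 1]; ndiff relies on scores like this when it
# looks for the best-matching pair among changed lines
ratio = difflib.SequenceMatcher(None, line_a, line_b).ratio()
print(repr(line_a))
print(repr(line_b))
print("similarity ratio:", round(ratio, 3))
```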

Even for the 125 lines in the example, it took Python over 4 seconds to compute and print the diff, while GNU Diff took only a few milliseconds. The files I actually need to compare have roughly 100 times as many lines.
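A rough way to check where the time goes is to time ndiff against unified_diff on the same input. unified_diff works from SequenceMatcher's grouped opcodes without an intraline matching step, so I would expect it to be much faster. This is only a measurement harness, though, and no numbers from it are claimed here. make_lines() is a hypothetical in-memory variant of make_file(), and the perf_counter timings will vary by machine and Python version.

```python
import difflib
import random
import time

def make_lines(dlines, rng):
    # In-memory version of make_file() from the question
    lines = ["This is a small file with a long sequence of different lines\n",
             "Some of the starting lines could differ {}\n".format(rng.random())]
    lines += ["...\n"] * 4
    for i in range(dlines):
        lines.append("{}\t{}\t{}\t{}\n".format(
            i, i + rng.random() / 100, i + rng.random() / 10000, i + rng.random() / 1000000))
    return lines

def time_diff(func, a, b):
    start = time.perf_counter()
    out = list(func(a, b))  # force the lazy generator to run to completion
    return time.perf_counter() - start, out

rng = random.Random(1)
a = make_lines(125, rng)
b = make_lines(125, rng)

t_ndiff, nd = time_diff(difflib.ndiff, a, b)
t_unified, ud = time_diff(difflib.unified_diff, a, b)
print("ndiff:        {:.3f}s, {} lines".format(t_ndiff, len(nd)))
print("unified_diff: {:.3f}s, {} lines".format(t_unified, len(ud)))
```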

Is there a sensible solution to this? I was hoping to use difflib, since it produces rather nice HTML diffs, but I am open to suggestions. I need a portable solution that works on as many platforms as possible, although I am already considering porting GNU Diff for this :). Hacking into difflib is also an option, as long as I don't have to rewrite the whole library.
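If the "?" intraline hints are not essential, one low-effort workaround is to build ndiff-style output directly from SequenceMatcher.get_opcodes(), skipping the pairwise similarity search over changed lines. plain_ndiff() below is a hypothetical helper, not part of difflib, and it does not reproduce ndiff exactly: there are no "?" lines, and the order of "-"/"+" blocks may differ. It also does not help with HtmlDiff, which (again, as far as I can tell from CPython's source) builds on ndiff internally.

```python
import difflib

def plain_ndiff(a, b):
    """ndiff-style output ('  ', '- ', '+ ' prefixes) without '?' hint lines.

    Hypothetical helper: it uses SequenceMatcher opcodes directly and
    skips ndiff's similarity search among changed lines."""
    sm = difflib.SequenceMatcher(None, a, b)
    for tag, i1, i2, j1, j2 in sm.get_opcodes():
        if tag == "equal":
            for line in a[i1:i2]:
                yield "  " + line
        else:
            # 'replace', 'delete', 'insert': old lines first, then new ones
            for line in a[i1:i2]:
                yield "- " + line
            for line in b[j1:j2]:
                yield "+ " + line

old = ["header\n", "1\t1.001\n", "2\t2.002\n", "footer\n"]
new = ["header\n", "1\t1.009\n", "2\t2.002\n", "footer\n"]
print("".join(plain_ndiff(old, new)), end="")
```

With the question's data, this would be used as `sys.stdout.writelines(plain_ndiff(fromlines, tolines))`.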

PS: The files may have variable-length prefixes, so splitting them into chunks and diffing each chunk separately, without aligning the diff context, is probably not a good idea.
