I'm trying to convert this Python script (diff.py)

http://www.aaronsw.com/2002/diff/

into the same thing on my own site, i.e., a web interface. He supplies the script for download, and I can run it from the command line on my Windows computer, but I also want it to work on my server. I am so close. Here is what I have so far.

Here is my HTML document:

<form action="/cgi-bin/diff.py" method="get"><p>
<strong>Old URL:</strong> <input name="old" type="text"><br>
<strong>New URL:</strong> <input name="new" type="text"><br>
<input value="Diff!" type="submit">
</p></form>

Here is my edited diff.py script, which is nearly working:

#!G:\Program Files\Python25\python.exe
"""HTML Diff: http://www.aaronsw.com/2002/diff
Rough code, badly documented. Send me comments and patches.

__author__ = 'Aaron Swartz <[email protected]>'
__copyright__ = '(C) 2003 Aaron Swartz. GNU GPL 2 or 3.'
__version__ = '0.22' """

import cgi
import cgitb; cgitb.enable()
form = cgi.FieldStorage()
reshtml = """Content-Type: text/html\n
<html>
<head><title>Test</title></head>
<body>
"""
print reshtml
a = form['old'].value
b = form['new'].value

import difflib, string

def isTag(x): return x[0] == "<" and x[-1] == ">"

def textDiff(a, b):
    """Takes in strings a and b and returns a human-readable HTML diff."""

    out = []
    a, b = html2list(a), html2list(b)
    s = difflib.SequenceMatcher(None, a, b)
    for e in s.get_opcodes():
        if e[0] == "replace":
            # @@ need to do something more complicated here
            # call textDiff but not for html, but for some html... ugh
            # gonna cop-out for now
            out.append('<del class="diff modified">'+''.join(a[e[1]:e[2]]) +   '</del><ins class="diff modified">'+''.join(b[e[3]:e[4]])+"</ins>")
        elif e[0] == "delete":
            out.append('<del class="diff">'+ ''.join(a[e[1]:e[2]]) + "</del>")
        elif e[0] == "insert":
            out.append('<ins class="diff">'+''.join(b[e[3]:e[4]]) + "</ins>")
        elif e[0] == "equal":
            out.append(''.join(b[e[3]:e[4]]))
        else: 
            raise ValueError("Um, something's broken. I didn't expect a '" + repr(e[0]) + "'.")
    return ''.join(out)

def html2list(x, b=0):
    mode = 'char'
    cur = ''
    out = []
    for c in x:
        if mode == 'tag':
            if c == '>': 
                if b: cur += ']'
                else: cur += c
                out.append(cur); cur = ''; mode = 'char'
            else: cur += c
        elif mode == 'char':
            if c == '<': 
                out.append(cur)
                if b: cur = '['
                else: cur = c
                mode = 'tag'
            elif c in string.whitespace: out.append(cur+c); cur = ''
            else: cur += c
    out.append(cur)
    return filter(lambda x: x != '', out)

if __name__ == '__main__':
    import sys
    try:
        a, b = sys.argv[1:3]
    except ValueError:
        print "htmldiff: highlight the differences between two html files"
        print "usage: " + sys.argv[0] + " a b"
        sys.exit(1)
    print textDiff(open(a).read(), open(b).read())

print '</body>'
print '</html>'

This is the result I get in my browser:

htmldiff: highlight the differences between two html files usage: E:/xampp/cgi-bin/diff.py a b 

Can anyone see what's wrong?

OK, here is the error I get when I use print open(a).read():

A problem occurred in a Python script. Here is the sequence of function calls leading up to the error, in the order they occurred.
 E:\xampp\cgi-bin\diff2.py in ()
   19 b = form['new'].value
   20 
   21 print open(a).read()
   22 
   23 
builtin open = <built-in function open>, a = 'http://www.google.com', ).read undefined

<type 'exceptions.IOError'>: [Errno 2] No such file or directory: 'http://www.google.com'
    args = (2, 'No such file or directory')
    errno = 2
    filename = 'http://www.google.com'
    message = ''
    strerror = 'No such file or directory'

OK, I think I actually figured this out on my own. Here are the necessary changes. The listing below stops where the original code begins:

#!G:\Program Files\Python25\python.exe
"""HTML Diff: http://www.aaronsw.com/2002/diff
Rough code, badly documented. Send me comments and patches.

__author__ = 'Aaron Swartz <[email protected]>'
__copyright__ = '(C) 2003 Aaron Swartz. GNU GPL 2 or 3.'
__version__ = '0.22' """


import cgi
import cgitb; cgitb.enable()
form = cgi.FieldStorage()
reshtml = """Content-Type: text/html\n
<html>
<head><title>Tonys Test</title></head>
<body>
"""
print reshtml
old2 = form['old'].value
new2 = form['new'].value

import urllib2

a = urllib2.urlopen(old2).read()
b = urllib2.urlopen(new2).read()

#print a
#print b

import difflib, string
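For anyone running this on Python 3, where urllib2 no longer exists, the same fetch step might look roughly like the sketch below. It assumes Python 3 and uses urllib.request from the standard library; fetch_text, its timeout, and its fallback encoding are hypothetical names and defaults, not part of the original script.

```python
import urllib.request


def fetch_text(url, timeout=10, encoding="utf-8"):
    """Fetch `url` and return its body as text.

    Hypothetical helper: the name, timeout and fallback encoding are
    assumptions made for this sketch, not part of diff.py.
    """
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        # Use the charset declared in the response headers if there is one,
        # otherwise fall back to the caller-supplied encoding.
        charset = resp.headers.get_content_charset() or encoding
        # Replace undecodable bytes rather than failing on a bad page.
        return resp.read().decode(charset, errors="replace")
```

With that in place, a and b would be set with fetch_text(old2) and fetch_text(new2) instead of the two urllib2.urlopen(...).read() calls.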

Well, I spoke too soon. It runs, but the differences are not highlighted; I only get strikethrough for the old version. I tried adding back the part I had cut out, which supposedly does the highlighting, but it doesn't work: I get my original error message again. I'll keep working at it.

OK, it's finally working. I had to add this code at the end:

def htmlDiff(a, b):
    f1, f2 = a.find('</head>'), a.find('</body>')
    ca = a[f1+len('</head>'):f2]

    f1, f2 = b.find('</head>'), b.find('</body>')
    cb = b[f1+len('</head>'):f2]

    r = textDiff(ca, cb)
    hdr = '<style type="text/css"><!-- ins{background-color: #bbffbb} del{background-color: #ffcccc}--></style></head>'
    return b[:f1] + hdr + r + b[f2:]


print htmlDiff(a, b)
print '</body>'
print '</html>'

I found this code in the 0.1 version download.
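One caveat with htmlDiff as written: str.find returns -1 when '</head>' or '</body>' is missing, or written in a different case, and the slices then silently pick the wrong region. A small sketch of a more forgiving way to locate the region to diff follows. body_span is a hypothetical helper, and the fallbacks (start of document, end of document) are my own assumptions about what is reasonable.

```python
def body_span(html):
    """Return (start, end) indices of the region between </head> and </body>.

    Hypothetical helper for illustration. If </head> is missing the region
    starts at 0; if </body> is missing it runs to the end of the document.
    """
    lower = html.lower()  # match tags regardless of case
    head_end = lower.find("</head>")
    start = head_end + len("</head>") if head_end != -1 else 0
    body_end = lower.rfind("</body>")
    end = body_end if body_end != -1 else len(html)
    return start, end
```

htmlDiff could then take ca = a[start:end] from body_span(a), and likewise for b, instead of computing f1 and f2 directly.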

  • Your indentation has been messed up by your paste. Might want to check it. Commented Nov 24, 2012 at 20:26

1 Answer

This chunk is the problem:

if __name__ == '__main__':
    import sys
    try:
        a, b = sys.argv[1:3]
    except ValueError:
        print "htmldiff: highlight the differences between two html files"
        print "usage: " + sys.argv[0] + " a b"
        sys.exit(1)

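Remove it. Under CGI the script is started with no command-line arguments, so sys.argv[1:3] is empty, the unpacking raises ValueError, and the usage message is printed instead of the diff. Even if it succeeded, it would overwrite the a and b you read from the form.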

And this line:

print textDiff(open(a).read(), open(b).read())

Should become

print textDiff(a, b)
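
The point is that textDiff works on strings that are already in memory, so nothing in it needs open() or sys.argv. As a rough illustration, here is a condensed Python 3 sketch of the two functions from the question. It is my own port, not the author's code: it drops the unused b parameter of html2list and assumes a and b already hold the HTML text.

```python
import difflib
import string


def html2list(x):
    """Split HTML into tokens: whole tags, and words with trailing whitespace."""
    mode, cur, out = "char", "", []
    for c in x:
        if mode == "tag":
            cur += c
            if c == ">":  # end of a tag: emit it as one token
                out.append(cur)
                cur, mode = "", "char"
        elif c == "<":  # start of a tag: flush the pending word
            out.append(cur)
            cur, mode = c, "tag"
        elif c in string.whitespace:  # whitespace ends the current word
            out.append(cur + c)
            cur = ""
        else:
            cur += c
    out.append(cur)
    return [tok for tok in out if tok != ""]


def textDiff(a, b):
    """Take HTML strings a and b and return an HTML diff as a string."""
    a, b = html2list(a), html2list(b)
    out = []
    matcher = difflib.SequenceMatcher(None, a, b)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        old, new = "".join(a[i1:i2]), "".join(b[j1:j2])
        if tag == "replace":
            out.append('<del class="diff modified">' + old + "</del>"
                       '<ins class="diff modified">' + new + "</ins>")
        elif tag == "delete":
            out.append('<del class="diff">' + old + "</del>")
        elif tag == "insert":
            out.append('<ins class="diff">' + new + "</ins>")
        else:  # "equal": pass the unchanged tokens through
            out.append(new)
    return "".join(out)
```

For example, textDiff('<p>one two</p>', '<p>one three</p>') wraps "two" in a del element and "three" in an ins element, leaving the rest of the paragraph untouched.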

5 Comments

@tyree: Print out the values of a and b and see what you get. And please, fix your indentation in the question.
Eric, I had trouble with inserting this after reading the instructions, I'll read them again. I used 4 spaces for most lines.
I printed a and b and got the 2 urls that I entered in the form fields, so that seems to work.
I get an error of course. Looks like it's not looking for a URL, only a directory or file ---
I'm having major problems posting code here and with the 5 minute time limit.
