I'm trying to write a script that compares two files line by line and creates a new file containing the lines of the first file that aren't in the second file.

For example:

**File 1:** 

Bob:20 
Dan:50 
Brad:34 
Emma:32 
Anne:43

**File 2:**

Dan:50
Emma:32
Anne:43

The new output (File 3):

Bob:20
Brad:34

I have some idea of how this needs to be done, but not exactly:

def compare(File1,File2):
   with open(File1, "a") as f1:
       lines = f1.readlines()
       string = line.split(':')
   with open(File2, "a") as f2:
       lines = f2.readlines()
       string2 = line.split(':')
       if string[0] == string[1]:
           with open("newfile2.txt", "w") as f3:
            ....

I think I need something along these lines, and then to compare string[0] from each line of each file, but I'm stuck from this point on.

Any help would be greatly appreciated.

  • Do you know of the Unix tool diff? You might be trying to reinvent the wheel here… Commented Jan 29, 2015 at 11:33
  • Similar to stackoverflow.com/questions/3544331/… Commented Jan 29, 2015 at 11:33
  • Are the files already ordered? Is the number behind the colon irrelevant? Commented Jan 29, 2015 at 11:38
  • The files will have up to 10,000 lines of text and the number next to the names will always be random. Commented Jan 29, 2015 at 11:44

2 Answers

8

This works for me:

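def compare(File1, File2):
    # Read each file into a set of lines for fast membership tests
    with open(File1, 'r') as f:
        d = set(f.readlines())

    with open(File2, 'r') as f:
        e = set(f.readlines())

    # 'w' creates (or truncates) the output file, so no separate create step is needed
    with open('file3.txt', 'w') as f:
        # d - e: lines in File1 that are not in File2 (set order is arbitrary)
        for line in d - e:
            f.write(line)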

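Read each file into a set of lines and take the set difference to find the lines of File1 that are not present in File2. You can then write these lines to the new file. Note that a set does not preserve the original line order, and that lines are compared exactly, including the trailing newline.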
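As a possible refinement of the set-based approach, here is a minimal sketch that keeps the lines in their original order and strips trailing whitespace before comparing, so that a missing final newline or a stray trailing space does not make two otherwise equal lines look different. The function name `lines_only_in_first` and its parameters are hypothetical, chosen only for illustration.

```python
def lines_only_in_first(first_path, second_path, out_path):
    """Write the lines of first_path that do not occur in second_path to out_path."""
    with open(second_path) as f:
        # Strip trailing whitespace so "Anne:43" and "Anne:43\n" compare equal.
        seen = {line.rstrip() for line in f}

    kept = []
    with open(first_path) as f:
        for line in f:
            text = line.rstrip()
            # Skip blank lines; keep everything the second file does not contain.
            if text and text not in seen:
                kept.append(text)

    with open(out_path, "w") as out:
        for text in kept:
            out.write(text + "\n")
    return kept
```

Only the second file is held in a set; the first file is streamed, which is what preserves its order in the output.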


5 Comments

Thanks - could you amend this so that it compares the files based only on the name?
I didn't understand. Could you elaborate, please?
Is this what you wanted?
I'd like it so the lines are compared in the file based only on the name, not the number. For example, file 1 could have Bob:5, and file two have Bob:3 - but since the names are still the same, it doesn't output this into file3.
I think something like line.split(':')[0] would be needed, but I'm unsure where this would go.
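The name-only comparison requested in these comments could be sketched roughly as follows, using `line.split(':')[0]` as the key. This assumes every non-blank line has the form `name:number`; the function name `compare_by_name` and the `sep` parameter are hypothetical.

```python
def compare_by_name(first_path, second_path, out_path, sep=":"):
    """Write lines of first_path whose name (text before sep) is absent from second_path."""
    with open(second_path) as f:
        # Collect only the names from the second file; the numbers are ignored.
        names = {line.split(sep, 1)[0].strip() for line in f if line.strip()}

    kept = []
    with open(first_path) as f:
        for line in f:
            if not line.strip():
                continue  # ignore blank lines
            name = line.split(sep, 1)[0].strip()
            if name not in names:
                kept.append(line.rstrip() + "\n")

    with open(out_path, "w") as out:
        out.writelines(kept)
    return kept
```

With this version, a first file containing `Bob:5` and a second file containing `Bob:3` would not produce `Bob:5` in the output, because only the names are compared.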
0

If a line differs from the line at the same position in the other file, the program prints it.

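with open("H:/Ast/Hpa.java", encoding="utf8") as f:
    with open("G:/Soft_install/Hpa.java", encoding="utf8") as fe:
        # Walk both files in step and print each line of the first file
        # that differs from the line at the same position in the second
        for line, linefe in zip(f, fe):
            if line != linefe:
                print(line)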

2 Comments

Hi! Please add some explanation as to why this is a solution to OP's question. This will help OP and future visitors to the site who come across this question. Code only answers are generally discouraged on SO. Thanks! --From Review.
This does not work when an entire line is missing from a file. For example: if the new file has an added line. It only prints a changed line.
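For the case raised in the last comment, where whole lines are added or missing, one option is Python's standard `difflib` module, which aligns the two sequences instead of comparing them position by position. The following is a minimal sketch, not a drop-in replacement for the answer above; the helper name `changed_lines` is hypothetical.

```python
import difflib


def changed_lines(old_lines, new_lines):
    """Return (removed, added): lines only in old_lines and lines only in new_lines."""
    removed, added = [], []
    # ndiff prefixes each output line with "- " (only in old), "+ " (only in new),
    # "  " (common to both) or "? " (intraline hints, ignored here).
    for entry in difflib.ndiff(old_lines, new_lines):
        if entry.startswith("- "):
            removed.append(entry[2:])
        elif entry.startswith("+ "):
            added.append(entry[2:])
    return removed, added
```

The inputs would typically come from `f.readlines()` on each file. Unlike the pairwise loop, a single inserted line does not cause every following line to be reported as different.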
