
I have two text files with the same number of lines. I want to merge them into a single CSV file with two fields, one per input file, plus an additional field holding the line number. Is this possible in Python?

File1:
This is a source first line 
This is a source second line
This is a source third line 

File2:
This is a transformed line 1
This is a transformed line 2
This is a transformed line 3 

Outputfile:
1,This is a source first line    ,This is a transformed line 1
2,This is a source second line   ,This is a transformed line 2
3,This is a source third  line   ,This is a transformed line 3

3 Answers


Given:

$ cat file1
This is a source first line 
This is a source second line
This is a source third line 
$ cat file2
This is a transformed line 1
This is a transformed line 2
This is a transformed line 3 

You can do:

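from itertools import zip_longest  # Python 3 name; on Python 2 it is izip_longest

fn1, fn2 = 'file1', 'file2'  # paths of the two input files

with open(fn1) as f1, open(fn2) as f2:
    # Pair the files line by line, numbering the pairs from 1
    for i, (l1, l2) in enumerate(zip_longest(f1, f2), 1):
        print('{}: {}\t{}'.format(i, l1.strip(), l2.strip()))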

Prints:

1: This is a source first line  This is a transformed line 1
2: This is a source second line This is a transformed line 2
3: This is a source third line  This is a transformed line 3

Now suppose you have:

$ cat file1
This is a source first line 
This is a source second line
This is a source third line 
$ cat file2
This is a transformed line 1
This is a transformed line 2
This is a transformed line 3 
This is line 4

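Now the files differ in length. Without a fillvalue, zip_longest (izip_longest on Python 2) pads the shorter file with None, and calling .strip() on None raises an AttributeError. So pass fillvalue="" and, to keep the output in true columns, use {:40} to pad each value to a 40-character column: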

from itertools import zip_longest  # izip_longest on Python 2

with open(fn1) as f1, open(fn2) as f2:
    for i, (l1, l2) in enumerate(zip_longest(f1, f2, fillvalue=""), 1):
        print('{}: {:40}{:40}'.format(i, l1.strip(), l2.strip()))

Prints:

1: This is a source first line             This is a transformed line 1            
2: This is a source second line            This is a transformed line 2            
3: This is a source third line             This is a transformed line 3            
4:                                         This is line 4     
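
To see what the fillvalue changes, here is a minimal sketch that uses in-memory lists in place of the two files. The names source and transformed and their contents are made up for illustration, and the code uses the Python 3 name zip_longest.

```python
from itertools import zip_longest

# Stand-ins for the lines of file1 and file2 (hypothetical data).
source = ["first\n", "second\n", "third\n"]
transformed = ["line 1\n", "line 2\n", "line 3\n", "line 4\n"]

# zip stops at the shorter input, so the fourth line is dropped.
truncated = list(zip(source, transformed))

# zip_longest keeps going and pads the shorter input with fillvalue.
padded = list(zip_longest(source, transformed, fillvalue=""))

print(len(truncated))  # 3
print(padded[-1])      # ('', 'line 4\n')
```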

1 Comment

Thanks @dawg. I have used the zip_longest and was able to get the needed output file. Thanks everyone for all the valuable inputs.

We can do something like this without importing anything. If we have two files:

File1:
This is a source first line 
This is a source second line
This is a source third line

File2:
This is a transformed line 1
This is a transformed line 2
This is a transformed line 3

Then...

with open("file1.txt") as f, open("file2.txt") as f2, open("outFile.txt", "w+") as o:
    lines = len(f.readlines())  # count the lines in the first file
    f.seek(0)                   # rewind to the start before reading again
    for i in range(lines):
        o.write("{},{} \t\t,{}\n".format(i+1, f.readline().strip(), f2.readline().strip()))

To explain: we open the two input files and the output file, count the lines in the first input file, and move its read position back to the start. Then, for each line, we write one row to the output file containing the line number, the first file's line, the tab-and-comma separator, and the second file's line. Our output:

1,This is a source first line           ,This is a transformed line 1
2,This is a source second line          ,This is a transformed line 2
3,This is a source third line           ,This is a transformed line 3

1 Comment

1) No need to read the file then rewind to get the line count -- just use enumerate 2) This will throw an error if one file is a different length than the other.
with open(r'C:/file1.txt') as f1, open(r'C:/file2.txt') as f2, open(r'C:/destination.txt', 'w') as o:
    for index, (line1, line2) in enumerate(zip(f1, f2), 1):
        o.write('{}:,{} ,{}\n'.format(index, line1.rstrip(), line2.rstrip()))

The nice thing about this solution is that it never reads an entire file into memory: it iterates over the input files line by line and writes each merged line to the output file as it goes. Based on the original question, I assumed that both files have the same number of lines; if they don't, use itertools.zip_longest instead of zip here.
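
Since the question asks for a CSV file, the same streaming loop could also hand its rows to the standard csv module, which quotes any field that itself contains a comma. This is only a sketch: the function name merge_to_csv and its parameters are made up for illustration, and it uses zip_longest so that a longer file is padded with empty fields rather than truncated.

```python
import csv
from itertools import zip_longest


def merge_to_csv(path1, path2, out_path):
    """Write rows of (line number, line from path1, line from path2)."""
    # newline="" lets the csv module control line endings itself.
    with open(path1) as f1, open(path2) as f2, \
            open(out_path, "w", newline="") as out:
        writer = csv.writer(out)
        # zip_longest pads the shorter file with "" instead of truncating.
        for index, (line1, line2) in enumerate(
                zip_longest(f1, f2, fillvalue=""), 1):
            writer.writerow([index, line1.strip(), line2.strip()])
```

Calling merge_to_csv("file1.txt", "file2.txt", "output.csv") would then produce one row per line pair, still without loading either input file into memory.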

4 Comments

That will truncate the longer of the two files.
The original post says "I have 2 text files each having same number of lines"
How silly to design a solution based on that assumption when there is an easy and Pythonic solution to avoid it.
I've updated with a comment to indicate that zip_longest is an option. I do agree it's more flexible.
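
A third option, between silently truncating and silently padding, is to check the equal-length assumption and fail loudly when it does not hold. The sketch below is one way to do that; the helper name paired_lines and the sentinel _MISSING are hypothetical. On Python 3.10 and later, zip(f1, f2, strict=True) performs a similar check.

```python
from itertools import zip_longest

_MISSING = object()  # sentinel that can never be a real line


def paired_lines(lines1, lines2):
    """Yield (line1, line2) pairs; raise ValueError on a length mismatch."""
    for pair in zip_longest(lines1, lines2, fillvalue=_MISSING):
        # The sentinel only appears once one input has run out early.
        if _MISSING in pair:
            raise ValueError("inputs have different numbers of lines")
        yield pair
```

Because it accepts any iterables, it can be passed two open file objects and wrapped in enumerate(..., 1) exactly like zip in the answer above.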
