
How do I add the entries from 100 files (each file contains two columns) and then write them to a new file (which will also contain two columns)?

2 Comments

  • What have you tried? This is pretty straightforward. Commented Feb 5, 2013 at 12:55
  • Why do you throw a question here and then never come back to reply to any comment or answer? That is definitely not kind. The first answers appeared only a couple of minutes after your question. Commented Feb 6, 2013 at 9:38

4 Answers


This is very underspecified. It's not clear what your problem is.

Probably you'd do something like:

entries = []
for f in ["file1.txt", "file2.txt", ..., "file100.txt"]:
    # extend, not append: writelines() expects a flat list of strings
    entries.extend(open(f).readlines())
o = open("output.txt", "w")
o.writelines(entries)
o.close()

1 Comment

This requires keeping all the files in memory at once.

Wasn't sure if you needed a solution to find all those 100 files as well? If so, here is one approach that reads them all and writes them to a joined file:

from os import walk
from os.path import abspath

lines = []
for root, folders, files in walk('./path/'):
    for file in files:
        fh = open(abspath(root + '/' + file), 'rb')
        lines.append(fh.read())
        fh.close()
    # add a break here if you only want the first level of your directory tree

o = open('output.txt', 'wb')
o.write('\n'.join(lines))
o.close()

You could also do a "memory efficient" solution:

from os import walk
from os.path import abspath

o = open('output.txt', 'wb')

for root, folders, files in walk('./path/'):
    for file in files:
        fh = open(abspath(root + '/' + file), 'rb')
        for line in fh:  # stream the file line by line
            o.write(line)
            del line
        fh.close()
        del fh
    # add a break here if you only want the first level of your directory tree

o.close()

Much of this is automated within Python (I think), but lazy or not, if you can, remove objects from memory after closing the files and before reusing the variable names, just in case.
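
As a quick sanity check (assuming CPython; 'somefile.txt' is just a placeholder name), a file object's .closed attribute shows whether it has actually been closed, and reading alone does not close it:

fh = open('somefile.txt', 'rb')
data = fh.read()
print(fh.closed)   # False -- .read() does not close the file
fh.close()
print(fh.closed)   # True -- the file is closed only by close() (or by leaving a with block)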

2 Comments

This will keep all lines from all files in memory - I don't know about the size, but this might become a problem.
Fixed, you were right. In my first version I forgot to close the files. If I'm not mistaken, though, Python has a lazy mechanism whereby, when open() is called and .read()/.readlines() drops the content into an object (variable), the file is automatically closed afterwards? Anyway, fixed.

A more scalable way, inspired by Torxed's approach:

from os import walk
from os.path import abspath

with open('output.txt', 'wb') as o:
    for root, folders, files in walk('./path/'):
        for filename in files:
            with open(abspath(root + '/' + filename), 'rb') as i:
                for line in i:
                    o.write(line)

2 Comments

Almost the same as my memory "efficient" approach, though I open and close the files manually and make sure that line is not left in memory. Python should automate much of that, but if you get lazy and Python misses something, you'll still end up with a huge memory issue if there's a flaw somewhere?
Indeed, the two solutions are almost identical, sorry for that: I think we wrote them at the same time. Python will close the file instance when it garbage-collects it, i.e. when it goes out of scope (I renamed the loop variable to "filename"). That can become an unsafe approach in large scripts, whereas closing files explicitly, for example with the with ... construct, is good practice. I improved my answer based on this discussion.

Do you want to chain them? I.e., do you want all lines of file 1, then all lines of file 2, ... Or do you want to merge them? Line 1 of file 1, line 1 of file 2, ...

For the first case:

from itertools import chain
filenames = ...
file_handles = [open(fn) for fn in filenames]

with open("output.txt", "w") as out_fh:
    for line in chain(*file_handles):  # unpack the handles so chain yields lines, not file objects
        out_fh.write(line)

for fh in file_handles:
    fh.close()

For the second case:

from itertools import izip_longest  # Python 2; on Python 3 use itertools.zip_longest
filenames = ...
file_handles = [open(fn) for fn in filenames]

with open("output.txt", "w") as out_fh:
    for lines in izip_longest(*file_handles, fillvalue=None):
        for line in lines:
            if line is not None:
                out_fh.write(line)

for fh in file_handles:
    fh.close()

Important: Never forget to close your files!

As @isedev pointed out, this approach is OK for 100 files, but since I open all the handles at once, it won't work for thousands.

If you want to overcome this problem, only option 1 (chaining) is reasonable:

filenames = ...

with open("output.txt", "w") as out_fh:
    for fn in filenames:
        with open(fn) as fh:
            for line in fh:
                out_fh.write(line)

7 Comments

Works for 100 files, but won't scale to thousands (you'd run out of file descriptors). Better to open/close the input files sequentially.
If you use the with open(filename) as fp: idiom, you never have to remember to close them.
I know, but how do you do this with a variable number of files? (One possible sketch follows these comments.)
@isedev good point, though the OP fixed this value; I didn't mention it. Will update my answer.
Hi Thorsten, I'm missing the point now: why do you need to chain the lines? Does for line in fh: provide the same result?
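
Regarding the variable number of files: a possible sketch, assuming Python 3.3+ where contextlib.ExitStack is available (filenames stands in for the list of input files, as in the answer above):

from contextlib import ExitStack

filenames = ...

with open("output.txt", "w") as out_fh, ExitStack() as stack:
    # ExitStack closes every file registered via enter_context when the
    # with block exits, however many files there are
    file_handles = [stack.enter_context(open(fn)) for fn in filenames]
    for fh in file_handles:
        for line in fh:
            out_fh.write(line)

This keeps the automatic-closing guarantee of with even when the number of files is only known at runtime.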
