
I have a number of numpy arrays which I generate iteratively. I want to save each array to a file, then generate the next array and append it to the file, and so forth (if I did it in one go I would use too much memory). How do I best do that? Is there a way of making use of numpy functions such as e.g. numpy.savetxt? (I couldn't find an append option for that function.)

My current code is:

with open('paths.dat','w') as output:
    for i in range(len(hist[0])):
        amount = hist[0][i].astype(int)
        array = hist[1][i] * np.ones(amount)
        for value in array:
            output.write(str(value)+'\n')
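If plain text isn't strictly required, one memory-friendly alternative (a sketch, not part of the original question; `paths.npy` is a hypothetical filename) is to append each array in binary form with `np.save` on a file opened in append mode, then read the records back one at a time with `np.load`:

```python
import os
import numpy as np

fname = 'paths.npy'  # hypothetical filename
if os.path.exists(fname):
    os.remove(fname)  # start fresh so reruns don't accumulate records

# Write: 'ab' keeps earlier arrays intact; each np.save call writes
# one self-contained record (header + data).
with open(fname, 'ab') as f:
    for i in range(3):
        np.save(f, np.arange(i + 1, dtype=float))

# Read: np.load consumes one record per call, so loop until the file
# is exhausted.
arrays = []
with open(fname, 'rb') as f:
    while True:
        try:
            arrays.append(np.load(f))
        except (EOFError, ValueError):
            break

print([a.tolist() for a in arrays])
```

Only one array is held in memory at a time on both the write and the read side, though this only helps if the downstream tool can be pointed at the binary format instead of genfromtxt.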
  • Wouldn't the Python built-in file reading/writing accomplish this? Commented Sep 30, 2016 at 17:28
  • Yes of course, I wonder if there is a more efficient numpy way of doing it though. Am editing my post to that end now. Commented Sep 30, 2016 at 17:30
  • @P-M Why is it so important for you to save them in the same file? Commented Sep 30, 2016 at 17:34
  • Another toolbox analyses the data in the next step and uses genfromtxt to reproduce the list (alas, it needs to be written to the drive first). I could combine the arrays generated by genfromtxt but they will be very large so the less I do with them the better. Commented Sep 30, 2016 at 17:39
  • Okay. So you do not need access to the data during the time you write it - only after it is completely ready? Commented Sep 30, 2016 at 17:51

2 Answers


You could pass the open file handle to savetxt:

with open('paths.dat','w') as output:
    for i in range(len(hist[0])):
        amount = hist[0][i].astype(int)
        myArray = hist[1][i] * np.ones(amount)
        np.savetxt(output, myArray, delimiter=',', fmt='%10f')

np.savetxt opens the file if given a name; otherwise it uses the file handle you pass in.

It then iterates over the rows of the array and writes them:

for row in myArray:
    f.write(fmt % tuple(row))

where fmt is the string you give, or one that is replicated to match the number of columns in your array.




I would recommend using HDF5. It is very fast for I/O. Here is how you write your data:

import numpy as np
import tables

fname = 'myOutput.h5'
length = 100  # your data length
my_data_generator = range(length)  # your data comes here instead of the range

filters = tables.Filters(complib='blosc', complevel=5)  # you could change these
h5file = tables.open_file(fname, mode='w', title='yourTitle', filters=filters)
group = h5file.create_group(h5file.root, 'MyData', 'MyData')
x_atom = tables.Float32Atom()

x = h5file.create_carray(group, 'X', atom=x_atom, title='myTitle',
                         shape=(length,), filters=filters)

# this is a basic example.  It will be faster if you write it in larger chunks in your real code
# like x[start1:end1] = elements[start2:end2]
for element_i, element in enumerate(my_data_generator):
    x[element_i] = element
    h5file.flush()

h5file.close()

For reading it use:

h5file = tables.open_file(fname, mode='r')
x = h5file.get_node('/MyData/X')
print(x[:10])

The result:

[0. 1. 2. 3. 4. 5. 6. 7. 8. 9.]
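The chunked writing mentioned in the code comment can be sketched with plain NumPy slice assignment; the same `x[start:end] = chunk` pattern applies to the HDF5 carray (a sketch with a made-up chunk size, using an in-memory array as a stand-in for the carray):

```python
import numpy as np

length = 100
x = np.empty(length, dtype=np.float32)      # stand-in for the HDF5 carray
data = np.arange(length, dtype=np.float32)  # stand-in for your generated data

chunk_size = 32  # made-up; tune to your data
for start in range(0, length, chunk_size):
    end = min(start + chunk_size, length)
    # One slice assignment per chunk instead of one write per element.
    x[start:end] = data[start:end]

print(x[:5])
```

Each slice assignment translates into one bulk write, which is far cheaper than the per-element `x[element_i] = element` loop above, especially with a `flush()` per iteration.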

