
I have a number of numpy arrays which I generate iteratively. I want to save each array to a file, then generate the next array and append it to the file, and so forth (if I did it in one go I would use too much memory). How do I best do that? Is there a way of making use of numpy functions such as e.g. numpy.savetxt? (I couldn't find an append option for that function.)

My current code is:

with open('paths.dat','w') as output:
    for i in range(len(hist[0])):
        amount = hist[0][i].astype(int)
        array = hist[1][i] * np.ones(amount)
        for value in array:
            output.write(str(value)+'\n')
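If plain text isn't strictly required, one memory-friendly alternative (a sketch, not part of the original question; `paths.npy` is a hypothetical filename) is to append each array in binary form with `np.save` on a file opened in append mode, then read the records back one at a time with `np.load`:

```python
import os
import numpy as np

fname = 'paths.npy'  # hypothetical filename
if os.path.exists(fname):
    os.remove(fname)  # start fresh so reruns don't accumulate records

# Write: 'ab' keeps earlier arrays intact; each np.save call writes
# one self-contained record (header + data).
with open(fname, 'ab') as f:
    for i in range(3):
        np.save(f, np.arange(i + 1, dtype=float))

# Read: np.load consumes one record per call, so loop until the file
# is exhausted.
arrays = []
with open(fname, 'rb') as f:
    while True:
        try:
            arrays.append(np.load(f))
        except (EOFError, ValueError):
            break

print([a.tolist() for a in arrays])
```

Only one array is held in memory at a time on both the write and the read side, though this only helps if the downstream tool can be pointed at the binary format instead of genfromtxt.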
  • Wouldn't the Python built-in file reading/writing accomplish this? Commented Sep 30, 2016 at 17:28
  • Yes of course, I wonder if there is a more efficient numpy way of doing it though. Am editing my post to that end now. Commented Sep 30, 2016 at 17:30
  • @P-M Why is it so important for you to save them in the same file? Commented Sep 30, 2016 at 17:34
  • Another toolbox analyses the data in the next step and uses genfromtxt to reproduce the list (alas, it needs to be written to the drive first). I could combine the arrays generated by genfromtxt but they will be very large so the less I do with them the better. Commented Sep 30, 2016 at 17:39
  • Okay. So you do not need access to the data during the time you write it - only after it is completely ready? Commented Sep 30, 2016 at 17:51

2 Answers


You could pass the open file handle to savetxt:

with open('paths.dat','w') as output:
    for i in range(len(hist[0])):
        amount = hist[0][i].astype(int)
        myArray = hist[1][i] * np.ones(amount)
        np.savetxt(output, myArray, delimiter=',', fmt='%10f')

np.savetxt opens the file if given a name; otherwise it uses the file handle you pass in.

It then iterates over the rows of the array and writes them:

for row in myArray:
    f.write(fmt % tuple(row))

where fmt is the string you give, or one that is replicated to match the number of columns in your array.




I would recommend using HDF5. It is very fast for I/O. Here is how you write your data:

import numpy as np
import tables

fname = 'myOutput.h5'
length = 100  # your data length
my_data_generator = range(length)  # your data comes here instead of the range

filters = tables.Filters(complib='blosc', complevel=5)  # you could change these
h5file = tables.open_file(fname, mode='w', title='yourTitle', filters=filters)
group = h5file.create_group(h5file.root, 'MyData', 'MyData')
x_atom = tables.Float32Atom()

x = h5file.create_carray(group, 'X', atom=x_atom, title='myTitle',
                         shape=(length,), filters=filters)

# this is a basic example.  It will be faster if you write it in larger chunks in your real code
# like x[start1:end1] = elements[start2:end2]
for element_i, element in enumerate(my_data_generator):
    x[element_i] = element
    h5file.flush()

h5file.close()

For reading it use:

h5file = tables.open_file(fname, mode='r')
x = h5file.get_node('/MyData/X')
print(x[:10])

The result:

[0. 1. 2. 3. 4. 5. 6. 7. 8. 9.]
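The chunked writing mentioned in the code comment can be sketched with plain NumPy slice assignment; the same `x[start:end] = chunk` pattern applies to the HDF5 carray (a sketch with a made-up chunk size, using an in-memory array as a stand-in for the carray):

```python
import numpy as np

length = 100
x = np.empty(length, dtype=np.float32)      # stand-in for the HDF5 carray
data = np.arange(length, dtype=np.float32)  # stand-in for your generated data

chunk_size = 32  # made-up; tune to your data
for start in range(0, length, chunk_size):
    end = min(start + chunk_size, length)
    # One slice assignment per chunk instead of one write per element.
    x[start:end] = data[start:end]

print(x[:5])
```

Each slice assignment translates into one bulk write, which is far cheaper than the per-element `x[element_i] = element` loop above, especially with a `flush()` per iteration.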

