
I'm new to Python and want the most pythonic way of solving the following basic problem:

I have many plain-text data files file.00001, file.00002, ..., file.99999 and each file has a single line, with numeric data stored in e.g. four columns. I want to read each file sequentially and append the data into one array per column, so in the end I want the arrays arr0, arr1, arr2, arr3 each with shape=(99999,) containing all the data from the appropriate column in all the files.

Later on I want to do lots of math with these arrays so I need to make sure that their entries are contiguous in memory. My naive solution is:

import numpy as np
fnumber = 99999
fnums = np.arange(1, fnumber+1)

arr0 = np.full_like(fnums, np.nan, dtype=np.double)
arr1 = np.full_like(fnums, np.nan, dtype=np.double)
arr2 = np.full_like(fnums, np.nan, dtype=np.double)
arr3 = np.full_like(fnums, np.nan, dtype=np.double)
# ...also is there a neat way of doing this??

for fnum in fnums:
    fname = f'path/to/data/folder/file.{fnum:05}'
    arr0[fnum-1], arr1[fnum-1], arr2[fnum-1], arr3[fnum-1] = np.loadtxt(fname, delimiter=' ', unpack=True)

# error checking - in case a file got deleted or something
all_arrs = (arr0, arr1, arr2, arr3)
if np.isnan(all_arrs).any():
    print("CUIDADO HAY NANS!!!!\nLOOK OUT, THERE ARE NANS!!!!")

It strikes me that this is very C-thinking and there probably is a more pythonic way of doing it. But my feeling is that methods like numpy.concatenate and numpy.insert would either not result in arrays with their contents contiguous in memory, or involve deep copies of each array at every step in the for loop, which would probably melt my laptop.

Is there a more pythonic way?

  • all_arrs is a tuple of arrays. np.isnan will first turn that into one array (as np.array(all_arrs) would), returning a boolean array of shape (4, 99999). Commented Nov 4, 2020 at 22:16

1 Answer


Try:

alist = []
for fnum in fnums:
    fname = f'path/to/data/folder/file.{fnum:05}'
    alist.append(np.loadtxt(fname))
arr = np.array(alist)
# arr = np.vstack(alist)    # alternative
print(arr.shape)

Assuming the files all have the same number of columns, one of these should work. The result will be one array, which you could separate into 4 if needed.
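To address the asker's contiguity concern when separating the stacked array into four column arrays: since NumPy arrays are row-major by default, a column slice of the stacked result is a strided view, not contiguous; a `.copy()` of each column fixes that. A minimal sketch, using a toy array in place of the real stacked result:

```python
import numpy as np

# Toy stand-in for the stacked result -- in practice arr would come from
# np.vstack(alist) and have shape (n_files, 4).
arr = np.arange(16.0).reshape(4, 4)

# arr is C-ordered (row-major), so a column slice is a strided view,
# not a contiguous array:
print(arr[:, 0].flags['C_CONTIGUOUS'])   # False

# .copy() gives each column its own contiguous buffer.
arr0, arr1, arr2, arr3 = (arr[:, i].copy() for i in range(4))
print(arr0.flags['C_CONTIGUOUS'])        # True
```

`np.ascontiguousarray(arr[:, i])` does the same job and reads more explicitly.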


4 Comments

How does this work? Inside the for loop you append to alist, and then outside you recast(??) alist as an ndarray, which takes care of the efficient memory allocation? Seems ok, except since numpy arrays are row-major, aren't contiguous elements in a column 4 memory addresses apart rather than right next to each other? (Also, your method stores the same data twice, once in alist and once in arr, but I guess that's not a problem with only a few thousand numbers.)
loadtxt loads the file line by line and builds the result from the resulting list of lists. It really doesn't matter whether you collect a list of arrays and make an array from those, or assign those arrays to slices of a predefined array. Memory use and copying are basically the same. Remember, a list contains references/pointers to objects elsewhere in memory.
appending arrays to a list is quite efficient since it just adds a reference to the list. Doing concatenate iteratively is slow, but doing one copy at the end, joining all arrays into one is better. But feel free to compare alternatives (on smaller problems if needed).
My feeling is that this solution is definitely more pythonic, but involves passing the same data around a few more times. And if I create my arr0 through arr3 by slicing your arr, it will leave them as arrays that aren't contiguous in memory. If I copy them, it should be ok.
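The claim in the comments above, that appending references to a list and joining once is cheap while concatenating iteratively re-copies everything on each step, can be checked with a quick sketch (the sizes and fake rows here are illustrative, not from the original thread):

```python
import numpy as np
import timeit

# Fake per-file rows standing in for the np.loadtxt results.
rng = np.random.default_rng(0)
rows = [rng.random(4) for _ in range(1000)]

def one_join():
    # Collect references in a list, copy once at the end.
    return np.vstack(rows)

def iterative_concat():
    # Re-copies the whole accumulated array on every iteration.
    out = np.empty((0, 4))
    for r in rows:
        out = np.concatenate([out, r[None, :]])
    return out

# Both give the same (1000, 4) result...
assert np.array_equal(one_join(), iterative_concat())

# ...but the single join is dramatically faster.
print('one join:  ', timeit.timeit(one_join, number=20))
print('iterative: ', timeit.timeit(iterative_concat, number=20))
```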

