I am reading a list of CSV files and appending the data from each file as a new column of my array. My current solution is analogous to the following:
import numpy as np
# Random generator and paths for the sake of reproducibility
fake_read_csv = lambda path: np.random.random(5)
paths = ['a','b','c','d']
first_iteration=True
for path in paths:
    print(f'Reading path {path}')
    sub = fake_read_csv(path)
    if first_iteration:
        first_iteration = False
        pred = sub
    else:
        pred = np.c_[pred, sub]  # append to a new column
print(pred)
I was wondering if it is possible to simplify the loop. For example, something like this:
import numpy as np
fake_read_csv = lambda path: np.random.random(5)
paths = ['a','b','c','d']
pred = np.array([])
for path in paths:
    print(f'Reading path {path}')
    sub = fake_read_csv(path)
    pred = np.c_[pred, sub]  # append to a new column
Which raises the error:
ValueError: all the input array dimensions except for the concatenation axis must match exactly
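As far as I can tell, the mismatch comes from the shapes: np.array([]) has shape (0,), so np.c_ treats it as a column of length 0, while each sub has length 5. A minimal sketch of one possible way around it, assuming every file yields the same number of rows, is to collect the arrays in a list and stack them once with np.column_stack (fake_read_csv and paths are the same placeholders as above):

```python
import numpy as np

# Stand-in for the real CSV reader (same placeholder as in the question)
fake_read_csv = lambda path: np.random.random(5)
paths = ['a', 'b', 'c', 'd']

# The empty start array has length 0, so it cannot sit next to a length-5 column
print(np.array([]).shape)

# Collect one 1-D array per file, then stack them as columns in a single call
subs = [fake_read_csv(path) for path in paths]
pred = np.column_stack(subs)
print(pred.shape)
```

This removes the first_iteration flag, but note that the list and the stacked result both exist in memory for a moment.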
numpy? Maybe use something like this: stackoverflow.com/a/21232849/10197418 ?
pandas, it's convenient for handling csv and tabular data.
np.array in order to be used as input of a Keras model. As I might have memory constraints (each CSV is about 1 GB), I am considering reading each file directly into a numpy array instead of reading everything into a pandas DataFrame and then converting it to a numpy array later.
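With the memory constraint in mind, here is a rough sketch of filling a preallocated array one column per file, so that no intermediate copies are created by repeated concatenation. It assumes the row count is known before reading; n_rows is a hypothetical name introduced only for this illustration, and fake_read_csv again stands in for the real reader:

```python
import numpy as np

# Stand-in for the real CSV reader (same placeholder as in the question)
fake_read_csv = lambda path: np.random.random(5)
paths = ['a', 'b', 'c', 'd']

# Assumption: every file has the same, known number of rows
n_rows = 5

# Allocate the full result once, then write each file into its own column
pred = np.empty((n_rows, len(paths)))
for j, path in enumerate(paths):
    print(f'Reading path {path}')
    pred[:, j] = fake_read_csv(path)

print(pred.shape)
```

Only one file's data plus the final array needs to be held at a time, which may matter if each file is large.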