You could try rewriting take_previous_data as a generator function that lazily yields rows of your final array, then use np.fromiter, as Eli suggested:
import numpy as np
import pywt
from itertools import chain

def take_previous_data(X_train, y):
    temp_train_data = X_train[1000:]
    temp_labels = y[1000:]
    for index, row in enumerate(temp_train_data):
        actual_index = index + 1000
        # take the 1001-row window ending at the current row and flatten it
        data = X_train[actual_index - 1000:actual_index + 1].ravel()
        __, cd_i = pywt.dwt(data, 'haar')
        yield cd_i

gen = take_previous_data(X_train, y)

# I'm assuming that by "int" you meant "int64"
x = np.fromiter(chain.from_iterable(gen), np.int64)

# fromiter gives a 1D output, so we reshape it into a (200001, 3504) array
x.shape = 200001, -1
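If you know the total number of elements up front, you can also pass fromiter's optional count argument so it allocates the output buffer in one go rather than growing it as the iterator is consumed (the count here is just the product of your expected output shape):

# optional: tell fromiter how many elements to expect so it can
# size the output buffer once instead of resizing it incrementally
x = np.fromiter(chain.from_iterable(gen), np.int64, count=200001 * 3504)
x.shape = 200001, -1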
Another option would be to pre-allocate the output array and fill in the rows as you go along:
def take_previous_data(X_train, y):
    temp_train_data = X_train[1000:]
    temp_labels = y[1000:]
    out = np.empty((200001, 3504), np.int64)
    for index, row in enumerate(temp_train_data):
        actual_index = index + 1000
        data = X_train[actual_index - 1000:actual_index + 1].ravel()
        __, cd_i = pywt.dwt(data, 'haar')
        out[index] = cd_i
    return out
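The advantage over appending each cd_i to a Python list and converting at the end is that the rows are written straight into a single pre-sized buffer, so you never hold the intermediate list of arrays and the final array in memory at the same time.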
From our chat conversation, it seems that the fundamental issue is that you can't actually fit the output array itself in memory. In that case, you could adapt the second solution to use np.memmap to write the output array to disk:
def take_previous_data(X_train, y):
    temp_train_data = X_train[1000:]
    temp_labels = y[1000:]
    # note that mode must be passed by keyword here - the second
    # positional argument to np.memmap is the dtype, not the mode
    out = np.memmap('my_array.mmap', mode='w+', shape=(200001, 3504),
                    dtype=np.int64)
    for index, row in enumerate(temp_train_data):
        actual_index = index + 1000
        data = X_train[actual_index - 1000:actual_index + 1].ravel()
        __, cd_i = pywt.dwt(data, 'haar')
        out[index] = cd_i
    return out
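Since the result is backed by a file on disk, you can flush it once it's filled and map it back in later without recomputing anything. A minimal sketch (reusing the my_array.mmap filename from above; the shape and dtype have to match what the file was written with):

result = take_previous_data(X_train, y)
result.flush()  # make sure any pending writes actually reach the disk

# later, possibly in a fresh session, map the same file back in read-only
saved = np.memmap('my_array.mmap', mode='r', shape=(200001, 3504),
                  dtype=np.int64)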
One other obvious solution would be to reduce the bit depth of your array. I've assumed that by int you meant int64 (the default integer type in numpy). If you could switch to a lower bit depth (e.g. int32, int16 or maybe even int8), you could drastically reduce your memory requirements.
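To put rough numbers on that, here's a quick back-of-the-envelope sketch using your 200001 x 3504 output shape (the sizes are computed, not allocated, so it's safe to run):

n_elements = 200001 * 3504
for dt in (np.int64, np.int32, np.int16, np.int8):
    # bytes per element times element count, converted to GiB
    size_gib = n_elements * np.dtype(dt).itemsize / 1024 ** 3
    print(np.dtype(dt).name, '->', round(size_gib, 2), 'GiB')

That works out to roughly 5.2 GiB at int64 versus about 0.65 GiB at int8, although the smaller types are only an option if your dwt coefficients actually fit in their range.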