Issues appending numpy arrays during for loop

Question

I'm a bit lost at the moment. I initialized an empty numpy array and I believe I'm using the np.append function correctly, but the result does not have the shape I expect:

Preds = np.empty(shape = (X_test.shape[0],10))

kf = KFold(n = X_train.shape[0], n_folds=10, shuffle = True)

for kf_train, kf_test in kf:

    X_train_kf = X_train.iloc[kf_train]
    Y_train_kf = Y_train.iloc[kf_train]

    dt = tree.DecisionTreeClassifier(max_depth=2)
    dt.fit(X_train_kf, Y_train_kf)
    Preds = np.append(Preds,dt.predict(X_test))

    print Preds

Just some additional info:

X_test has a shape of (9649, 24)
(After running) Preds has a shape of (192980,)

At the end of this loop, Preds should have a shape of (9649,10)

Any advice would be much appreciated.

EDIT: Here is the updated solution

Preds = []
kf = KFold(n = X_train.shape[0], n_folds=20, shuffle = True)

for kf_train, kf_test in kf:

    X_train_kf = X_train.iloc[kf_train]
    Y_train_kf = Y_train.iloc[kf_train]

    dt = tree.DecisionTreeClassifier(max_depth=2)
    dt.fit(X_train_kf, Y_train_kf)
    Preds.append(dt.predict(X_test))

Preds = np.vstack(Preds)

From the numpy.append docs: 'axis : int, optional The axis along which values are appended. If axis is not given, both arr and values are flattened before use'
– Maarten Fabré, Commented Nov 9, 2016 at 21:16
It would be best to avoid using append. Instead just add each array to a Python list and then use np.vstack or np.hstack to join them into a single array at the end.
– Erotemic, Commented Nov 9, 2016 at 21:18
@MaartenFabré it is (9649, ). I tried to change the axis to 1 but I get this error: ValueError: all the input arrays must have same number of dimensions
– newbie, Commented Nov 9, 2016 at 21:22
@Erotemic, I'm not sure I completely understand your suggestion, but I would like to keep Preds 2 dimensional. At the end of this, I would like to look at the predictions in each row and see which prediction occurs most often. That will be easier to do if the array is 2D.
– newbie, Commented Nov 9, 2016 at 21:24

hpaulj · Accepted Answer · 2016-11-09 21:36:30Z

If Preds is (9649,10), then you can do one of 2 kinds of concatenation

 newPreds = np.concatenate((Preds, np.zeros((N,10))), axis=0)
 newPreds = np.concatenate((Preds, np.zeros((9649,N))), axis=1)

The first produces a (9649+N, 10) array, the second (9649, 10+N).

np.vstack can be used to make sure the 2nd array is 2d, i.e. it changes a (10,) array to a (1,10) array before joining. np.append takes 2 arguments instead of a list, and makes sure the second is an array. Without an axis argument it flattens both inputs, so it is better suited to adding a scalar to a 1d array than to general purpose concatenation.
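To make the difference concrete, here is a minimal sketch using small stand-in arrays (the names `preds` and `row` and the sizes are made up for illustration, not taken from the question). It shows how np.vstack promotes a 1d input, how np.append flattens when no axis is given, and why passing an axis with mismatched dimensions raises a ValueError like the one reported in the comments.

```python
import numpy as np

preds = np.zeros((4, 3))       # stand-in for an existing 2-D array
row = np.arange(3)             # 1-D array, shape (3,)

# vstack promotes the 1-D input to shape (1, 3) before joining on axis 0.
stacked = np.vstack((preds, row))
print(stacked.shape)

# append without an axis flattens both arguments first.
flat = np.append(preds, row)
print(flat.shape)

# append with an axis behaves like concatenate, so the number of
# dimensions must match; a 2-D array and a 1-D array do not.
try:
    np.append(preds, row, axis=0)
    append_failed = False
except ValueError:
    append_failed = True
print(append_failed)

# Giving the new row an explicit leading axis makes the shapes compatible.
appended = np.append(preds, row[np.newaxis, :], axis=0)
print(appended.shape)
```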

Make sure you understand the shapes and number of dimensions of your arrays.
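As a rough illustration of checking shapes, the sketch below reproduces the pattern from the question at a smaller scale. The sizes and the zero-filled `fold_pred` are stand-ins for the real data and for dt.predict(X_test). The starting array already holds n_samples * n_folds values, and each np.append without an axis flattens and adds n_samples more, which is how 9649 * 10 + 10 * 9649 = 192980 arises in the original.

```python
import numpy as np

n_samples, n_folds = 5, 10     # small stand-ins for 9649 rows and 10 folds

# np.empty allocates n_samples * n_folds uninitialized values up front.
preds = np.empty((n_samples, n_folds))

for _ in range(n_folds):
    # Stand-in for dt.predict(X_test), which returns shape (n_samples,).
    fold_pred = np.zeros(n_samples)
    # No axis given: both inputs are flattened, then joined end to end.
    preds = np.append(preds, fold_pred)

print(preds.ndim, preds.shape)
```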

A good alternative is to append to a list

alist = []
alist.append(initial_array)
for ...
    alist.append(next_array)
result = np.concatenate(alist, axis=?)
# vstack, stack, and np.array can be used if dimensions are right

Appending to a list, followed by one join at the end, is faster than repeated concatenates. Lists are designed to grow cheaply; arrays grow by allocating a new, larger array and copying the old data into it each time.
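A small runnable version of the list approach might look like the following. `fake_predict` is a hypothetical stand-in for dt.predict(X_test), and the sizes are arbitrary. Note which join gives which layout: np.vstack puts one fold per row, while np.column_stack puts one fold per column, which matches the (n_samples, n_folds) shape the question asks for.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_folds = 6, 4      # arbitrary small sizes for illustration


def fake_predict(n):
    """Hypothetical stand-in for dt.predict(X_test): n labels in {0, 1}."""
    return rng.integers(0, 2, size=n)


# Collect one 1-D prediction array per fold in a plain Python list.
fold_preds = []
for _ in range(n_folds):
    fold_preds.append(fake_predict(n_samples))

# Join once at the end.
by_row = np.vstack(fold_preds)         # shape (n_folds, n_samples)
by_col = np.column_stack(fold_preds)   # shape (n_samples, n_folds)

print(by_row.shape, by_col.shape)
```

The two results are transposes of each other, so either can be used as long as later code indexes the intended axis.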
