I'm a bit lost at the moment. I initialized an empty numpy array and I believe I'm using the np.append function correctly, but Preds ends up with the wrong shape.

Preds = np.empty(shape = (X_test.shape[0],10))

kf = KFold(n = X_train.shape[0], n_folds=10, shuffle = True)

for kf_train, kf_test in kf:

    X_train_kf = X_train.iloc[kf_train]
    Y_train_kf = Y_train.iloc[kf_train]

    dt = tree.DecisionTreeClassifier(max_depth=2)
    dt.fit(X_train_kf, Y_train_kf)
    Preds = np.append(Preds,dt.predict(X_test))

    print Preds

Just some additional info:

  • X_test has a shape of (9649, 24)

  • (After running) Preds has a shape of (192980,)

At the end of this loop, Preds should have a shape of (9649,10)

Any advice would be much appreciated.

EDIT: Here is the updated solution

Preds = []
kf = KFold(n = X_train.shape[0], n_folds=20, shuffle = True)

for kf_train, kf_test in kf:

    X_train_kf = X_train.iloc[kf_train]
    Y_train_kf = Y_train.iloc[kf_train]

    dt = tree.DecisionTreeClassifier(max_depth=2)
    dt.fit(X_train_kf, Y_train_kf)
    Preds.append(dt.predict(X_test))

Preds = np.vstack(Preds)
  • from the numpy.append docs: 'axis : int, optional The axis along which values are appended. If axis is not given, both arr and values are flattened before use' Commented Nov 9, 2016 at 21:16
  • and did you check the shape of dt.predict(X_test)? Commented Nov 9, 2016 at 21:17
  • It would be best to avoid using append. Instead just add each array to a python list and then use np.vstack or np.hstack to flatten them into a single array at the end. Commented Nov 9, 2016 at 21:18
  • @MaartenFabré it is (9649, ) I tried to change the axis to 1 but I get this error: ValueError: all the input arrays must have same number of dimensions Commented Nov 9, 2016 at 21:22
  • @Erotemic, I'm not sure if I'm completely understanding your suggestion, but I would like to keep Preds 2 dimensional. At the end of this, I would like to look at the prediction in each column and see what prediction occurs the most often. That will be easier to do if the array is 2D. Commented Nov 9, 2016 at 21:24

1 Answer

If Preds is (9649,10), then you can do one of 2 kinds of concatenation:

 newPreds = np.concatenate((Preds, np.zeros((N,10))), axis=0)
 newPreds = np.concatenate((Preds, np.zeros((9649,N))), axis=1)

The first produces a (9649+N, 10) array, the second a (9649, 10+N) array.
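To check the two result shapes, here is a minimal sketch. The value N = 3 and the zero-filled arrays are assumptions chosen only for illustration; they stand in for real predictions.

```python
import numpy as np

N = 3                          # hypothetical number of rows/columns to add
Preds = np.zeros((9649, 10))   # placeholder with the shape from the question

# axis=0: the new block must match Preds in its column count (10)
rows_added = np.concatenate((Preds, np.zeros((N, 10))), axis=0)

# axis=1: the new block must match Preds in its row count (9649)
cols_added = np.concatenate((Preds, np.zeros((9649, N))), axis=1)

print(rows_added.shape)  # (9652, 10)
print(cols_added.shape)  # (9649, 13)
```

In both calls the arrays go in as a single tuple, and they must agree on every dimension except the one being joined.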

np.vstack can be used to make sure the second array is 2d, i.e. it changes a (10,) array to a (1,10) array. np.append takes 2 arguments instead of a list, and makes sure the second is an array. It is better suited to adding a scalar to a 1d array than to general-purpose concatenation.
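The shape in the question, (192980,), follows from this: without an axis, np.append flattens both inputs, so the 96490 elements of the (9649,10) array are joined by ten predictions of 9649 elements each. The sketch below reproduces that, with zero-filled arrays standing in for dt.predict(X_test) (an assumption for illustration), and shows two ways to keep the result 2d.

```python
import numpy as np

n_rows, n_folds = 9649, 10

# Reproduce the question: np.append without axis flattens both arguments.
preds = np.empty((n_rows, n_folds))
for _ in range(n_folds):
    fold_pred = np.zeros(n_rows)         # stand-in for dt.predict(X_test), shape (9649,)
    preds = np.append(preds, fold_pred)  # result is 1d
print(preds.shape)  # (192980,)

# np.vstack promotes a 1d (10,) array to a (1, 10) row before joining.
stacked = np.vstack((np.zeros((3, 10)), np.zeros(10)))
print(stacked.shape)  # (4, 10)

# With axis=1, np.append needs a 2d second argument, so add a column axis.
base = np.zeros((n_rows, 0))             # start with zero columns
column = np.zeros(n_rows)
grown = np.append(base, column[:, np.newaxis], axis=1)
print(grown.shape)  # (9649, 1)
```

Passing the 1d column directly with axis=1 raises the "all the input arrays must have same number of dimensions" error quoted in the comments; the extra axis avoids it.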

Make sure you understand the shapes and number of dimensions of your arrays.

A good alternative is to append to a list:

alist = []
alist.append(initial_array)
for ...:
    alist.append(next_array)
result = np.concatenate(alist, axis=?)
# vstack, stack, and np.array can be used if dimensions are right

Appending to a list, followed by one join at the end, is faster than repeated concatenation. Lists are designed to grow cheaply; arrays grow by making a new, larger array each time.
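Applied to the question, the list pattern might look like the sketch below. The random 0/1 labels are an assumption that stands in for dt.predict(X_test), so the example runs without scikit-learn or the original data.

```python
import numpy as np

n_rows, n_folds = 9649, 10
rng = np.random.default_rng(0)

fold_preds = []
for _ in range(n_folds):
    # stand-in for dt.predict(X_test): one 1d array of shape (9649,) per fold
    fold_preds.append(rng.integers(0, 2, size=n_rows))

# One join at the end. column_stack makes each 1d array a column.
by_column = np.column_stack(fold_preds)
print(by_column.shape)  # (9649, 10)

# vstack makes each 1d array a row instead, giving the transpose.
by_row = np.vstack(fold_preds)
print(by_row.shape)  # (10, 9649)
```

Note that np.vstack, as used in the updated solution in the question, yields one row per fold; np.column_stack (or transposing the vstack result) gives the (9649, 10) layout the question asks for.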
