I'm trying to write a classifier function that builds an ensemble of decision trees and makes the final prediction by majority vote across all the trees. My approach is to build a matrix in which each column holds one tree's predictions, and then, for every row (one per data point), find the modal value to use as the final prediction for that data point.

So far my function is:

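import numpy as np
from sklearn import tree
from scipy.stats import mode  # assumed source of mode(); the original does not show this import

def majority_classify(x_train, y_train, x_test, y_test, num_samples):

    n = x_train.shape[0]
    c = len(np.unique(y_train))

    votes = np.zeros((n, c))
    # One column per tree
    predictions_train = np.empty((n, num_samples))
    predictions_test = np.empty((x_test.shape[0], num_samples))

    for i in range(num_samples):
        # Randomly draw a sample of size n (with replacement) from the training set
        indices = np.random.choice(np.arange(0, n), size=n)

        x_train_sample = x_train[indices, :]
        y_train_sample = y_train[indices]

        dt_major = tree.DecisionTreeClassifier(max_depth=2)
        # Fit on the resampled data rather than the full training set
        model_major = dt_major.fit(x_train_sample, y_train_sample)

        predictions_train[:, i] = model_major.predict(x_train)

    # Unfinished: predict_train is overwritten on every row
    for r in predictions_train:
        predict_train = mode(r)[0][0]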

However, I'm having trouble figuring out how to iterate through each row and find its mode. Any suggestions?

Thanks!

  • The documentation is a good place to start. You should include a minimal example of the input, and the desired result in your question. Commented Nov 8, 2016 at 22:28
  • I'd like to iterate over each row as a single unit, not iterate over the items within each row. I don't think I'm seeing how to do that in that documentation. Commented Nov 8, 2016 at 22:32
  • docs.scipy.org/doc/numpy/user/… Commented Nov 8, 2016 at 22:46
  • Can you use any package or are you restricted? Commented Nov 9, 2016 at 0:59

2 Answers

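  • Use np.unique with the return_counts parameter to get each distinct value and how often it occurs.
  • Use argmax on the counts array to index into the unique array; that value is the mode.
  • Use np.apply_along_axis to apply this custom mode function along an axis (axis 0 gives the mode of each column, axis 1 the mode of each row).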

def mode(a):
    u, c = np.unique(a, return_counts=True)
    return u[c.argmax()]

a = np.array([
        [1, 2, 3],
        [2, 3, 4],
        [3, 4, 5],
        [2, 5, 6],
        [4, 1, 7],
        [5, 4, 8],
        [6, 6, 3]
    ])

np.apply_along_axis(mode, 0, a)

array([2, 4, 3])
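
For the row-wise case in the question, the same helper can be applied along axis 1 instead. This is a minimal sketch, assuming a small made-up predictions matrix with one row per data point and one column per tree; the names row_mode and predictions are illustrative only and do not come from the question:

```python
import numpy as np

def row_mode(a):
    """Return the most frequent value in a 1-D array (smallest value wins ties)."""
    u, c = np.unique(a, return_counts=True)
    return u[c.argmax()]

# Hypothetical predictions: 4 data points (rows) x 3 trees (columns)
predictions = np.array([
    [0, 1, 1],
    [2, 2, 0],
    [1, 0, 0],
    [2, 1, 2],
])

# axis=1 hands each row to row_mode as a single 1-D array
majority = np.apply_along_axis(row_mode, 1, predictions)
print(majority)  # [1 2 0 2]
```

Because np.unique returns sorted values and argmax picks the first maximum, a tie between labels resolves to the smallest label.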

Check out scipy.stats.mode:

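>>> import numpy as np
>>> from scipy.stats import mode
>>> a = np.array([[1,1,0],[1,2,2],[2,0,0]])
>>> mode(a, axis=1)[0]
array([[1],
       [2],
       [0]])

Note that recent SciPy versions drop the reduced axis by default, so the same call returns array([1, 2, 0]) unless keepdims=True is passed.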
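
If SciPy is not an option, the same row-wise result can probably be obtained with NumPy broadcasting alone. The following is only a sketch under that assumption, not part of the answer above; majority_vote is a hypothetical helper name, and it builds an intermediate array of shape (rows, trees, classes), so it is best suited to a modest number of distinct labels:

```python
import numpy as np

def majority_vote(predictions):
    """Row-wise mode of a 2-D label array; ties go to the smallest label."""
    classes = np.unique(predictions)
    # counts[i, k] = number of columns in row i that equal classes[k]
    counts = (predictions[:, :, None] == classes).sum(axis=1)
    # For each row, pick the class with the highest count
    return classes[counts.argmax(axis=1)]

a = np.array([[1, 1, 0], [1, 2, 2], [2, 0, 0]])
print(majority_vote(a))  # [1 2 0]
```

This avoids the Python-level loop that np.apply_along_axis performs internally, at the cost of extra memory for the comparison array.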
