Iterate through rows of numpy array to find mode

Question

I'm trying to write a function that builds an ensemble of decision trees and makes the final prediction by majority vote across all the trees. My approach is to build a matrix in which each column holds one tree's predictions, and then, for every row (one row per data point), find the modal value and use it as the final prediction for that data point.

So far my function is:

import numpy as np
from scipy.stats import mode
from sklearn import tree

def majority_classify(x_train, y_train, x_test, y_test, num_samples):
    n = x_train.shape[0]
    c = len(np.unique(y_train))

    votes = np.zeros((n, c))
    # One column per tree, one row per data point
    predictions_train = np.empty((n, num_samples))
    predictions_test = np.empty((x_test.shape[0], num_samples))

    for i in range(num_samples):
        # Randomly sample n points (with replacement) from the training set
        indices = np.random.choice(np.arange(0, n), size=n)

        x_train_sample = x_train[indices, :]
        y_train_sample = y_train[indices]

        dt_major = tree.DecisionTreeClassifier(max_depth=2)
        # Fit on the bootstrap sample rather than the full training set
        model_major = dt_major.fit(x_train_sample, y_train_sample)

        predictions_train[:, i] = model_major.predict(x_train)

    for r in predictions_train:
        predict_train = mode(r)[0][0]

However, I'm having trouble figuring out how to iterate through each row and find its mode. Any suggestions?

Thanks!

The documentation is a good place to start. You should include a minimal example of the input, and the desired result, in your question.
– wwii, Commented Nov 8, 2016 at 22:28
I'd like to iterate over each row as a single unit, not iterate over the items within each row. I don't see how to do that from that documentation.
– yogz123, Commented Nov 8, 2016 at 22:32

piRSquared · Accepted Answer · 2016-11-09 06:16:59Z

1

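Use np.unique with the return_counts parameter to get each distinct value and how often it occurs.
Use argmax on the counts array to pick the most frequent value out of the unique array.
Use np.apply_along_axis to apply this custom mode function along the axis you need.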

import numpy as np

def mode(a):
    # u: sorted unique values, c: number of occurrences of each
    u, c = np.unique(a, return_counts=True)
    return u[c.argmax()]

a = np.array([
        [1, 2, 3],
        [2, 3, 4],
        [3, 4, 5],
        [2, 5, 6],
        [4, 1, 7],
        [5, 4, 8],
        [6, 6, 3]
    ])

# axis 0 gives the mode of each column; pass 1 instead for the mode of each row
np.apply_along_axis(mode, 0, a)

array([2, 4, 3])
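
Since the question asks for one vote per row, the same helper can be applied along axis 1. The following is a minimal sketch, not the asker's actual data: the `predictions` matrix is made up for illustration, with one row per data point and one column per tree. Because np.unique returns sorted values and argmax returns the first maximum, ties are resolved in favour of the smallest label.

```python
import numpy as np

def mode(a):
    # Most frequent value in a 1-D array; ties go to the smallest value
    u, c = np.unique(a, return_counts=True)
    return u[c.argmax()]

# Hypothetical prediction matrix: rows are data points, columns are trees
predictions = np.array([
    [0, 1, 1],
    [2, 2, 0],
    [1, 0, 0],
    [0, 1, 2],  # three-way tie
])

# axis=1 passes each row to mode() as a single 1-D array
majority = np.apply_along_axis(mode, 1, predictions)
print(majority)  # [1 2 0 0]
```

np.apply_along_axis still calls the Python function once per row, so it is convenient rather than fast; for very large matrices a vectorized routine may be preferable.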


James · Accepted Answer · 2016-11-09 01:03:36Z

0

Check out scipy.stats.mode:

>>> import numpy as np
>>> from scipy.stats import mode
>>> a = np.array([[1,1,0],[1,2,2],[2,0,0]])
>>> mode(a, axis=1)[0]
array([[1],
       [2],
       [0]])
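
The output above keeps a trailing axis of length 1. Whether that axis is kept appears to depend on the SciPy version (newer releases drop the reduced axis by default, as far as I know), so a flat vector of per-row votes is safest to obtain by flattening the result. A minimal sketch under that assumption, reusing the array from this answer:

```python
import numpy as np
from scipy.stats import mode

a = np.array([[1, 1, 0], [1, 2, 2], [2, 0, 0]])

# Index 0 of the result holds the modal values; np.ravel flattens them
# whether the result has shape (3, 1) or (3,)
row_modes = np.ravel(mode(a, axis=1)[0])
print(row_modes)  # [1 2 0]
```

This computes every row's mode in a single call, so no explicit loop over the rows is needed.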
