I'm trying to write a function that builds an ensemble of decision trees and makes the final prediction by majority vote across all the trees. My approach is to build a matrix with each decision tree's predictions in a separate column, and then, for every row (one row per data point), find the modal value to use as the final prediction for that data point.
So far my function is:
import numpy as np
from scipy.stats import mode
from sklearn import tree

def majority_classify(x_train, y_train, x_test, y_test, num_samples):
    n = x_train.shape[0]
    c = len(np.unique(y_train))
    votes = np.zeros((n, c))
    # One row per data point, one column per tree
    predictions_train = np.empty((n, num_samples))
    predictions_test = np.empty((x_test.shape[0], num_samples))
    for i in range(num_samples):
        # Randomly draw a sample of size n (with replacement) from the train set
        indices = np.random.choice(np.arange(0, n), size=n)
        x_train_sample = x_train[indices, :]
        y_train_sample = y_train[indices]
        dt_major = tree.DecisionTreeClassifier(max_depth=2)
        # Fit on the resampled data so that each tree differs
        model_major = dt_major.fit(x_train_sample, y_train_sample)
        predictions_train[:, i] = model_major.predict(x_train)
    # Attempt at the row-wise mode; this is the part I'm stuck on
    for r in predictions_train:
        predict_train = mode(r)[0][0]
However, I'm having trouble figuring out how to iterate through each row and find the mode. Any suggestions?
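To make the goal concrete, here is a minimal sketch of the row-wise majority vote on a small made-up matrix. It assumes the class labels are non-negative integers (which `np.bincount` requires), and `row_majority` is a hypothetical helper name, not part of the function above. Ties go to the smallest label.

```python
import numpy as np

def row_majority(predictions):
    """Return the most frequent label in each row of a 2-D prediction matrix.

    Assumption: labels are non-negative integers (needed by np.bincount).
    Ties resolve to the smallest label.
    """
    preds = np.asarray(predictions).astype(int)
    n_labels = preds.max() + 1
    # Count the votes for every label in each row: shape (n_rows, n_labels)
    counts = np.apply_along_axis(np.bincount, 1, preds, minlength=n_labels)
    # The winning label is the column index with the highest count
    return counts.argmax(axis=1)

# Toy matrix: 4 data points (rows), 3 trees (columns)
toy = np.array([[0, 1, 1],
                [2, 2, 0],
                [1, 1, 1],
                [0, 0, 2]])
print(row_majority(toy))  # [1 2 1 0]
```

`scipy.stats.mode` also accepts an `axis` argument, so `mode(predictions_train, axis=1)` should compute every row's mode in a single call without an explicit loop. The shape of what it returns has changed between SciPy releases, so it is worth checking against the installed version.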
Thanks!