Manual Python Implementation of Stacking Model

Question

I tried to build a Python class, CustomStackingClassifier(), to implement the Stacking method in ensemble machine learning. In this implementation, the output of the base classifiers is set to be the predicted probabilities, and StratifiedKFold is used for model training. The input matrix for the meta-classifier has dimensions (samples, models * classes).
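For readers following along, the (samples, models * classes) layout of the meta-classifier input can be illustrated with a small NumPy sketch. The sizes and the random "probabilities" below are made up for illustration only; they are not taken from the wine experiment.

```python
import numpy as np

# Hypothetical sizes, chosen only to make the shapes easy to read
n_samples, n_models, n_classes = 4, 3, 3
rng = np.random.default_rng(0)

# One (n_samples, n_classes) block of fake probabilities per base model
blocks = []
for _ in range(n_models):
    raw = rng.random((n_samples, n_classes))
    blocks.append(raw / raw.sum(axis=1, keepdims=True))  # each row sums to 1

# Blocks sit side by side: model 0's classes, then model 1's, then model 2's
meta_features = np.column_stack(blocks)
print(meta_features.shape)
```

Columns 0 to 2 hold the first model's class probabilities, columns 3 to 5 the second model's, and so on, which is the same ordering the slice `i * n_classes: (i + 1) * n_classes` produces in the class below.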

This code essentially replicates the functionality of sklearn.ensemble.StackingClassifier() manually. However, after testing it with the wine dataset and comparing the results between the two methods, I found discrepancies. Despite spending a lot of time on it, I could not pinpoint the issue. I would greatly appreciate any help or insights from the community. Thank you so much!

I hope to clarify whether there is a logical issue with the CustomStackingClassifier(). If there is a problem, I would appreciate guidance and suggestions for corrections. If the implementation is correct, why does it show result differences compared to sklearn.ensemble.StackingClassifier()?

The code is as follows. Any help would be much appreciated:

import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin
from sklearn.model_selection import StratifiedKFold

class CustomStackingClassifier(BaseEstimator, ClassifierMixin):
    def __init__(self, base_classifiers, meta_classifier, n_splits=5):
        """
        :param base_classifiers: list of estimators
        :param meta_classifier: final_estimator
        :param n_splits: cv
        """
        self.base_classifiers = base_classifiers
        self.meta_classifier = meta_classifier
        self.n_splits = n_splits

    def fit(self, X, y):
        """
        :param X: train data
        :param y: train label
        """

        n_samples = X.shape[0]
        n_classifiers = len(self.base_classifiers)
        n_classes = len(np.unique(y))  # Get the number of categories

        base_probabilities = np.zeros((n_samples, n_classifiers * n_classes))  # Used to store the predicted probabilities of the base classifier

        # Setting up cross validation by StratifiedKFold, consistent with StackingClassifier
        kf = StratifiedKFold(n_splits=self.n_splits, shuffle=False, random_state=None)

        # reset index of data
        X_re_index = X.reset_index(drop=True)
        y_re_index = y.reset_index(drop=True)

        # Train each base classifier and generate prediction probabilities
        for i, (name, clf) in enumerate(self.base_classifiers):
            fold_probabilities = np.zeros((n_samples, n_classes))

            # Train and predict for each fold
            for train_index, val_index in kf.split(X_re_index, y_re_index):
                X_train, X_val = X_re_index.iloc[train_index], X_re_index.iloc[val_index]
                y_train, y_val = y_re_index.iloc[train_index], y_re_index.iloc[val_index]

                # Train base classifier
                clf.fit(X_train, y_train)

                # Predict probabilities on validation set
                fold_probabilities[val_index] = clf.predict_proba(X_val)

            # Save the predicted probabilities of each base classifier into base_probabilities
            base_probabilities[:, i * n_classes: (i + 1) * n_classes] = fold_probabilities

        # train meta classifier
        self.meta_classifier.fit(base_probabilities, y_re_index)

        return self

    def predict(self, X):
        """
        :param X: test data
        """
        # get the predicted probabilities of each base classifier
        base_probabilities = np.column_stack([clf.predict_proba(X) for name, clf in self.base_classifiers])

        # predict the label using the meta classifier
        return self.meta_classifier.predict(base_probabilities)

    def predict_proba(self, X):
        """
        :param X: test data
        """
        # get the predicted probabilities of each base classifier
        base_probabilities = np.column_stack([clf.predict_proba(X) for name, clf in self.base_classifiers])

        # predict the label probabilities using the meta classifier
        return self.meta_classifier.predict_proba(base_probabilities)

import pandas as pd
import xgboost as xgb
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

base_models = [
    ('svm', SVC(probability=True, random_state=42)),
    ('knn', KNeighborsClassifier()),
    ('rf', RandomForestClassifier(random_state=42)),
]

meta_model = xgb.XGBClassifier(verbosity=0, random_state=42)

# 1. load wine dataset
wine = load_wine()
X = wine.data
y = pd.Series(wine.target)

# data split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=41
)

# data preprocessing
scaler = StandardScaler().fit(X_train)
X_train_scaled = pd.DataFrame(scaler.transform(X_train), columns=wine.feature_names)
X_test_scaled = pd.DataFrame(scaler.transform(X_test), columns=wine.feature_names)

# Keep only ONE of the two definitions below per run;
# if both are executed, the second assignment overrides the first.

# manual implementation of Stacking Model
stacking_model = CustomStackingClassifier(base_classifiers=base_models, meta_classifier=meta_model, n_splits=5)  # accuracy: 0.944

# Stacking Model method in sklearn
stacking_model = StackingClassifier(estimators=base_models, final_estimator=meta_model, cv=5, stack_method='auto', verbose=1)  # accuracy: 0.972

stacking_model.fit(X_train_scaled, y_train)

# 4. Evaluate the model
y_pred = stacking_model.predict(X_test_scaled)
print('Evaluating results of Stacking model:')
print('accuracy:', accuracy_score(y_test, y_pred))
print('precision:', precision_score(y_test, y_pred, average='macro'))
print('recall:', recall_score(y_test, y_pred, average='macro'))
print('F1-score:', f1_score(y_test, y_pred, average='macro'))

As answering this question will probably take time, it would help the reader if you showed the code you used to test your class and the results you got when comparing both methods.
– rehaqds, Commented Jan 2 at 8:14
Thanks for your comments. I have added the code that can be used for testing and verification.
– CM_Li, Commented Jan 2 at 9:04
It would be good to add specifically what kind of discrepancies you found between your implementation and the scikit-learn implementation.
– Oxbowerce, Commented Jan 2 at 17:33

MuhammedYunus · Accepted Answer · 2025-01-02 17:51:22Z

I think the main issue was how you were fitting the base estimators.

Your first two steps are the same as the sklearn implementation:

1. Fit base estimators using CV, and record the out-of-fold validation probabilities
2. Fit the meta estimator on those validation probabilities

You then re-use the base estimators from step 1 for the final prediction step:

3. For prediction, get the probabilities from the base estimators and hand them over to the meta estimator for a final prediction.

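In your predict step, you were therefore inadvertently calling predict_proba on base estimators in whatever state the CV loop left them: each one had last been fitted on the training portion of the final split only, not on all of the training data.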
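A minimal sketch of that effect, assuming a single KNeighborsClassifier and the wine data loaded as plain arrays (not the exact setup from the question): every call to fit discards the previous fit, so after the loop the estimator only knows the last split's training rows.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import load_wine
from sklearn.model_selection import StratifiedKFold
from sklearn.neighbors import KNeighborsClassifier

X, y = load_wine(return_X_y=True)
kf = StratifiedKFold(n_splits=5)

clf = KNeighborsClassifier()
for train_index, val_index in kf.split(X, y):
    # Re-fitting the same object overwrites whatever it learned before
    clf.fit(X[train_index], y[train_index])

# Number of rows the estimator was last fitted on vs. rows available
n_seen = clf.n_samples_fit_
print(n_seen, len(X))

# A fresh copy fitted only on the last split behaves identically to clf
fresh = clone(clf).fit(X[train_index], y[train_index])
same_as_last_split = np.allclose(clf.predict_proba(X), fresh.predict_proba(X))
print(same_as_last_split)
```

The first print shows fewer fitted rows than the dataset contains, which is the state the base classifiers are in when the original predict method is called.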

The sklearn implementation does it differently:

During training, the estimators are fitted on the whole training data X_train. They will be used when calling predict or predict_proba. To generalize and avoid over-fitting, the final_estimator is trained on out-samples using sklearn.model_selection.cross_val_predict internally.

So their approach is:

(Steps 1 and 2 are identical to yours)

3. Fit base estimators on all of the data (i.e. no CV), and then set them aside (not used again until prediction). This gives you base estimators trained on all of the data that'll be used during predict.
4. For prediction, get the probabilities from the base estimators and hand them over to the meta estimator for a final prediction (like your prediction step, but using base estimators trained on all of the data).

The difference from your predict step is that you were inadvertently using base estimators fitted on the training portion of the final CV split, rather than on the full training set.
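The four steps can be sketched by hand and checked against StackingClassifier. This is only an illustration under assumptions: the synthetic data, the two base models and the logistic-regression meta model are stand-ins chosen to be deterministic and quick, and the comparison relies on a three-class problem with integer labels (for binary problems StackingClassifier drops one probability column).

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_predict
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_classes=3, n_informative=3, random_state=0)
X_train, X_test, y_train = X[:240], X[240:], y[:240]

base = [
    ("lr", LogisticRegression(max_iter=1000)),
    ("tree", DecisionTreeClassifier(max_depth=3, random_state=0)),
]
meta = LogisticRegression(max_iter=1000)
cv = StratifiedKFold(n_splits=5)

# Steps 1-2: out-of-fold probabilities become the meta estimator's training data
oof = np.hstack([
    cross_val_predict(clone(est), X_train, y_train, cv=cv, method="predict_proba")
    for _, est in base
])
meta_fitted = clone(meta).fit(oof, y_train)

# Step 3: separate copies of the base estimators, fitted on ALL training data
base_fitted = [clone(est).fit(X_train, y_train) for _, est in base]

# Step 4: base probabilities on new data, then the meta estimator
test_meta = np.hstack([est.predict_proba(X_test) for est in base_fitted])
manual_proba = meta_fitted.predict_proba(test_meta)

reference = StackingClassifier(
    estimators=base, final_estimator=meta, cv=5, stack_method="predict_proba"
).fit(X_train, y_train)
matches = np.allclose(manual_proba, reference.predict_proba(X_test))
print(matches)
```

Note that the estimators used in step 4 never take part in the CV loop; the CV fits are throwaway copies whose only purpose is to produce the out-of-fold matrix.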

I amended that part, and also made other changes including some standard estimator checks, and using a trailing underscore to denote fitted attributes. You could also replace the CV loop with cross_val_predict, as done in the sklearn implementation.
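As a rough check of the cross_val_predict suggestion, the sketch below builds the out-of-fold probabilities for one base classifier twice, once with an explicit loop and once with cross_val_predict, and compares them. The KNeighborsClassifier and the wine data are assumptions made for the example, not part of the modified class.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import load_wine
from sklearn.model_selection import StratifiedKFold, cross_val_predict
from sklearn.neighbors import KNeighborsClassifier

X, y = load_wine(return_X_y=True)
clf = KNeighborsClassifier()
kf = StratifiedKFold(n_splits=5)

# Explicit loop: fit a fresh clone per split, store its validation probabilities
oof_loop = np.zeros((len(X), len(np.unique(y))))
for train_index, val_index in kf.split(X, y):
    fold_clf = clone(clf).fit(X[train_index], y[train_index])
    oof_loop[val_index] = fold_clf.predict_proba(X[val_index])

# Same thing in one call; cross_val_predict clones the estimator internally
oof_cvp = cross_val_predict(clf, X, y, cv=kf, method="predict_proba")

loop_matches_cvp = np.allclose(oof_loop, oof_cvp)
print(loop_matches_cvp)
```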

Comparing the two implementations:

-------------[sklearn implementation]-------------
Evaluating results of Stacking model:
 accuracy: 0.74375
 precision: 0.7491127887469351
 recall: 0.7430302705789962
 F1-score: 0.7442490607858389

-------------[custom implementation]--------------
Evaluating results of Stacking model:
 accuracy: 0.74375
 precision: 0.7491127887469351
 recall: 0.7430302705789962
 F1-score: 0.7442490607858389

To confirm that the two implementations really are identical, it is more rigorous to compare their predict_proba outputs than the overall accuracies, since models with different probabilities can still produce the same scores.
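One possible way to do that comparison is sketched below. The helper name compare_probas and the logistic-regression pipelines are hypothetical stand-ins used only to keep the example self-contained; in practice you would pass the fitted StackingClassifier and the fitted CustomStackingClassifier.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

def compare_probas(model_a, model_b, X, atol=1e-8):
    """Compare two fitted classifiers on labels and on class probabilities."""
    proba_a = model_a.predict_proba(X)
    proba_b = model_b.predict_proba(X)
    return {
        "same_labels": bool(np.array_equal(model_a.predict(X), model_b.predict(X))),
        "same_probas": bool(np.allclose(proba_a, proba_b, atol=atol)),
        "max_abs_diff": float(np.max(np.abs(proba_a - proba_b))),
    }

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

def make_model(C):
    # Stand-in model; C is varied only to create a second, different model
    model = make_pipeline(StandardScaler(), LogisticRegression(C=C, max_iter=1000))
    return model.fit(X_train, y_train)

identical = compare_probas(make_model(1.0), make_model(1.0), X_test)
different = compare_probas(make_model(1.0), make_model(0.1), X_test)
print(identical)
print(different)
```

A probability comparison within a small tolerance is a much stricter test than matching accuracy, because two models can agree on most or all predicted labels while their probabilities differ.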

Modified implementation and testing

The modified custom implementation:

import numpy as np

from sklearn.base import BaseEstimator, ClassifierMixin, clone
from sklearn.model_selection import StratifiedKFold
from sklearn.utils.validation import check_array, check_is_fitted, check_X_y

class CustomStackingClassifier(BaseEstimator, ClassifierMixin):
    def __init__(self, base_classifiers, meta_classifier, n_splits=5):
        """
        :param base_classifiers: list of estimators
        :param meta_classifier: final_estimator
        :param n_splits: cv
        """
        self.base_classifiers = base_classifiers
        self.meta_classifier = meta_classifier
        self.n_splits = n_splits

    def fit(self, X, y):
        """
        :param X: train data
        :param y: train label
        """

        #
        # Input checks, convert to ndarray, and set some standard attributes
        #
        # Record column names first: check_X_y returns an ndarray without them
        if hasattr(X, 'columns'):
            self.feature_names_in_ = np.array(X.columns, dtype='object')
        X, y = check_X_y(X, y)
        self.n_features_in_ = X.shape[1]

        n_samples = len(X)
        n_classifiers = len(self.base_classifiers)
        n_classes = len(np.unique(y))  # Get the number of categories

        # Used to store the predicted probabilities of the base classifier
        base_probabilities = np.zeros((n_samples, n_classifiers, n_classes))

        # Setting up cross validation by StratifiedKFold, consistent with StackingClassifier
        kf = StratifiedKFold(n_splits=self.n_splits)

        # Train each base classifier and generate prediction probabilities
        self.base_classifiers_ = [(name, clone(clf)) for name, clf in self.base_classifiers]

        # Base classifiers are fitted on the whole of X
        # These are fit, and then set aside until .predict/.predict_proba
        for _, clf in self.base_classifiers_:
            clf.fit(X, y)

        #final_estimator/meta_estimator/blender is
        # fitted only on out-of-fold predictions from each base classifier
        for clf_idx, (name, clf) in enumerate(self.base_classifiers):

            # Get out-of-fold probas from base classifiers
            # You could use cross_val_predict() to replace this block, as done in StackingClassifier
            for train_index, val_index in kf.split(X, y):
                X_train, y_train = [arr[train_index] for arr in (X, y)]
                X_val, _ = [arr[val_index] for arr in (X, y)]

                # Predict probabilities on validation set and store
                base_probabilities[val_index, clf_idx, :] = (
                    clone(clf)               #unfitted base classifier
                    .fit(X_train, y_train)   #fit on train split
                    .predict_proba(X_val)    #get the out-of-fold probas
                )

        # train meta classifier on (X=out-of-fold probas, y)
        self.meta_classifier_ = clone(self.meta_classifier).fit(
            base_probabilities.reshape(-1, n_classifiers * n_classes),
            y
        )

        return self
    
    def get_base_probas(self, X):
        return np.column_stack([
            clf.predict_proba(X) for name, clf in self.base_classifiers_
        ])

    def predict(self, X):
        """
        :param X: test data
        """
        check_is_fitted(self)
        X = check_array(X)

        # get the predicted probabilities of each base classifier
        base_probabilities = self.get_base_probas(X)

        # predict the label using the meta classifier
        return self.meta_classifier_.predict(base_probabilities)

    def predict_proba(self, X):
        """
        :param X: test data
        """
        check_is_fitted(self)
        X = check_array(X)

        # get the predicted probabilities of each base classifier
        base_probabilities = self.get_base_probas(X)

        # predict the label probabilities using the meta classifier
        return self.meta_classifier_.predict_proba(base_probabilities)

Comparing scores as an initial test:

from sklearn.svm import SVC
from sklearn.ensemble import (
    RandomForestClassifier,
    AdaBoostClassifier,
    StackingClassifier
)
import xgboost as xgb

from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import (
    accuracy_score, precision_score, recall_score, f1_score
)

#
# Dataset for testing
#
import pandas as pd
from sklearn.datasets import make_classification

X, y = make_classification(
    n_samples=800, n_classes=3, n_informative=3, random_state=0
)
X = pd.DataFrame(X)
y = pd.Series(y)

X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# data preprocessing
scaler = StandardScaler().set_output(transform='pandas').fit(X_train) #set to pandas output
X_train_scaled, X_val_scaled = [scaler.transform(x) for x in [X_train, X_val]]

#
# Base models
#
base_models = [
    #Use regularised models so the accuracy etc is < 100%
    # Makes it easier to see if implementation are the same
    ('svm', SVC(probability=True, C=0.01, random_state=0)),
    ('adab', AdaBoostClassifier(random_state=0, n_estimators=3)),
    ('rf', RandomForestClassifier(max_depth=1, random_state=0)),
]

meta_model = xgb.XGBClassifier(verbosity=0, random_state=0)

#
# Evaluate StackingClassifier and custom class
#
for use_custom in [False, True]:
    if not use_custom:
        print('[sklearn implementation]'.center(50, '-'))
        # Stacking Model method in sklearn
        stacking_model = StackingClassifier(
            estimators=base_models,
            final_estimator=meta_model,
            cv=5,
            stack_method='predict_proba',
        )
    else:
        print('[custom implementation]'.center(50, '-'))
        # manual implementation of Stacking Model
        stacking_model = CustomStackingClassifier(
            base_classifiers=base_models,
            meta_classifier=meta_model,
            n_splits=5
        )

    #Fit the selected implementation
    stacking_model.fit(X_train_scaled, y_train)

    # 4. Evaluate the model
    y_pred = stacking_model.predict(X_val_scaled)

    print('Evaluating results of Stacking model:')
    print(' accuracy:', accuracy_score(y_val, y_pred))
    print(' precision:', precision_score(y_val, y_pred, average='macro'))
    print(' recall:', recall_score(y_val, y_pred, average='macro'))
    print(' F1-score:', f1_score(y_val, y_pred, average='macro'))
    print()

Thank you for your valuable feedback. I have tested the code on multiple datasets, and both implementations (custom and sklearn) produce consistent outputs. I appreciate you taking the time to respond, and wish you all the best in life!
– CM_Li, Commented Jan 3 at 2:23
My pleasure @CM_Li, I'm glad it's working as needed. Thanks for your kind words. I wish you all the best too👍
– MuhammedYunus, Commented Jan 3 at 13:26
