Python classification define feature importance

Question

I am wondering if it is possible to define feature importances/weights in Python classification methods? For example:

model = tree.DecisionTreeClassifier(feature_weight = ...)

I've seen that RandomForest has an attribute feature_importances_, which shows the importance of each feature as determined by the fitted model. But is it possible to define the feature importances myself, in advance of the analysis?

Thank you very much for your help in advance!

Because there's overfitting in my analysis, I can say for sure that some features are more important than the others. That is why I am wondering if I can define the importances in advance.
– Sha Li, Commented Jan 15, 2019 at 12:15
Possible duplicate of How to put more weight on certain features in machine learning?
– Abdul Rahman Bres, Commented Jan 15, 2019 at 12:29
In your case, I would go with feature selection and keep only the distinctive features for training: scikit-learn.org/stable/modules/feature_selection.html
– Abdul Rahman Bres, Commented Jan 15, 2019 at 12:32
Okay! Thanks a lot! I'll go with feature selection and remove the ones that are less important.
– Sha Li, Commented Jan 15, 2019 at 12:41

Romain Reboulleau · Accepted Answer · 2019-01-15 12:35:00Z

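The feature importance reported by random forest classifiers is computed from the trained model, using a method specific to tree ensembles. In scikit-learn, feature_importances_ is impurity-based (the mean decrease in impurity attributed to each feature across the trees); the original random forest formulation also describes a permutation approach, where a feature's values are shuffled and the additional classification error is measured.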
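To illustrate that these importances are read off an already fitted model rather than supplied to it, here is a minimal sketch. The synthetic dataset and all parameter values are assumptions made for the example; feature_importances_ and sklearn.inspection.permutation_importance are the scikit-learn names for the two measures.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

# Synthetic data (an assumption for illustration): 6 features, 2 informative.
X, y = make_classification(
    n_samples=300, n_features=6, n_informative=2, n_redundant=0, random_state=0
)

forest = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

# Impurity-based importances: one non-negative value per feature, summing to 1.
impurity_importances = forest.feature_importances_

# Permutation importances: mean drop in score when one feature is shuffled.
perm = permutation_importance(forest, X, y, n_repeats=5, random_state=0)
permutation_importances = perm.importances_mean

print(impurity_importances)
print(permutation_importances)
```

Both arrays are outputs of the analysis; neither estimator accepts them as an input.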

Feature importance is thus a concept that relates to the predictive ability of the fitted model, not an input to the training phase. Now, if you want your model to favour some features over others, you will have to find a trick that depends on the model.

Regarding sklearn's DecisionTreeClassifier, such a trick does not appear to be trivial. You could customise the class weights (the class_weight parameter) if you know that some classes are more easily predicted by the features you want to favour, but this seems pretty dirty.
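For reference, setting class weights might look like the sketch below. class_weight is an existing DecisionTreeClassifier parameter, but the data and the weight values here are arbitrary assumptions, and the weights act on classes, so any effect on which features get used is only indirect.

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary data (an assumption for illustration).
X, y = make_classification(
    n_samples=200, n_features=5, n_informative=3, n_redundant=0, random_state=0
)

# Samples of class 1 are weighted five times as heavily as samples of class 0
# when candidate splits are evaluated. The values 1 and 5 are arbitrary.
tree_clf = DecisionTreeClassifier(
    class_weight={0: 1, 1: 5}, max_depth=3, random_state=0
)
tree_clf.fit(X, y)

print(tree_clf.feature_importances_)
```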

In other types of models, such as ones using kernels, you can do this more easily, by setting hyperparameters which directly relate to features.
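One way to get this effect with a kernel model, sketched here under assumptions, is to rescale the features before fitting rather than to set a dedicated hyperparameter. With an RBF kernel, multiplying feature j by a weight w_j scales its contribution to the squared distance by w_j², so features with larger weights influence the kernel more. The feature_weights values below are hypothetical and hand-picked.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic data (an assumption for illustration).
X, y = make_classification(
    n_samples=200, n_features=4, n_informative=2, n_redundant=0, random_state=0
)

# Hypothetical per-feature weights chosen from prior knowledge:
# emphasise feature 0, de-emphasise feature 3.
feature_weights = np.array([2.0, 1.0, 1.0, 0.25])

# Standardise first so the weights act on comparable scales.
X_std = StandardScaler().fit_transform(X)
X_weighted = X_std * feature_weights

# RBF kernel: k(x, x') = exp(-gamma * sum_j (w_j * (x_j - x'_j))**2)
svm = SVC(kernel="rbf", gamma=0.5).fit(X_weighted, y)
predictions = svm.predict(X_weighted)
```

The same scaler and weights would have to be applied to any new data before calling predict.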

If you are trying to limit overfitting, I would also simply suggest that you remove the features you know to be less important.
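Removing features by hand can be as simple as dropping columns before fitting. In the sketch below, the dataset and the indices in columns_to_drop are hypothetical stand-ins for features you already believe to be unimportant.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic data (an assumption for illustration).
X, y = make_classification(
    n_samples=200, n_features=6, n_informative=3, n_redundant=0, random_state=0
)

# Hypothetical: column indices judged unimportant from domain knowledge.
columns_to_drop = [4, 5]

# Boolean mask that keeps every column except the dropped ones.
keep_mask = np.ones(X.shape[1], dtype=bool)
keep_mask[columns_to_drop] = False
X_reduced = X[:, keep_mask]

# The tree can now only split on the retained features.
model = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_reduced, y)
```

The same mask needs to be applied to any data passed to predict later.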

1 Comment

Sha Li Over a year ago

Thank you very much! This helps to solve my problem. :)
