Aim: To plot the feature importances of an ExtraTreesClassifier model
Error 1: AttributeError: 'DataFrame' object has no attribute 'source'
Error 2: KeyError: 'source'
Where: on the line names = [data.source[i] for i in indices] (error 1), or on names = [data['source'] == i for i in indices] if I change it to that (error 2)
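Both errors come from the same mix-up: source is a Python variable that holds a separate DataFrame, not a column of data. data.source asks pandas for an attribute or column called 'source', which does not exist (AttributeError), and data['source'] looks up a column with that label (KeyError). The feature names are stored in source.columns. The small sketch below uses a toy DataFrame (the column names and importance values are made up for illustration) to reproduce both errors and show a lookup that works:

```python
import numpy as np
import pandas as pd

# Toy stand-in for the CSV; column names are hypothetical.
data = pd.DataFrame({
    "temp": [1.0, 2.0, 3.0],
    "pressure": [0.5, 0.1, 0.9],
    "Anomaly_Tag": ["a", "b", "c"],
})
source = data.drop(columns=["Anomaly_Tag"])
importances = np.array([0.2, 0.8])  # pretend these came from model.feature_importances_
indices = np.argsort(importances)[::-1]  # indices of importances, largest first

# Error 1: 'source' is neither an attribute nor a column of data.
try:
    _ = data.source
except AttributeError as exc:
    print("AttributeError:", exc)

# Error 2: there is no column labelled 'source' either.
try:
    _ = data["source"]
except KeyError as exc:
    print("KeyError:", exc)

# Fix: take the names from the columns of the feature DataFrame.
names = [source.columns[i] for i in indices]
print(names)  # ['pressure', 'temp']
```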
I am not an expert in Python and pandas. Could you please help me correct this chunk of code? Any advice on syntax that would help me avoid similar errors in the future would also be appreciated.
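On the syntax side, two habits help. First, prefer bracket access df['col'] over attribute access df.col: attribute access only works for existing columns whose names are valid Python identifiers and do not clash with DataFrame methods, and in both forms the name refers to a column label, never to a Python variable. Second, instead of keeping an array of importances and a list of names in sync through indices, you can attach the names to the values with a pandas Series and let sort_values do the ordering. A minimal sketch, with hypothetical column names and importance values:

```python
import numpy as np
import pandas as pd

# Hypothetical feature table and importances (stand-ins for source and
# model.feature_importances_).
source = pd.DataFrame({"temp": [1.0], "pressure": [0.5], "flow": [3.0]})
importances = np.array([0.3, 0.5, 0.2])

# Bracket access works for any column label; attribute access (source.temp)
# only works when the label is a valid identifier and not a DataFrame method.
print(source["temp"].tolist())  # [1.0]

# Pair each importance with its feature name, then sort by value, largest first.
ranked = pd.Series(importances, index=source.columns).sort_values(ascending=False)
print(ranked.index.tolist())  # ['pressure', 'temp', 'flow']
```

Because the names travel with the values, ranked.plot.bar() can draw the same chart with the feature names already on the x axis.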
Code:
```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.ensemble import ExtraTreesClassifier

data = pd.read_csv('data_with_anomalies.csv')
source = pd.DataFrame(data)
target = data['Anomaly']
source = source.drop(columns=['Anomaly_Tag'])
model = ExtraTreesClassifier()
model.fit(source, target)
print(model.feature_importances_)
importances = model.feature_importances_

# Below chunk is referred from another question on Stack Overflow
# Sort feature importances in descending order
indices = np.argsort(importances)[::-1]

# Rearrange feature names so they match the sorted feature importances
# This line raises error 1 (AttributeError):
names = [data.source[i] for i in indices]
# If I change it to the line below, I get error 2 (KeyError) instead:
# names = [data['source'] == i for i in indices]

plt.figure()
plt.title("Feature Importance")
plt.bar(range(source.shape[1]), importances[indices])
plt.xticks(range(source.shape[1]), names, rotation=90)
plt.show()
```
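A corrected version of the whole script might look like the sketch below. The key change is that the names come from source.columns, the columns of the DataFrame the model was fitted on, indexed with the same sorted indices as the importances. It also imports ExtraTreesClassifier, which the code above uses without importing. Since data_with_anomalies.csv is not available here, the sketch builds a synthetic DataFrame with hypothetical columns (temp, pressure, flow) and a label derived from temp, so the plot has something meaningful to show. One thing worth checking in the original: only 'Anomaly_Tag' is dropped, so 'Anomaly' (the target) stays in source and is used as a feature too; if that is unintended, drop it as well, as this sketch does.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn.ensemble import ExtraTreesClassifier

# Synthetic stand-in for data_with_anomalies.csv; column names are hypothetical.
rng = np.random.default_rng(0)
data = pd.DataFrame({
    "temp": rng.normal(size=200),
    "pressure": rng.normal(size=200),
    "flow": rng.normal(size=200),
})
# The label depends only on temp, so temp should come out as most important.
data["Anomaly"] = (data["temp"] > 1.0).astype(int)

target = data["Anomaly"]
# Keep only the feature columns: the label itself must not be a feature.
source = data.drop(columns=["Anomaly"])

model = ExtraTreesClassifier(random_state=0)
model.fit(source, target)
importances = model.feature_importances_

# Sort feature importances in descending order
indices = np.argsort(importances)[::-1]
# Feature names come from the columns of the DataFrame the model was fitted on
names = source.columns[indices]

plt.figure()
plt.title("Feature Importance")
plt.bar(range(source.shape[1]), importances[indices])
plt.xticks(range(source.shape[1]), names, rotation=90)
plt.tight_layout()
plt.show()
```

With the real CSV, replacing the synthetic block with the pd.read_csv call and the original drop(...) should be enough; the names line works for any feature DataFrame.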