Aim: To plot the feature importances of a fitted model

Error 1: AttributeError: 'DataFrame' object has no attribute 'source'

Error 2: KeyError: 'source'

Where?: names = [data.source[i] for i in indices] OR names = [data['source'] == i for i in indices]

I am not an expert in Python or pandas. Could you please help me correct this chunk of code? I would also appreciate any advice on syntax that would help me avoid similar errors in the future.

Code:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.ensemble import ExtraTreesClassifier

data = pd.read_csv('data_with_anomalies.csv')
source = pd.DataFrame(data)
target = data['Anomaly']
source = source.drop(columns = ['Anomaly_Tag'])

model = ExtraTreesClassifier()
model.fit(source, target)
print(model.feature_importances_)

importances = model.feature_importances_

# Below chunk is referred from another question on stackoverflow
# Sort feature importances in descending order
indices = np.argsort(importances)[::-1]

Getting error 1 with below line:

# Rearrange feature names so they match the sorted feature importances
names = [data.source[i] for i in indices]

OR if I change it to below, I get error 2:

names = [data['source'] == i for i in indices]

plt.figure()
plt.title("Feature Importance")
plt.bar(range(source.shape[1]), importances[indices])
plt.xticks(range(source.shape[1]), names, rotation=90)
plt.show()
  • In both cases you are essentially attempting to access a column of the data frame named 'source', which doesn't exist. Commented Mar 1, 2020 at 21:24
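
To illustrate the comment above, here is a minimal sketch on a made-up frame (the column names `temp`, `pressure` and `Anomaly` are assumptions, not taken from the question's CSV). Both `data.source` and `data['source']` ask pandas for a column labelled 'source'; neither refers to the Python variable `source`. The feature names live in `source.columns`.

```python
import pandas as pd

# Hypothetical toy frame; the column names are made up for illustration.
data = pd.DataFrame({"temp": [1.0, 2.0], "pressure": [3.0, 4.0], "Anomaly": [0, 1]})
source = data.drop(columns=["Anomaly"])  # a Python variable, not a column of `data`

# Bracket access looks for a column labelled 'source' and raises KeyError.
try:
    data["source"]
except KeyError as exc:
    key_error = exc

# Attribute access falls back to the same column lookup and raises AttributeError.
try:
    data.source
except AttributeError as exc:
    attr_error = exc

# The feature names are the column labels of the feature frame.
feature_names = list(source.columns)
print(feature_names)
```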

1 Answer

Try:

names = data.reindex(indices)
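
Note that `reindex` conforms the rows of a frame to the given labels rather than returning column names, so it may not produce the tick labels the plot needs. A sketch that is possibly closer to the question's intent is to index the feature frame's column labels with the sorted positions. The frame and the `importances` array below are hypothetical stand-ins for the question's `source` and `model.feature_importances_`.

```python
import numpy as np
import pandas as pd

# Hypothetical feature frame standing in for `source` from the question.
source = pd.DataFrame({"temp": [1.0, 2.0], "pressure": [3.0, 4.0], "flow": [5.0, 6.0]})

# Stand-in for model.feature_importances_ (one value per column of `source`).
importances = np.array([0.2, 0.5, 0.3])

# Positions of the features, most important first.
indices = np.argsort(importances)[::-1]

# Look up each column label by position so names line up with importances[indices].
names = [source.columns[i] for i in indices]

# Equivalent vectorised form: index the column Index with the whole array.
names_vectorised = source.columns[indices].tolist()

print(names)
```

With `names` built this way, the `plt.bar` and `plt.xticks` calls from the question should work unchanged, since `names` and `importances[indices]` share the same order.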
