I am trying to build a bar chart with the bars shown in a descending order.

In my code, the NumPy array is the result of using SelectKBest() to select the best features in a machine learning problem, depending on their variance.

import numpy as np
import matplotlib.pyplot as plt 

flist = ['int_rate', 'installment', 'log_annual_inc','dti', 'fico', 'days_with_cr_line', 'revol_bal', 'revol_util', 'inq_last_6mths','pub_rec']

fimportance = np.array([250.14120228,23.95686725,10.71979245,13.38566487,219.41737141,
  8.19261323,27.69341779,64.96469182,218.77495366,22.7037686 ]) # this is the numpy.ndarray after running SelectKBest()

print(fimportance)  # the 4 most important features are 'int_rate', 'fico', 'revol_util', 'inq_last_6mths'; the values map to flist by position, e.g. 250 relates to 'int_rate' and 218 relates to 'inq_last_6mths'
[250.14120228  23.95686725  10.71979245  13.38566487 219.41737141
  8.19261323  27.69341779  64.96469182 218.77495366  22.7037686 ]

So I want to show these values on my bar chart in descending order, with int_rate on top.

fimportance_sorted = np.sort(fimportance)  
fimportance_sorted

array([  8.19261323,  10.71979245,  13.38566487,  22.7037686 ,
        23.95686725,  27.69341779,  64.96469182, 218.77495366,
       219.41737141, 250.14120228])

#  this bar chart is not right because here the values and indices are messed up.
plt.barh(flist, fimportance_sorted)
plt.show()

Next, I tried this:

plt.barh([x for x in range(len(fimportance))], fimportance)

I understand that I need to map these indices to the flist values somehow and then sort them, perhaps by creating an array and mapping my list labels in place of the indices. This is where I am stuck.

for i,v in enumerate(fimportance):
    arr = np.array([i,v])

.....

Thank you for your help with this problem.

1 Answer

"the values and indices are messed up"

That's because you sorted fimportance (fimportance_sorted = np.sort(fimportance)), but the order of labels in flist remained unchanged, so now labels don't correspond to the values in fimportance_sorted.
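The mismatch can be seen in a small sketch. It uses only the first three features from the question to keep the output short, and the name `mismatched` is introduced here purely for illustration:

```python
import numpy as np

# Shortened subset of the question's data (first three features only).
flist = ['int_rate', 'installment', 'log_annual_inc']
fimportance = np.array([250.14120228, 23.95686725, 10.71979245])

# np.sort returns a new array in ascending order; it knows nothing about flist.
fimportance_sorted = np.sort(fimportance)

# Pairing labels with the sorted values by position attaches the wrong
# value to each label: 'int_rate' now sits next to the smallest value.
mismatched = dict(zip(flist, fimportance_sorted))

for name, original, after_sort in zip(flist, fimportance, fimportance_sorted):
    print(name, original, after_sort)
```

Each printed row shows a label, its true value, and the value it would be plotted with after sorting only the values.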

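You can use numpy.argsort to get the indices that would put fimportance into sorted (ascending) order and then index both flist and fimportance with these indices. Because flist is a plain Python list, it cannot be indexed with an index array directly (hence the TypeError in the session), so convert it with np.array first: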

>>> import numpy as np
>>> flist = ['int_rate', 'installment', 'log_annual_inc','dti', 'fico', 'days_with_cr_line', 'revol_bal', 'revol_util', 'inq_last_6mths','pub_rec']
>>> fimportance = np.array([250.14120228,23.95686725,10.71979245,13.38566487,219.41737141,
...   8.19261323,27.69341779,64.96469182,218.77495366,22.7037686 ])
>>> idx = np.argsort(fimportance)
>>> idx
array([5, 2, 3, 9, 1, 6, 7, 8, 4, 0])
>>> flist[idx]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: only integer scalar arrays can be converted to a scalar index
>>> np.array(flist)[idx]
array(['days_with_cr_line', 'log_annual_inc', 'dti', 'pub_rec',
       'installment', 'revol_bal', 'revol_util', 'inq_last_6mths', 'fico',
       'int_rate'], dtype='<U17')
>>> fimportance[idx]
array([  8.19261323,  10.71979245,  13.38566487,  22.7037686 ,
        23.95686725,  27.69341779,  64.96469182, 218.77495366,
       219.41737141, 250.14120228])

idx is the order in which you need to put elements of fimportance to sort it. The order of flist must match the order of fimportance, so index both with idx.

As a result, elements of np.array(flist)[idx] correspond to elements of fimportance[idx].
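Putting this together for the chart, a minimal sketch might look like the following. The names `labels_sorted` and `scores_sorted` and the axis label are assumptions made for illustration. It relies on barh drawing the first entry at the bottom, so the ascending order from argsort leaves the largest value at the top:

```python
import numpy as np
import matplotlib.pyplot as plt

flist = ['int_rate', 'installment', 'log_annual_inc', 'dti', 'fico',
         'days_with_cr_line', 'revol_bal', 'revol_util', 'inq_last_6mths', 'pub_rec']
fimportance = np.array([250.14120228, 23.95686725, 10.71979245, 13.38566487, 219.41737141,
                        8.19261323, 27.69341779, 64.96469182, 218.77495366, 22.7037686])

# Indices that sort the values in ascending order.
idx = np.argsort(fimportance)

# Reorder labels and values with the same indices so they stay paired.
labels_sorted = np.array(flist)[idx]
scores_sorted = fimportance[idx]

# barh draws the first entry at the bottom, so ascending order
# puts the largest value ('int_rate') at the top of the chart.
fig, ax = plt.subplots()
ax.barh(labels_sorted, scores_sorted)
ax.set_xlabel('importance')  # axis label is an assumption, not from the question
```

Call plt.show() afterwards to display the figure. To put the largest bar at the bottom instead, reverse the indices with idx[::-1] before indexing.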

1 Comment

Thank you, it works :) I can't accept your answer yet because it is too soon; I will do so later.
