I am iterating over the coefficients of a pipeline's classifier to print the 20 most informative features for a class called safety.

classnum_saf = 3
inds = np.argsort(clf_3.named_steps['clf'].coef_[classnum_saf, :])[-20:]
for i in inds: 
   f = feature_names[i]
   c = clf_3.named_steps['clf'].coef_[classnum_saf, [i]]
   print(f,c)
   output = {'features':f, 'coefficients':c}
   df = pd.DataFrame(output, columns = ['features', 'coefficients'])
   print(df)

I want a single data frame with only one header. Instead I get the output below, which repeats the header because a new data frame is created and printed on every iteration of the loop.

   1800 [-8.73800344]
   features  coefficients
   0     1800     -8.738003
   hr [-8.73656027]
   features  coefficients
   0       hr      -8.73656
   wa [-8.7336777]
   features  coefficients
   0       wa     -8.733678
   1400 [-8.72197545]
   features  coefficients
   0     1400     -8.721975
   hrwa [-8.71952656]
   features  coefficients
   0     hrwa     -8.719527
   perimeter [-8.71173264]
   features  coefficients
   0  perimeter     -8.711733
   response [-8.67388885]
   features  coefficients
   0  response     -8.673889
   analysis [-8.65460329]
   features  coefficients
   0  analysis     -8.654603
   00 [-8.58386785]
   features  coefficients
   0       00     -8.583868
   raw [-8.56148006]
   features  coefficients
   0      raw      -8.56148
   run [-8.51374794]
   features  coefficients
   0      run     -8.513748
   factor [-8.50725691]
   features  coefficients
   0   factor     -8.507257
   200 [-8.50334896]
   features  coefficients
   0      200     -8.503349
   file [-8.39990841]
   features  coefficients
   0     file     -8.399908
   pb [-8.38173753]
   features  coefficients
   0       pb     -8.381738
   mar [-8.21304343]
   features  coefficients
   0      mar     -8.213043
   1998 [-8.21239836]
   features  coefficients
   0     1998     -8.212398
   signal [-8.02426499]
   features  coefficients
   0   signal     -8.024265
   area [-8.01782987]
   features  coefficients
   0     area      -8.01783
   98 [-7.3166918]
   features  coefficients
   0       98     -7.316692

How do I return a data frame like:

          features     coefficients
   0      1800          -8.738003
   ..     ...           ...
   18     area          -8.01783
   19     98            -7.316692

Right now, print(f,c) shows the following top values:

   1800 [-8.73800344]
   hr [-8.73656027]
   wa [-8.7336777]
   1400 [-8.72197545]
   hrwa [-8.71952656]
   perimeter [-8.71173264]
   response [-8.67388885]
   analysis [-8.65460329]
   00 [-8.58386785]
   raw [-8.56148006]
   run [-8.51374794]
   factor [-8.50725691]
   200 [-8.50334896]
   file [-8.39990841]
   pb [-8.38173753]
   mar [-8.21304343]
   1998 [-8.21239836]
   signal [-8.02426499]
   area [-8.01782987]
   98 [-7.3166918]

I researched a few similar questions here, here, and here, but they don't seem to directly address my question.

Thank you in advance, still learning here.

1 Answer

I tried simulating some data. You can append a list to L on each iteration of the loop, then create the df from L once the loop has finished:

L = []
classnum_saf = 3
inds = np.argsort(clf_3.named_steps['clf'].coef_[classnum_saf, :])[-20:]
for i in inds: 
   f = feature_names[i]
   c = clf_3.named_steps['clf'].coef_[classnum_saf, [i]]
   print(f,c)
   # c is a length-1 array, so c[0] stores the coefficient as a plain scalar
   L.append([f, c[0]])

df = pd.DataFrame(L, columns = ['features', 'coefficients'])
print(df) 

Sample:

import pandas as pd

f = [[1],[2],[3]]
c = ['a','b','c']

L = []
for i in range(3): 
#   print(f[i],c[i])
   #swap c and f
   L.append([c[i], f[i][0]])

print (L)
[['a', 1], ['b', 2], ['c', 3]]

df = pd.DataFrame(L, columns = ['features', 'coefficients'])
print(df)  

  features  coefficients
0        a             1
1        b             2
2        c             3
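
The loop can also be skipped entirely, since NumPy accepts an array of indices. The sketch below is only an illustration under assumptions: `coef` and `feature_names` are small made-up stand-ins for `clf_3.named_steps['clf'].coef_` and the real feature names, and `classnum` and `top_n` are hypothetical names for the class row and the number of features to keep.

```python
import numpy as np
import pandas as pd

# Made-up stand-ins for clf_3.named_steps['clf'].coef_ and feature_names
coef = np.array([
    [0.1, 0.2, 0.3, 0.4, 0.5],
    [-8.7, -7.3, -8.5, -8.0, -8.2],
])
# np.asarray lets a plain list of names be indexed with an index array
feature_names = np.asarray(["1800", "98", "run", "area", "mar"])

classnum = 1  # row of coef to inspect (hypothetical value)
top_n = 3     # the question uses 20

# argsort sorts ascending, so the last top_n indices hold the largest coefficients
inds = np.argsort(coef[classnum, :])[-top_n:]

# Build the frame once, from whole columns, so the header appears only once
df = pd.DataFrame({
    "features": feature_names[inds],
    "coefficients": coef[classnum, inds],
})
print(df)
```

Building the frame from two aligned arrays keeps the feature and coefficient columns in the same order as `inds`, so no per-row bookkeeping is needed.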

2 Comments

Appreciate your help! Your c is a list whereas mine is a numpy.ndarray. This may explain my error upon running, "index 1169 is out of bounds for axis 0 with size 1". I assume I need to turn c into a list?
You can try, but I think it should work with an ndarray as well. The best approach is to change f and c to ndarrays and test it.
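
The "out of bounds for axis 0 with size 1" message is consistent with indexing a length-1 array by the original feature index: `coef_[classnum_saf, [i]]` returns an array of shape `(1,)`, so only position 0 exists in it. The sketch below reproduces that on made-up data; `coef`, `feature_names`, and `rows` are hypothetical stand-ins, not the asker's actual objects.

```python
import numpy as np
import pandas as pd

# Made-up stand-ins for the classifier coefficients and feature names
coef = np.array([[-8.7, -7.3, -8.5]])
feature_names = ["1800", "98", "run"]

i = 2
c = coef[0, [i]]  # indexing with a list returns an array of shape (1,)
try:
    c[i]          # only c[0] exists, so this raises IndexError
    raised = False
except IndexError:
    raised = True
print(raised)

rows = []
for i in np.argsort(coef[0, :]):
    # A scalar index returns a single value; float() turns it into a plain Python float
    rows.append([feature_names[i], float(coef[0, i])])

# Create the frame once, after the loop
df = pd.DataFrame(rows, columns=["features", "coefficients"])
print(df)
```

Converting `c` to a list would not change its length, so the same index would still fail; reading the value with `c[0]` or with a scalar index avoids the problem.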