Get Pandas Dataframe Column names from Numpy Array

Question

I have a DataFrame imported from Excel:

>>> df

    Name Emp ID  Total Salary     A      B     C     D      E
0   Mike   A001         25000  5000  15000  3000     0   2000
1   John   A002         23000  5000  10000  3000  3000   2000
2    Bob   A003         21000  5000  15000     0  1000      0
3   Rose   A004         20000  5000  10000  2000  1000  20000
4  James   A005         10000  5000      0  3000     0   2000

Now I have found which salary components add up to Total Salary (a subset-sum search) using the following code:

Code:

import pandas as pd
import numpy as np

df = pd.read_excel('tmp/test.xlsx')
# Keep only the salary-component columns A..E
# (drop(..., 1) with a positional axis no longer works in pandas 2.x)
val = df.drop(columns=['Name', 'Emp ID', 'Total Salary'])
test = np.array(val)

num = df['Total Salary'][0]
array = test[0]

def subsetsum(array, num):
    # Return a list of values from `array` that sum to `num`, or None
    if num == 0 or num < 1:
        return None
    elif len(array) == 0:
        return None
    else:
        if np.isclose(array[0], num):
            return [array[0]]
        else:
            # First try including array[0], then try skipping it
            with_v = subsetsum(array[1:], (num - array[0]))
            if with_v:
                return [array[0]] + with_v
            else:
                return subsetsum(array[1:], num)

print('\nValues : ', array)
print('\nTotal Salary : ', num)
print('\nValues of Salary : ', subsetsum(array, num))

Output:

Values :  [ 5000 15000  3000     0  2000]

Total Salary :  25000

Values of Salary :  [5000, 15000, 3000, 0, 2000]

Now I need a way to link the salary values in the array back to the corresponding column names in the DataFrame.

So the output I would like is:

Output Required:

Values :  [ 5000 15000  3000     0  2000]

Total Salary :  25000

Values of Salary :  A - 5000 B - 15000 C - 3000 E - 2000

David Z · Accepted Answer · 2016-12-28 07:41:40Z


I would suggest rewriting your subsetsum function to return the indices of the chosen elements, rather than the elements themselves (or perhaps it could return both, if that works out to be better for you). For example,

subsetsum([5000, 15000, 3000, 0, 2000], 25000)

would return [0, 1, 2, 3, 4], or possibly [0, 1, 2, 4]. Then you can use these indices to access the corresponding column labels as well as the elements.
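A minimal sketch of that suggestion, assuming the row is passed as a plain Python list alongside a list of the column labels. The names subsetsum_indices and describe are hypothetical, the standard library's math.isclose stands in for np.isclose so the sketch runs without NumPy, and zero-valued columns are skipped only to reproduce the "Output Required" line.

```python
import math


def subsetsum_indices(values, target, start=0):
    """Same search as the question's subsetsum, but returns positions, not values.

    Returns a list of indices into `values` whose entries sum to `target`,
    or None if no such subset is found.
    """
    if target < 1 or start >= len(values):
        return None
    if math.isclose(values[start], target):
        return [start]
    # First try including values[start], exactly like the original.
    rest = subsetsum_indices(values, target - values[start], start + 1)
    if rest:
        return [start] + rest
    # Otherwise skip values[start].
    return subsetsum_indices(values, target, start + 1)


def describe(columns, values, target):
    """Hypothetical helper: format the chosen cells as 'label - value' pairs."""
    idx = subsetsum_indices(values, target)
    if idx is None:
        return None
    # Skip zero cells, matching the required output (D is omitted for Mike).
    return " ".join(f"{columns[i]} - {values[i]}" for i in idx if values[i] != 0)


columns = ["A", "B", "C", "D", "E"]
mike = [5000, 15000, 3000, 0, 2000]
print(subsetsum_indices(mike, 25000))  # [0, 1, 2, 3, 4]
print(describe(columns, mike, 25000))  # A - 5000 B - 15000 C - 3000 E - 2000
```

Because the function hands back positions, the same indices can select from both a NumPy row such as test[0] and the labels in val.columns.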

answered Dec 28, 2016 at 7:41

David Z


4 Comments

cgmaster Over a year ago

I have tried but failed. Could you please guide me on how to do it? It would be a great help!

David Z Over a year ago

@cgmaster What have you tried, and why did it fail?

cgmaster Over a year ago

I am unable to extract the index values from the function. When I try to extract the values individually so that I can get the index, it gives me None [2000] [3000, 2000] [15000, 3000, 2000].

David Z Over a year ago

@cgmaster Honestly, that doesn't help me help you at all. I'm not sure what you mean by "extract index values from the function".

Trung Hoang · Accepted Answer · 2017-04-27 10:19:21Z


Using the information you provided, I checked this on my own machine. The easiest way to convert a DataFrame to a NumPy array is .values (from pandas 0.24 on, .to_numpy() is the recommended equivalent):

test = val.values
array = test[0]

You can always access the column names:

col = val.columns.values
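A short sketch of why the names and values can be matched at all: after dropping the non-salary columns, position j in every row of the array corresponds to the label at position j of val.columns. It assumes the sheet is rebuilt inline from the table in the question (rather than read from tmp/test.xlsx) and uses to_numpy(), the pandas 0.24+ equivalent of .values.

```python
import pandas as pd

# Assumption: the question's sheet rebuilt inline instead of read from Excel.
df = pd.DataFrame({
    "Name": ["Mike", "John", "Bob", "Rose", "James"],
    "Emp ID": ["A001", "A002", "A003", "A004", "A005"],
    "Total Salary": [25000, 23000, 21000, 20000, 10000],
    "A": [5000, 5000, 5000, 5000, 5000],
    "B": [15000, 10000, 15000, 10000, 0],
    "C": [3000, 3000, 0, 2000, 3000],
    "D": [0, 3000, 1000, 1000, 0],
    "E": [2000, 2000, 0, 20000, 2000],
})

val = df.drop(columns=["Name", "Emp ID", "Total Salary"])
test = val.to_numpy()         # same data as val.values
col = val.columns.to_numpy()  # same labels as val.columns.values

# Column j of `test` is the column labelled col[j], so a position in a row
# always identifies its column label.
row = test[0]
for j in range(len(col)):
    assert row[j] == val.loc[0, col[j]]

print({c: int(v) for c, v in zip(col, row)})
# {'A': 5000, 'B': 15000, 'C': 3000, 'D': 0, 'E': 2000}
```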

Finally, pair the names with the values:

link = list(zip(col, subsetsum(array,num)))
print(link)

# Output
[('A', 5000), ('B', 15000), ('C', 3000), ('D', 0), ('E', 2000)]

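zip() pairs the two sequences element by element, by position, and stops at the shorter one; in Python 3 it returns a zip object (an iterator), so convert it with list() before printing. Note that the names and values line up here only because subsetsum returned the whole row. I hope this helps!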
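Because zip() pairs purely by position, zipping after the search can mislabel values when subsetsum returns only part of a row. For Rose's row (index 3 in the table), subsetsum returns only [20000], and zip would label it A even though it sits in column E. A minimal sketch of one workaround, offered as an assumption rather than the answerer's method: zip first, then run the same search over (label, value) pairs with a hypothetical subsetsum_pairs so each label travels with its value. Both functions use math.isclose in place of np.isclose so the sketch needs no NumPy.

```python
import math

COLUMNS = ["A", "B", "C", "D", "E"]
ROSE = [5000, 10000, 2000, 1000, 20000]  # row 3 of the question's table, Total Salary 20000


def subsetsum(array, num):
    """The question's function (indentation fixed, math.isclose instead of np.isclose)."""
    if num < 1 or len(array) == 0:
        return None
    if math.isclose(array[0], num):
        return [array[0]]
    with_v = subsetsum(array[1:], num - array[0])
    if with_v:
        return [array[0]] + with_v
    return subsetsum(array[1:], num)


def subsetsum_pairs(pairs, num):
    """Hypothetical variant: same search over (label, value) pairs."""
    if num < 1 or len(pairs) == 0:
        return None
    _, value = pairs[0]
    if math.isclose(value, num):
        return [pairs[0]]
    with_v = subsetsum_pairs(pairs[1:], num - value)
    if with_v:
        return [pairs[0]] + with_v
    return subsetsum_pairs(pairs[1:], num)


# Zipping after the search pairs by position, so the lone 20000 is labelled 'A'.
print(list(zip(COLUMNS, subsetsum(ROSE, 20000))))  # [('A', 20000)]

# Zipping before the search keeps each value with its own column.
print(subsetsum_pairs(list(zip(COLUMNS, ROSE)), 20000))  # [('E', 20000)]
```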

answered Apr 27, 2017 at 10:19

Trung Hoang
