1 1 0 0 0 1 0 0 0
0 1 0 0 0 0 0 0 0
0 1 0 0 0 0 0 0 0
1 0 1 0 0 0 0 0 0
0 0 0 0 0 0 1 1 1
1 0 0 1 1 0 0 0 0

I have a dataframe with the structure above. I want to find the columns whose column sum is 1, and group together the columns that have their 1 in the same row. For the example above, the output should be the columns [3], [4,5], [6], [7,8,9]. I tried df.columns[df.sum(axis=0) == 1], but instead of getting the columns grouped (when they share the same row), I get them individually...

2 Answers

2

You can create a sub_df where column sums are 1:

sub_df = df.loc[:, df.sum()==1]

sub_df
Out[105]: 
   2  3  4  5  6  7  8
0  0  0  0  1  0  0  0
1  0  0  0  0  0  0  0
2  0  0  0  0  0  0  0
3  1  0  0  0  0  0  0
4  0  0  0  0  1  1  1
5  0  1  1  0  0  0  0

And then group those columns by the position of 1's (position of the max):

sub_df.groupby(sub_df.idxmax(), axis = 1).groups
Out[107]: {0: [5], 3: [2], 4: [6, 7, 8], 5: [3, 4]}
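
Newer pandas releases deprecate the axis=1 argument to DataFrame.groupby, so the call above may warn or fail depending on the installed version. A minimal sketch of the same grouping done on the idxmax Series instead, assuming the question's table is rebuilt as a DataFrame with default zero-based labels (the `data` literal and the names `row_of_one` and `groups` are illustrative, not from the original answer):

```python
import pandas as pd

# The question's 6x9 table, rebuilt with default 0-based row/column labels (assumption).
data = [
    [1, 1, 0, 0, 0, 1, 0, 0, 0],
    [0, 1, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0, 0, 0, 0],
    [1, 0, 1, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 1, 1, 1],
    [1, 0, 0, 1, 1, 0, 0, 0, 0],
]
df = pd.DataFrame(data)

# Keep only the columns whose sum is 1.
sub_df = df.loc[:, df.sum() == 1]

# For each kept column, the row label that holds its single 1.
row_of_one = sub_df.idxmax()

# Group the column labels by that row label; no axis=1 argument is needed.
groups = {
    int(row): [int(col) for col in cols]
    for row, cols in row_of_one.groupby(row_of_one).groups.items()
}
print(groups)
```

Grouping the Series by its own values keeps the column labels in the index, so each group's index is the list of columns sharing a row.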

The result is a dictionary. You can access the values by dict.values():

d = sub_df.groupby(sub_df.idxmax(), axis = 1).groups
d.values()
Out[110]: dict_values([[5], [2], [6, 7, 8], [3, 4]])

The column names in my example were zero-based numbers. You can iterate over the dictionary to add 1 to those values.
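
As a rough illustration of that last step, the sketch below shifts each zero-based label by 1 and sorts the groups so they appear in the order used in the question. The dictionary `d` is copied from the output shown above; the name `one_based` is hypothetical.

```python
# Groups as printed above: row position of the 1 -> zero-based column labels.
d = {0: [5], 3: [2], 4: [6, 7, 8], 5: [3, 4]}

# Add 1 to every column label, then sort the groups
# (lists compare element by element, so this orders them by column).
one_based = sorted([col + 1 for col in cols] for cols in d.values())
print(one_based)  # [[3], [4, 5], [6], [7, 8, 9]]
```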

1

Solution

s = df.loc[:, df.sum(axis=0) == 1].idxmax(axis=0)

[[int(j) for j in i] for i in s.groupby(s).groups.values()]

Looks like:

[[5], [2], [6, 7, 8], [3, 4]]

EDIT:

This is essentially the same answer as ayhan's; I posted maybe 2 seconds after he/she did. I'm leaving mine here because it also handles converting long int to int. Please choose his/her answer over mine.
