Python: Converting multiple binary columns to single categorical column

Question

I have a csv file dataset that contains 170 columns, the first 5 columns contain unique identifiers (Platform, ID, date, length of call, name). The remaining columns 175 contain binary data covering 10 categories. I want to condense those columns so that the number of columns in my data frame is 15. Including an example below:

import pandas as pd

df1 = pd.DataFrame({'Platform': ['Telephone', 'Chat', 'Text'], 'ID': [1, 2, 
3], 'Length': [1545,1532,1511], 'Name': ['andy', 'helen', 'peter'], 'Problem: 
A':[0,1,0], 'Problem: B':[1,0,0], 'Problem: C': [0,0,1], 'Solution: A': 
[0,1,0], 'Solution: B':[1,0,0], 'Solution: C': [0,0,1]})

The output is:

df.head()

ID  Date        Length\\
1   2015-10-16    1545
2   2015-10-09    1532
3   2015-10-13    1511 

Name Problem: A Problem: B  Problem: C  Solution: A Solution: B Solution: C
andy         0          1           0            0           1           0
helen        1          0           0            1           0           0
peter        0          0           1            0           0           1

What I want the data frame to look like:

  Platform ID Length  Name   Problem  Solution
  Telephone 1 1545    andy    B        B
  Chat      2 1532    helen   A        A
  Text      3 1511    peter   C        C

FYI this is not the full dataframe. There are a total of 170 colums that I would like to transform into 15.

Possible duplicate of stackoverflow.com/questions/26762100/… — Hubbitus
– Hubbitus, Commented Feb 13, 2019 at 12:16

cs95 · Accepted Answer · 2018-04-24 18:27:37Z

1

You can use groupby + apply with a dot product on the columns ;

df = df.set_index('Name')
df.groupby(df.columns.str.split(':').str[0], axis=1).apply(
    lambda x: x.dot(x.columns.str.split(': ').str[1])
)

      Problem Solution
Name                  
andy        B        B
helen       A        A
peter       C        C

answered Apr 24, 2018 at 18:27

cs95

406k106 gold badges744 silver badges797 bronze badges

Sign up to request clarification or add additional context in comments.

3 Comments

REFER Over a year ago

When I do this I receive a "ValueError: Grouper and axis must be same length" message. I'm sure that it is something in the code that I don't understand. Would you mind telling me what each of the suggested bit of above code does? I'm familiar with df.groupby() and .apply(), but not with lambda or x.dot.

REFER Over a year ago

From what I can tell, this specifies that the column name will be the word prior to the (' : ') and the value will be the word following the (' : ').

cs95 Over a year ago

@REFER it means that what you passed df.columns.str.split(':').str[0] and the number of columns in your dataframe aren't the same. Can you look into it please?

Rohit Nandi · Accepted Answer · 2020-05-29 17:53:50Z

I created this custom function which will serve your purpose. I got the idea from this stackoverflow article

def condenseCols(data,finalCol,*cols):
    cols = list(cols)
    x = data[cols] # Slice the cols
    x = x.idxmax(axis=1) 
    # x is now a series, holding column name of the max value in the row i.e one of the column from cols
    x = x.apply(lambda s : s.split(": ")[1]) # extract only the prefix (A,B,C)

    data[finalCol] = x
    data = data.drop(cols, axis=1, inplace=True) # Drop the columns : cols
    return data

Call this method, by passing the column names you want to condense, along with the final name of column

condenseCols(df1,'Problem','Problem: A','Problem: B','Problem: C')
condenseCols(df1,'Solution','Solution: A','Solution: B','Solution: C')

There are other ways too, to do this, as cited in the article stackoverflow article

Collectives™ on Stack Overflow

Python: Converting multiple binary columns to single categorical column

2 Answers 2

3 Comments

Comments

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

3 Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Linked

Related