1

I have similar df like below,

df = pd.DataFrame({'DRINKS':['WHISKEY','VODKA','WATER'],
                    'STRONG':[5,5,0],
                    'SOUR':[5,4,0]})

And I want to transform it to this one (Read 5s from the dataframe and when it matches, create a column with whatever name(I named it Cat1) and get the column name(STRONG) where the value was 5, then move on tho the next column and do the same operation until there no columns with rows with a value 5. The final outcome should be like below:

df = pd.DataFrame({'DRINKS':['WHISKEY','VODKA','WATER'],
                    'Cat1':["STRONG","STRONG",np.nan],
                    'Cat2':["SOUR",np.nan,np.nan]})

I tried to do it with

df['Cat1']=(df == 5).idxmax(axis=1)

but it gives me only 1 column name for Whiskey.

Any help will be appreciated

3 Answers 3

1

Select and set all columns without first by DataFrame.iloc and numpy.where:

df = df.iloc[:, :1].join(pd.DataFrame(np.where(df.iloc[:, 1:].eq(5),df.columns[1:],np.nan),
                            index=df.index, 
                            columns=[f'Cat{i}' for i,_ in enumerate(df.columns[1:], 1)]))
print (df)
    DRINKS    Cat1  Cat2
0  WHISKEY  STRONG  SOUR
1    VODKA  STRONG   NaN
2    WATER     NaN   NaN
    

Or:

df.iloc[:, 1:] = np.where(df.iloc[:, 1:].eq(5), df.columns[1:], np.nan)
df = df.rename(columns=dict(zip(df.columns[1:],
                               [f'Cat{i}' for i,_ in enumerate(df.columns[1:], 1)])))
print (df)
    DRINKS    Cat1  Cat2
0  WHISKEY  STRONG  SOUR
1    VODKA  STRONG   NaN
2    WATER     NaN   NaN
Sign up to request clarification or add additional context in comments.

Comments

0

try:

df['Cat1'] = np.where(df[df.columns[1]].eq(5), df.columns[1], np.nan) #or df['Cat1'] = np.where(df["STRONG"].eq(5), "STRONG", np.nan)
df['Cat2'] = np.where(df[df.columns[2]].eq(5), df.columns[2], np.nan) #or df['Cat2'] = np.where(df["SOUR"].eq(5), "SOUR", np.nan)

    DRINKS  STRONG  SOUR    Cat1    Cat2
0   WHISKEY 5       5       STRONG  SOUR
1   VODKA   5       4       STRONG  NaN
2   WATER   0       0       NaN     NaN

df = df.drop(columns=['STRONG', 'SOUR'])

    DRINKS  Cat1    Cat2
0   WHISKEY STRONG  SOUR
1   VODKA   STRONG  NaN
2   WATER   NaN     NaN

2 Comments

Hey, thanks for the answer but i have to create cat1 and cat2 automatically
can you explain what you mean by automatically
0

You could map each column value of 5 to the column header. The core part of that would be:

df.iloc[:,1:].apply(lambda x: x.map({5:x.name}))

Which delivers:

   STRONG  SOUR
0  STRONG  SOUR
1  STRONG   NaN
2     NaN   NaN 

Then you could put it all together with a column rename:

dfo = (
    pd.concat([df['DRINKS'],df.iloc[:,1:].apply(lambda x: x.map({5:x.name}))
               .rename(columns=lambda x: f"Cat{df.columns.get_loc(x)}")], axis=1)
)

print(dfo)

Result

    DRINKS    Cat1  Cat2
0  WHISKEY  STRONG  SOUR
1    VODKA  STRONG   NaN
2    WATER     NaN   NaN

2 Comments

you could change the rename part to .rename(columns=lambda x: f"Cat{df.columns.get_loc(x)}") to get the exact result of OP.
@Rabinzel Good call. That's better. I updated my answer with that.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.