How to replace column values based on other columns in pandas?

Question

Assume, I have a data frame such as

import pandas as pd 
df = pd.DataFrame({'visitor':['A','B','C','D','E'],
                   'col1':[1,2,3,4,5],
                   'col2':[1,2,4,7,8],
                   'col3':[4,2,3,6,1]})

visitor	col1	col2	col3
A	1	1	4
B	2	2	2
C	3	4	3
D	4	7	6
E	5	8	1

For each row/visitor, (1) First, if there are any identical values, I would like to keep the 1st value of each row then replace the rest of identical values in the same row with NULL such as

visitor	col1	col2	col3
A	1	NULL	4
B	2	NULL	NULL
C	3	4	NULL
D	4	7	6
E	5	8	1

Then (2) keep rows/visitors with more than 1 value such as
Final Data Frame

visitor	col1	col2	col3
A	1	NULL	4
C	3	4	NULL
D	4	7	6
E	5	8	1

Any suggestions? many thanks

Shubham Sharma · Accepted Answer · 2022-02-24 04:50:22Z

1

We can use series.duplicated along the columns axis to identify the duplicates, then mask the duplicates using where and filter the rows where the sum of non-duplicated values is greater than 1

s = df.set_index('visitor')
m = ~s.apply(pd.Series.duplicated, axis=1)

s.where(m)[m.sum(1).gt(1)]

         col1  col2  col3
visitor                  
A           1   NaN   4.0
C           3   4.0   NaN
D           4   7.0   6.0
E           5   8.0   1.0

answered Feb 24, 2022 at 4:50

Shubham Sharma

71.8k6 gold badges26 silver badges58 bronze badges

Sign up to request clarification or add additional context in comments.

3 Comments

DaCard Over a year ago

hi Sharma, Can you kindly explain what ~s. is? This is my 1st time see this. Don't know what it means in pandas. many thanks

Shubham Sharma Over a year ago

@DaCard ~ is the negation operator so basically ~(s.apply(pd.Series.duplicated, axis=1)) will help you identify the non-duplicate values per row.

DaCard Over a year ago

thank you for your quick response. Appreciate!

BENY · Accepted Answer · 2022-02-24 04:51:10Z

1

Let us try mask with pd.Series.duplicated, then dropna with thresh

out = df.mask(df.apply(pd.Series.duplicated,1)).dropna(thresh = df.shape[1]-1)
Out[321]: 
  visitor  col1  col2  col3
0       A     1   NaN   4.0
2       C     3   4.0   NaN
3       D     4   7.0   6.0
4       E     5   8.0   1.0

answered Feb 24, 2022 at 4:51

BENY

324k22 gold badges176 silver badges250 bronze badges

Collectives™ on Stack Overflow

How to replace column values based on other columns in pandas?

2 Answers 2

3 Comments

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

3 Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Related