Assume, I have a data frame such as
import pandas as pd
df = pd.DataFrame({'visitor':['A','B','C','D','E'],
'col1':[1,2,3,4,5],
'col2':[1,2,4,7,8],
'col3':[4,2,3,6,1]})
| visitor | col1 | col2 | col3 |
|---|---|---|---|
| A | 1 | 1 | 4 |
| B | 2 | 2 | 2 |
| C | 3 | 4 | 3 |
| D | 4 | 7 | 6 |
| E | 5 | 8 | 1 |
For each row/visitor, (1) First, if there are any identical values, I would like to keep the 1st value of each row then replace the rest of identical values in the same row with NULL such as
| visitor | col1 | col2 | col3 |
|---|---|---|---|
| A | 1 | NULL | 4 |
| B | 2 | NULL | NULL |
| C | 3 | 4 | NULL |
| D | 4 | 7 | 6 |
| E | 5 | 8 | 1 |
Then (2) keep rows/visitors with more than 1 value such as
Final Data Frame
| visitor | col1 | col2 | col3 |
|---|---|---|---|
| A | 1 | NULL | 4 |
| C | 3 | 4 | NULL |
| D | 4 | 7 | 6 |
| E | 5 | 8 | 1 |
Any suggestions? many thanks