Find duplicate values across rows, Python

Question

I would like to find duplicate values within each row. For example, row 1 contains the value A three times. I want to keep the first occurrence (or any one of them) and replace the other duplicates with NaN.

	col1	col2	col3	col4
1	A	A	A	Y
2	B	D	G	L
3	E	F	T	K

import pandas as pd

data = {'col1': ['A', 'B', 'E'],
        'col2': ['A', 'D', 'F'],
        'col3': ['A', 'G', 'T'],
        'col4': ['Y', 'L', 'K']}

# Create DataFrame
df = pd.DataFrame(data)

Thank you.

When asking about a DataFrame, please provide Python code that reproduces your problem exactly, including the DataFrame construction with data, so we don't have to write it ourselves. Also show the exact expected output, which avoids misreading the text ;)
– azro, Commented Apr 13, 2021 at 10:11

Ynjxsjmh · Accepted Answer · 2021-04-13 10:13:46Z


Use pandas.DataFrame.transpose() (or its shorthand, df.T) so that each original row becomes a column, then check for duplicates within each column.

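import numpy as np

# Transpose so that each original row becomes a column
df_ = df.T

for col in df_.columns:
    # True for every repeat of a value in this column; the first occurrence stays False
    duplicated = df_.duplicated(col)
    # np.nan rather than np.NaN: the np.NaN alias was removed in NumPy 2.0
    df_.loc[duplicated, col] = np.nan

# Transpose back to the original orientation
print(df_.T)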

  col1 col2 col3 col4
0    A  NaN  NaN    Y
1    B    D    G    L
2    E    F    T    K
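
As an alternative to the transpose-and-loop approach, the same masking can probably be done row by row with Series.duplicated() and Series.mask(). This is only a minimal sketch, not part of the original answer: the helper name mask_row_duplicates is hypothetical, and it assumes the DataFrame from the question, rebuilt here so the snippet runs on its own.

```python
import pandas as pd

# Same sample data as in the question
data = {'col1': ['A', 'B', 'E'],
        'col2': ['A', 'D', 'F'],
        'col3': ['A', 'G', 'T'],
        'col4': ['Y', 'L', 'K']}
df = pd.DataFrame(data)


def mask_row_duplicates(frame):
    # Hypothetical helper (not a pandas function): for each row, keep the
    # first occurrence of every value and replace later repeats with NaN.
    # row.duplicated() is True for repeats; row.mask() blanks those positions.
    return frame.apply(lambda row: row.mask(row.duplicated()), axis=1)


result = mask_row_duplicates(df)
print(result)
```

Unlike the loop above, this leaves df untouched and returns a new DataFrame. Because apply with axis=1 calls a Python function once per row, it may be slow on very large frames.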
