Finding duplicates between columns using pandas

Question

I have a dataset in pandas that consists of NBA game statistics.

The data looks something like this:

Date|Team 1|Team 2|Team1 Stats|...|Team2 Stats|...

Because of the way I scraped the data, I now have two instances of each game, where the stats for each team are simply mirrored.

(All entries are equal; the Team1 stats are just in the Team2 columns, and vice versa.)

How do I find and remove the duplicate entries using pandas?

You could simply remove every row for which Team 1 < Team 2.
– Eric Duminil, Commented Nov 13, 2017 at 19:09

WNG · Accepted Answer · 2017-11-13 19:10:49Z


To remove the duplicates, keep only the rows where Team1 sorts before Team2 lexicographically. Each game appears once in each order, so exactly one row of every mirrored pair survives.

dfFiltered = df[df["Team1"] < df["Team2"]]

Assuming that a team never plays itself, this will work.
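As a quick check, here is a minimal sketch on a tiny made-up frame. The team names, scores, and the "Team1 Pts"/"Team2 Pts" columns are invented for illustration, and I am assuming the team columns are literally named "Team1" and "Team2", so adjust them to match your data.

```python
import pandas as pd

# Hypothetical sample data: two games, each stored twice with the teams mirrored.
# All names and scores are made up for illustration.
df = pd.DataFrame(
    {
        "Date": ["2017-11-10", "2017-11-10", "2017-11-11", "2017-11-11"],
        "Team1": ["Celtics", "Hornets", "Warriors", "76ers"],
        "Team2": ["Hornets", "Celtics", "76ers", "Warriors"],
        "Team1 Pts": [90, 87, 135, 114],
        "Team2 Pts": [87, 90, 114, 135],
    }
)

# Element-wise string comparison: True where Team1 sorts before Team2.
mask = df["Team1"] < df["Team2"]

# Keep one row per mirrored pair and renumber the index.
df_filtered = df[mask].reset_index(drop=True)

print(df_filtered)
```

Note that the comparison is a plain string comparison, so digits sort before letters and uppercase before lowercase. That does not matter here, as long as the two names in a row differ and are spelled the same way in both mirrored rows.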


1 Comment

Wow, that is a beautiful solution that works perfectly! I cannot stress enough how cool I find this!