I have a pandas dataset of NBA game statistics.

The data looks something like this:

Date|Team 1|Team 2|Team1 Stats|...|Team2 Stats|...

Because of the way I scraped the data, I now have two rows for each game, with the stats for the two teams simply mirrored.

(All entries are equal, except that the Team1 stats are in the Team2 columns and vice versa.)

How do I find and remove the duplicate entries using pandas?

  • What have you tried so far? Please post your code. Commented Nov 13, 2017 at 19:08
  • Did you try googling "drop duplicates pandas"? Commented Nov 13, 2017 at 19:08
  • You could simply remove every row for which Team 1 < Team 2. Commented Nov 13, 2017 at 19:09

1 Answer

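To remove the duplicates, keep only the rows where Team1 sorts before Team2 lexicographically. Each game appears once in each orientation, so exactly one of its two mirrored rows satisfies this condition.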

dfFiltered = df[df["Team1"] < df["Team2"]]

This works as long as a team never plays itself and every game really does appear in both orientations.
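As a rough illustration of the filter, here is a minimal sketch on a tiny made-up frame. The team names, dates, scores, and the Team1Pts / Team2Pts columns are invented for the example, and the column names "Team1" and "Team2" follow the snippet above; adjust them if your headers are spelled differently (for example "Team 1").

```python
import pandas as pd

# Hypothetical sample data: each game appears twice, once per orientation.
df = pd.DataFrame({
    "Date": ["2017-11-01", "2017-11-01", "2017-11-02", "2017-11-02"],
    "Team1": ["Celtics", "Lakers", "Bulls", "Knicks"],
    "Team2": ["Lakers", "Celtics", "Knicks", "Bulls"],
    "Team1Pts": [101, 99, 88, 95],   # assumed stat column, for illustration only
    "Team2Pts": [99, 101, 95, 88],   # assumed stat column, for illustration only
})

# Keep only the orientation where Team1 sorts before Team2 alphabetically.
df_filtered = df[df["Team1"] < df["Team2"]].reset_index(drop=True)

# Sanity check: no (Date, Team1, Team2) combination should remain twice.
has_duplicates = df_filtered.duplicated(subset=["Date", "Team1", "Team2"]).any()

print(df_filtered)
```

The comparison is an ordinary element-wise string comparison, so it is case-sensitive; it should behave as expected as long as team names are spelled consistently in both columns.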

1 Comment

Wow, that is a beautiful solution that works perfectly! I cannot stress enough how cool I find this!
