Repeating strings in pandas DF -- want to return list of unique strings

Question

I have a bunch of rows of data in a pandas DF whose team-name strings alternate inconsistently. For each game ID (another column), the two team-name strings are unique to that game ID, but they do not alternate in a predictable pattern. Regardless, I'm trying to write a helper function that takes each unique game ID and returns the two team names associated with it.

For example...

index    game_id     team
0        400827888   ATL
1        400827888   DET
2        400827888   ATL
3        400827888   DET
4        400827888   ATL
...      ...         ...
555622   400829117   POR
555623   400829117   DEN
555624   400829117   POR
555625   400829117   POR

Here is my woeful attempt, which is not working.

def get_teams(df):
    for i in df['gameid']:
        both_teams = [df['team'].astype(str)]
        return(both_teams)

I'd like it to return ['ATL', 'DET'] for game ID 400827888 and ['POR', 'DEN'] for game ID 400829117. Instead, it just returns the team name for every index.

jezrael · Accepted Answer · 2016-07-27 21:42:31Z


You can use SeriesGroupBy.unique:

print (df.groupby('game_id')['team'].unique())
game_id
400827888    [ATL, DET]
400829117    [POR, DEN]
Name: team, dtype: object
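A minimal, self-contained sketch of the helper the question asks for, assuming the column names `game_id` and `team` from the sample data. The frame below is a tiny hypothetical stand-in for the real 555k-row one, and `get_teams` is an illustrative name, not an existing pandas function. `unique()` keeps teams in order of first appearance (which is why ATL comes before DET); converting each array to a list and calling `to_dict()` gives a plain `{game_id: [team, team]}` mapping.

```python
import pandas as pd

def get_teams(df):
    # Hypothetical helper: map each game_id to the list of distinct teams
    # seen in its rows, in order of first appearance.
    return df.groupby('game_id')['team'].unique().apply(list).to_dict()

# Small stand-in for the question's data (values taken from the sample rows)
df = pd.DataFrame({
    'game_id': [400827888] * 5 + [400829117] * 4,
    'team': ['ATL', 'DET', 'ATL', 'DET', 'ATL', 'POR', 'DEN', 'POR', 'POR'],
})

teams = get_teams(df)
print(teams)
# {400827888: ['ATL', 'DET'], 400829117: ['POR', 'DEN']}
```

Looking up a single game is then just `teams[400827888]`.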

To loop over the result, reset the index and use iterrows:

for i, g in df.groupby('game_id')['team'].unique().reset_index().iterrows():
    print (g.game_id)
    print (g.team)
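As an alternative to `reset_index().iterrows()` (a swapped-in technique, not the one used above), `Series.items()` yields `(index label, value)` pairs directly, so each `game_id` comes back as the label while you loop. The sample frame is again a hypothetical stand-in for the real data.

```python
import pandas as pd

# Small stand-in for the question's data
df = pd.DataFrame({
    'game_id': [400827888] * 5 + [400829117] * 4,
    'team': ['ATL', 'DET', 'ATL', 'DET', 'ATL', 'POR', 'DEN', 'POR', 'POR'],
})

teams_by_game = df.groupby('game_id')['team'].unique()

# Series.items() yields (label, value) pairs: the label is the game_id and
# the value is the array of distinct teams for that game.
pairs = []
for game_id, teams in teams_by_game.items():
    print(game_id, list(teams))
    pairs.append((game_id, list(teams)))
```

This avoids building an intermediate DataFrame just to read the index back out.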

EDIT:

If you need to find all game_ids that contain a given team (e.g. DET), use boolean indexing:

s = df.groupby('game_id')['team'].unique()

print (s[s.apply(lambda x: 'DET' in x)].index.tolist())
[400827888]
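A possible vectorized alternative (not the approach above): filter the original rows by team first, then take the distinct `game_id` values. This skips the per-group Python `lambda`. Note that in the version above, `'DET' in x` tests for an exact element match in the NumPy array, not a substring match. `games_for_team` is a hypothetical helper name and the frame is a small stand-in.

```python
import pandas as pd

# Small stand-in for the question's data
df = pd.DataFrame({
    'game_id': [400827888] * 5 + [400829117] * 4,
    'team': ['ATL', 'DET', 'ATL', 'DET', 'ATL', 'POR', 'DEN', 'POR', 'POR'],
})

def games_for_team(df, team):
    # Hypothetical helper: keep only the rows for `team`, then collect the
    # distinct game_ids those rows belong to (order of first appearance).
    return df.loc[df['team'] == team, 'game_id'].unique().tolist()

print(games_for_team(df, 'DET'))  # [400827888]
print(games_for_team(df, 'POR'))  # [400829117]
```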

edited Jul 27, 2016 at 21:42

answered Jul 25, 2016 at 15:27

jezrael



Comments

BSHuniversity Over a year ago

Thanks for this. What's the best way to iterate through that second column of team lists, then? I've initialized the groupby to a new variable, but can't call the column from that variable.

BSHuniversity Over a year ago

def get_teams(df, team):
    for game_id in df['gameid']:
        both_teams = df.groupby('gameid')['team'].unique()
        team_games = []
        for row in both_teams:
            if team in row[1]:
                team_games.append(game_id)

Seems to be an infinite loop, for some reason.

jezrael Over a year ago

Sorry, do you need for g in df.groupby('game_id')['team'].unique(): print (g) ?

BSHuniversity Over a year ago

No, my fault for the confusion. What I mean is: when I'm iterating through the "(df.groupby('game_id')['team'].unique())" object (once I've initialized it), how do I reference back to the game_id? Say I'm looping through every line in that grouped object... I would like to return the game IDs that fit the criteria of my loop.

jezrael Over a year ago

I added a solution to the answer; please check it.
