
I have a bunch of rows of data in a pandas DataFrame whose team column alternates inconsistently between two string values. For each game ID (another column), the two team strings are unique to that game ID, but they do not alternate in a predictable pattern. Regardless, I'm trying to write a helper function that takes each unique game ID and returns the two team names associated with it.

For example...

index   game_id    team
0       400827888  ATL
1       400827888  DET
2       400827888  ATL
3       400827888  DET
4       400827888  ATL
...     ...        ...
555622  400829117  POR
555623  400829117  DEN
555624  400829117  POR
555625  400829117  POR

Here is my woeful attempt, which is not working.

def get_teams(df):
    for i in df['gameid']:
        both_teams = [df['team'].astype(str)]
        return(both_teams)

I'd like it to return ['ATL', 'DET'] for game ID 400827888 and ['POR', 'DEN'] for game ID 400829117. Instead, it just returns the team name for every index.
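A minimal sketch of the helper described above, assuming the columns are named game_id and team as in the sample (the attempt uses gameid). It filters the rows for one game and keeps the distinct team names in order of first appearance; the signature get_teams(df, game_id) is an assumption, not the asker's:

```python
import pandas as pd

def get_teams(df, game_id):
    """Return the distinct team names for one game_id, in order of first appearance."""
    # Keep only the rows for this game, then take the unique team names
    teams = df.loc[df['game_id'] == game_id, 'team'].unique()
    return teams.tolist()

# Small sample mirroring the data in the question
df = pd.DataFrame({
    'game_id': [400827888] * 5 + [400829117] * 4,
    'team': ['ATL', 'DET', 'ATL', 'DET', 'ATL', 'POR', 'DEN', 'POR', 'POR'],
})

print(get_teams(df, 400827888))  # ['ATL', 'DET']
print(get_teams(df, 400829117))  # ['POR', 'DEN']
```

Calling this once per game ID rescans the whole frame each time, so for many games a single groupby is usually the better fit.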

1 Answer


You can use SeriesGroupBy.unique, which returns the distinct values of each group in order of appearance:

print (df.groupby('game_id')['team'].unique())
game_id
400827888    [ATL, DET]
400829117    [POR, DEN]
Name: team, dtype: object
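If plain Python lists keyed by game ID are more convenient than a Series of arrays, one option (a sketch, assuming the same column names) is to convert each group's result with list and then call to_dict:

```python
import pandas as pd

df = pd.DataFrame({
    'game_id': [400827888, 400827888, 400827888, 400829117, 400829117, 400829117],
    'team': ['ATL', 'DET', 'ATL', 'POR', 'DEN', 'POR'],
})

# unique() gives one array of teams per game_id; map each array to a list, then build a dict
teams_by_game = df.groupby('game_id')['team'].unique().apply(list).to_dict()

# Maps each game_id to its teams, e.g. 400827888 -> ['ATL', 'DET']
print(teams_by_game[400827888])
```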

For looping, reset the index and use iterrows:

for i, g in df.groupby('game_id')['team'].unique().reset_index().iterrows():
    print (g.game_id)
    print (g.team)
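As an alternative to reset_index().iterrows(), the Series returned by unique can be iterated directly with Series.items(), which yields (game_id, teams) pairs. A short sketch with the same assumed column names:

```python
import pandas as pd

df = pd.DataFrame({
    'game_id': [400827888, 400827888, 400829117, 400829117],
    'team': ['ATL', 'DET', 'POR', 'DEN'],
})

teams_by_game = df.groupby('game_id')['team'].unique()

# items() yields (index label, value) pairs: here (game_id, array of teams)
for game_id, teams in teams_by_game.items():
    print(game_id, list(teams))
```

This avoids building an intermediate DataFrame and converting each row to a Series, which iterrows does.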

EDIT:

If you need to find all game_id values that contain a given team (e.g. DET), use boolean indexing:

s = df.groupby('game_id')['team'].unique()

print (s[s.apply(lambda x: 'DET' in x)].index.tolist())
[400827888] 
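A different approach, which skips the per-game arrays entirely (a swapped-in technique, not the one above), is to filter the rows where team equals the string and then take the distinct game_id values. This sketch assumes the same column names:

```python
import pandas as pd

df = pd.DataFrame({
    'game_id': [400827888, 400827888, 400829117, 400829117],
    'team': ['ATL', 'DET', 'POR', 'DEN'],
})

# Boolean mask on the team column, then the distinct game IDs of the matching rows
det_games = df.loc[df['team'] == 'DET', 'game_id'].unique().tolist()
print(det_games)  # [400827888]
```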

Comments

Thanks for this. What's the best way to iterate through that second column of team lists, then? I've initialized the groupby to a new variable, but can't call the column from that variable.
def get_teams(df, team):
    for game_id in df['gameid']:
        both_teams = df.groupby('gameid')['team'].unique()
        team_games = []
        for row in both_teams:
            if team in row[1]:
                team_games.append(game_id)

Seems to be an infinite loop, for some reason.
Sorry, do you need for g in df.groupby('game_id')['team'].unique(): print (g) ?
No, my fault for the confusion. What I mean is: when I'm iterating through the "(df.groupby('game_id')['team'].unique())" object (once I've initialized it), how do I reference back to the game_id? Say I'm looping through every line in that grouped object... I would like to return the game IDs that fit the criteria of my loop.
I added a solution to the answer, please check it.
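The function in the comments recomputes the whole groupby once for every row of df, which is very slow on a frame of this size, and it tests only row[1] (the second team) rather than the whole array. A sketch that groups once and loops over the resulting Series; the name get_team_games is hypothetical, and the column names are assumed to be game_id and team:

```python
import pandas as pd

def get_team_games(df, team):
    """Hypothetical helper: list the game_ids in which `team` appears."""
    # Group once, outside any loop: one array of distinct teams per game_id
    teams_by_game = df.groupby('game_id')['team'].unique()
    team_games = []
    for game_id, teams in teams_by_game.items():
        # `teams` holds every team for this game, so test membership in the whole array
        if team in teams:
            team_games.append(game_id)
    return team_games

df = pd.DataFrame({
    'game_id': [400827888, 400827888, 400829117, 400829117],
    'team': ['ATL', 'DET', 'POR', 'DEN'],
})

print(get_team_games(df, 'DET'))
```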
