I am analyzing a dataset of NFL game results from the past 20 years and am trying to create a column that indicates, for each team, whether the game was a home game or an away game (home game = 1, away game = 0).

The code I have so far is:

home_list = list(df.home_team.unique())
def home_or_away(team_name, dataf):
   dataf['home_or_away'] = np.where(dataf['home_team'] == team_name, 1, 0)
   return dataf

for i in home_list:
   home_update_all = home_or_away(i, df)
   df.update(home_update_all)

This doesn't yield the correct results: the home_or_away column is overwritten on every iteration, so only the values for the last team remain. Any ideas on how to solve this?

Thanks!

  • Please edit your post so that it contains an MRE. Commented Jun 5, 2022 at 23:05
  • Give a sample of your dataset. Commented Jun 6, 2022 at 9:26

2 Answers

1

I'm not really sure what your expected output is. Do you want one column per team? Your loop writes to the same column name on every iteration, so each pass overwrites the previous one and only the result for the last team is kept. Or do you want multiple DataFrames?

If you want multiple columns, one per team:

import pandas as pd

df = pd.DataFrame({'game': [1, 2, 3, 4], 'home_team': ['a', 'b', 'c', 'a']})
print(df)
   game home_team
0     1         a
1     2         b
2     3         c
3     4         a

First collect unique teams as you did:

home_list = list(df.home_team.unique())

Create a column for each team:

for team in home_list:
    # 1 where this team is the home team, 0 otherwise
    df[f'home_or_away_{team}'] = [int(ht == team) for ht in df['home_team']]

Which results in:

   game home_team  home_or_away_a  home_or_away_b  home_or_away_c
0     1         a               1               0               0
1     2         b               0               1               0
2     3         c               0               0               1
3     4         a               1               0               0
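
As a possible alternative to the loop, pandas can build the same per-team indicator columns in one call with pd.get_dummies. This is only a sketch on the toy frame above; the names indicators and result are mine, not from the question.

```python
import pandas as pd

df = pd.DataFrame({'game': [1, 2, 3, 4], 'home_team': ['a', 'b', 'c', 'a']})

# One indicator column per distinct home team.
# dtype=int gives 1/0 instead of True/False.
indicators = pd.get_dummies(df['home_team'], prefix='home_or_away', dtype=int)

# Attach the indicator columns next to the original data.
result = pd.concat([df, indicators], axis=1)
print(result)
```

The column names come from the prefix plus the team value, so they match the ones the loop creates.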

0

You're overcomplicating it. You don't need to iterate or wrap np.where() in a separate function. np.where() is vectorized, so you can apply it once to the two columns.

The call below says: "where home_team equals team_name, put a 1, otherwise put a 0."

import pandas as pd
import numpy as np

df = pd.DataFrame([['Chicago Bears','Chicago Bears', 'Green Bay Packers'],
                   ['Chicago Bears','Green Bay Packers', 'Chicago Bears'],
                   ['Detroit Lions','Detroit Lions', 'Los Angeles Chargers'],
                   ['New England Patriots','New York Jets', 'New England Patriots'],
                   ['Houston Texans','Los Angeles Rams', 'Houston Texans']],
                  columns = ['team_name','home_team','away_team'])


df['home_or_away'] = np.where(df['home_team'] == df['team_name'], 1, 0)

Output:

print(df)
              team_name          home_team             away_team  home_or_away
0         Chicago Bears      Chicago Bears     Green Bay Packers             1
1         Chicago Bears  Green Bay Packers         Chicago Bears             0
2         Detroit Lions      Detroit Lions  Los Angeles Chargers             1
3  New England Patriots      New York Jets  New England Patriots             0
4        Houston Texans   Los Angeles Rams        Houston Texans             0
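
The example above assumes the data already has a team_name column. If the dataset instead has one row per game with only home_team and away_team (I'm guessing here, since no sample was posted), one way to get a per-team view is to reshape it first. A rough sketch, where games, game_id, venue, and long are made-up names for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical game-level data: one row per game.
games = pd.DataFrame({
    'game_id': [1, 2, 3],
    'home_team': ['Chicago Bears', 'Green Bay Packers', 'Detroit Lions'],
    'away_team': ['Green Bay Packers', 'Chicago Bears', 'Los Angeles Chargers'],
})

# Reshape to one row per (game, team). 'venue' records which
# original column the team name came from.
long = games.melt(id_vars='game_id',
                  value_vars=['home_team', 'away_team'],
                  var_name='venue',
                  value_name='team_name')

# 1 if the team was the home team in that game, 0 if it was away.
long['home_or_away'] = np.where(long['venue'] == 'home_team', 1, 0)

# Stable sort keeps the home row ahead of the away row within each game.
long = long.sort_values('game_id', kind='stable').reset_index(drop=True)
print(long)
```

Each game then appears twice, once per team, and filtering on team_name gives that team's home/away history.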
