I need to create a new column in a csv called BTTS, which is based on two other columns, FTHG and FTAG. If FTHG & FTAG are both greater than zero, BTTS should be 1. Otherwise it should be zero.
What's the best way to do this in pandas / NumPy?
I'm not sure what the best way is, but here is one solution using the pandas loc indexer:
df.loc[((df['FTHG'] > 0) & (df['FTAG'] > 0)),'BTTS'] = 1
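# Assign the result back instead of using inplace=True on a column selection,
# which may not update df in recent pandas versions.
df['BTTS'] = df['BTTS'].fillna(0).astype(int)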
Another solution uses the pandas apply method:
def check_greater_zero(row):
    # Use `and` for scalar values; `&` binds tighter than `>` and breaks the comparison.
    return 1 if row['FTHG'] > 0 and row['FTAG'] > 0 else 0
df['BTTS'] = df.apply(check_greater_zero, axis=1)
EDIT:
As stated in the comments, the first, vectorized, implementation is more efficient.
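For reference, the vectorized approach can also be written without the fillna step. The following is only a minimal sketch: the sample data and the names both_scored and BTTS_np are made up for illustration, and only FTHG, FTAG and BTTS come from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({"FTHG": [2, 0, 1, 0], "FTAG": [1, 3, 0, 0]})

# Boolean mask: True where both the home and the away team scored.
both_scored = (df["FTHG"] > 0) & (df["FTAG"] > 0)

# Option 1: cast the boolean mask straight to integers (True -> 1, False -> 0).
df["BTTS"] = both_scored.astype(int)

# Option 2: np.where picks 1 where the mask is True and 0 elsewhere.
df["BTTS_np"] = np.where(both_scored, 1, 0)

print(df)
```

Both options should give the same column. The astype(int) form is slightly shorter, while np.where is handy if the two output values are something other than 1 and 0.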
apply is not more efficient; in fact, it is less efficient because it is essentially the same as looping over the DataFrame row by row. You are better off using a vectorized approach like the one in your first snippet.