I need to create a new column in a csv called BTTS, which is based on two other columns, FTHG and FTAG. If FTHG & FTAG are both greater than zero, BTTS should be 1. Otherwise it should be zero.

What's the best way to do this in pandas / NumPy?

2 Answers

I'm not sure what the best way is, but here is one solution using the pandas loc method:

df.loc[((df['FTHG'] > 0) & (df['FTAG'] > 0)),'BTTS'] = 1
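# Assign the result back: fillna(inplace=True) on a selected column is deprecated in recent pandas.
df['BTTS'] = df['BTTS'].fillna(0).astype(int)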

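Another solution using the pandas apply method:

def check_greater_zero(row):
    # Use `and` here: `&` binds tighter than `>`, so `a > 0 & b > 0` is parsed as `a > (0 & b) > 0`.
    return 1 if row['FTHG'] > 0 and row['FTAG'] > 0 else 0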

df['BTTS'] = df.apply(check_greater_zero, axis=1)

EDIT:

As stated in the comments, the first, vectorized, implementation is more efficient.
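
The vectorized idea can also be written without the separate fill step. The following is a minimal sketch, assuming a small made-up DataFrame (only the column names FTHG, FTAG and BTTS come from the question): the boolean mask is cast straight to integers, and `np.where` is shown as an equivalent NumPy spelling.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({"FTHG": [2, 0, 1, 0], "FTAG": [1, 3, 0, 0]})

# Boolean Series that is True only where both teams scored.
both_scored = (df["FTHG"] > 0) & (df["FTAG"] > 0)

# Variant 1: cast the mask to integers (True -> 1, False -> 0).
df["BTTS"] = both_scored.astype(int)

# Variant 2: the same result as a NumPy array via np.where.
btts_np = np.where(both_scored, 1, 0)

print(df)
```

Both variants fill every row in one pass, so no NaN values are created and no fillna call is needed.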

2 Comments

Your answer contains a fallacy: apply is not more efficient. In fact it is less efficient, because it is essentially the same as looping over the DataFrame row by row. You are better off using a vectorized approach like the one in your first snippet.
You are right! I tested my code on just a small example, in which case the second approach was faster than the first. But as it was a very small example, I agree: the bigger the DataFrame, the more efficient the first approach is.
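
To check the efficiency claim on your own machine, a rough sketch along these lines may help. The data is randomly generated, the row count and repetition count are arbitrary assumptions, and the helper names `with_apply` and `vectorized` are made up for illustration. Timings vary by machine and pandas version, so no numbers are quoted here.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical random scores; the size of 10,000 rows is an arbitrary choice.
rng = np.random.default_rng(0)
n = 10_000
df = pd.DataFrame({"FTHG": rng.integers(0, 4, n), "FTAG": rng.integers(0, 4, n)})


def with_apply(frame):
    # Row-by-row evaluation through DataFrame.apply.
    return frame.apply(
        lambda row: 1 if row["FTHG"] > 0 and row["FTAG"] > 0 else 0, axis=1
    )


def vectorized(frame):
    # Whole-column comparison, then cast the boolean mask to integers.
    return ((frame["FTHG"] > 0) & (frame["FTAG"] > 0)).astype(int)


# Both approaches should agree on every row.
same_result = with_apply(df).tolist() == vectorized(df).tolist()

apply_seconds = timeit.timeit(lambda: with_apply(df), number=3)
vectorized_seconds = timeit.timeit(lambda: vectorized(df), number=3)

print(f"same result: {same_result}")
print(f"apply:      {apply_seconds:.4f} s")
print(f"vectorized: {vectorized_seconds:.4f} s")
```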

I don't know if this is the best way to do it, but this works :)

df['BTTS'] = [1 if x > 0 and y > 0 else 0 for x, y in zip(df['FTAG'], df['FTHG'])]
