
I have a dataframe with this structure. The real one has more "Game x" columns (around 30), but these two are enough to explain the problem:

      Name         Game 1            Game 2
0     Player 1     Starting 68       Starting
1     Player 2     Bench 74          Starting 80
2     Player 3     Starting          Bench
3     Player 4     Bench             Bench 50
4     Player 5     NaN               Starting 60

I need new columns that count each player's minutes across the "Game x" columns, based on these rules:

  • Starting: the player played the full 90 minutes
  • Starting 68 (or any other number): the player played 68 minutes (or that number)
  • Bench and NaN: the player played 0 minutes
  • Bench 74 (or any other number): the player played 16 minutes (a match lasts 90 minutes and he came on at minute 74, so 90 - 74 = 16)
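The four rules above boil down to parsing each cell into two numbers: minutes as a starter and minutes off the bench. A minimal sketch, assuming every non-missing cell is exactly "Starting", "Starting <n>", "Bench" or "Bench <n>" and that a match lasts 90 minutes. `parse_entry` and `MATCH_LENGTH` are hypothetical names used only for illustration:

```python
import math

MATCH_LENGTH = 90  # assumed length of a match in minutes


def parse_entry(entry):
    """Hypothetical helper: return (minutes_as_starter, minutes_off_bench)."""
    # Missing values (None or NaN) mean the player did not play.
    if entry is None or (isinstance(entry, float) and math.isnan(entry)):
        return (0, 0)
    word, _, number = str(entry).strip().partition(' ')
    if word == 'Starting':
        # "Starting" alone = full match; "Starting n" = n minutes played.
        return ((int(number) if number else MATCH_LENGTH), 0)
    if word == 'Bench':
        # "Bench" alone = unused substitute; "Bench n" = came on at minute n.
        return (0, (MATCH_LENGTH - int(number)) if number else 0)
    raise ValueError(f"Unrecognized entry: {entry!r}")


print(parse_entry('Bench 74'))
```

Raising on anything unexpected (instead of silently returning 0) makes badly formatted cells show up immediately.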

There should be two new columns: one with the minutes each player played in games he started, and one with the minutes he played after coming on from the bench.

The final dataframe would be:

      Name         Game 1           Game 2           Minutes Starting   Minutes Bench
0     Player 1     Starting 68      Starting         158                0
1     Player 2     Bench 74         Starting 80      80                 16
2     Player 3     Starting         Bench            90                 0
3     Player 4     Bench            Bench 50         0                  40
4     Player 5     NaN              Starting 60      60                 0  

1 Answer


If you write a function that parses a text field and returns the corresponding number of minutes, you can apply that function to each game column and add up the results. For example, for the minutes played as a starter:

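import numpy as np


def played_from_start(entry):
    """Minutes played in a game the player started."""
    entry = str(entry)  # Without this, np.nan is a float.
    if entry == 'nan' or entry == '':
        return 0  # missing entry: did not play
    if entry.startswith('Bench'):
        return 0  # bench minutes are counted separately
    if entry == 'Starting':
        return 90  # played the whole match
    if entry.startswith('Starting'):
        return int(entry[9:])  # "Starting 68" -> 68 (skips "Starting ", 9 characters)
    print(f"Warning: Entry '{entry}' not recognized.")
    return np.nan


games = ['Game 1', 'Game 2']

# df is your dataframe; stack the per-game minutes and sum them per player (row).
df['Minutes Starting'] = np.sum(np.array([df[game].apply(played_from_start).values
                                          for game in games]),
                                axis=0)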
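The answer only covers the starting side. A sketch of the bench counterpart (using int(entry[6:]), as suggested in the comments) and of filling both columns on the question's sample data might look like the following. It repeats played_from_start so that it runs on its own; played_from_bench is a hypothetical name, the sample assumes Player 5's Game 2 entry is "Starting 60" (as in the expected output), and it assumes every column whose name starts with "Game" is a game column:

```python
import numpy as np
import pandas as pd


def played_from_start(entry):
    """Minutes played in a game the player started."""
    if pd.isna(entry) or entry == '':
        return 0
    entry = str(entry)
    if entry.startswith('Bench'):
        return 0
    if entry == 'Starting':
        return 90
    if entry.startswith('Starting'):
        return int(entry[9:])  # skip "Starting " (9 characters)
    print(f"Warning: Entry '{entry}' not recognized.")
    return np.nan


def played_from_bench(entry):
    """Hypothetical counterpart: minutes played after coming on from the bench."""
    if pd.isna(entry) or entry == '':
        return 0
    entry = str(entry)
    if entry.startswith('Starting'):
        return 0
    if entry == 'Bench':
        return 0  # unused substitute
    if entry.startswith('Bench'):
        return 90 - int(entry[6:])  # skip "Bench " (6 characters)
    print(f"Warning: Entry '{entry}' not recognized.")
    return np.nan


# Sample data from the question.
df = pd.DataFrame({
    'Name': ['Player 1', 'Player 2', 'Player 3', 'Player 4', 'Player 5'],
    'Game 1': ['Starting 68', 'Bench 74', 'Starting', 'Bench', np.nan],
    'Game 2': ['Starting', 'Starting 80', 'Bench', 'Bench 50', 'Starting 60'],
})

# Pick up all game columns (around 30 in the real data).
games = [col for col in df.columns if col.startswith('Game')]

# Adding the per-game Series keeps the result aligned on the dataframe's index.
df['Minutes Starting'] = sum(df[game].apply(played_from_start) for game in games)
df['Minutes Bench'] = sum(df[game].apply(played_from_bench) for game in games)
print(df)
```

Using Python's built-in sum over Series instead of np.sum(np.array(...), axis=0) is only a stylistic choice; both give the same totals here.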

8 Comments

I get the idea, but I receive an error: KeyError: 'Game 1'
Also, I was assuming the name of your dataframe is df. You may need to substitute the actual name.
That error must result from the case if entry.startswith('Starting'): return int(entry[9:]). My assumption here was that if the entry starts with 'Starting', it is followed by a space and then the number of minutes, as in the example data you provided. The error indicates that in your actual data, at least one entry does not follow this format.
Great, that's it! There is one exceptional case: a player with 'Starting 66 66' (I don't know why it's like this, but I will try to fix it). Thanks a lot!
Note that 'Bench' has three characters fewer than 'Starting', so in the bench version of the function you will need int(entry[6:]) instead of int(entry[9:]).
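Because the failure came from an entry that does not follow the expected format, it can help to list every such cell before parsing. A minimal sketch, assuming the only valid forms are "Starting", "Starting <n>", "Bench" and "Bench <n>" (missing values allowed). find_malformed and the small sample frame are hypothetical; the 'Starting 66 66' value is the one mentioned above:

```python
import numpy as np
import pandas as pd

# Assumed valid forms: "Starting", "Starting <n>", "Bench", "Bench <n>".
VALID = r'(?:Starting|Bench)(?: \d+)?'


def find_malformed(df, games):
    """Hypothetical check: return (row index, column, value) for each bad cell."""
    bad = []
    for game in games:
        col = df[game]
        # fullmatch requires the whole string to match; missing cells are skipped.
        mask = col.notna() & ~col.astype(str).str.fullmatch(VALID)
        for idx, value in col[mask].items():
            bad.append((idx, game, value))
    return bad


df = pd.DataFrame({
    'Name': ['Player 1', 'Player 2'],
    'Game 1': ['Starting 66 66', 'Bench 74'],
    'Game 2': ['Starting', np.nan],
})
print(find_malformed(df, ['Game 1', 'Game 2']))
```

Series.str.fullmatch requires pandas 1.1 or later.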