New column adding values of different columns with strings and numbers

Question

I have a dataframe with the following structure (the real one has around 30 "Game x" columns, but two are enough to explain the problem):

      Name         Game 1            Game 2
0     Player 1     Starting 68       Starting
1     Player 2     Bench 74          Starting 80
2     Player 3     Starting          Bench
3     Player 4     Bench             Bench 50
4     Player 5     NaN               Starting 60

I need new columns that count the minutes each player has played across the "Game x" columns, based on these conditions:

Starting: the player played 90 minutes
Starting 68 (or any other number): the player played 68 minutes (or that number)
Bench and NaN: the player played 0 minutes
Bench 74 (or any other number): the player played 16 minutes (a match lasts 90 minutes, so he came on at minute 74 and played 90 - 74 = 16)

There would be two columns: one with the minutes the player played in games he started, and one with the minutes he played after coming on from the bench.

The final dataframe would be:

      Name         Game 1           Game 2           Minutes Starting   Minutes Bench
0     Player 1     Starting 68      Starting         158                0
1     Player 2     Bench 74         Starting 80      80                 16
2     Player 3     Starting         Bench            90                 0
3     Player 4     Bench            Bench 50         0                  40
4     Player 5     NaN              Starting 60      60                 0

Arne · Accepted Answer · 2022-04-17 21:27:45Z

If you write a function that parses a text field and returns the corresponding number of minutes, you can apply that function to each game column and add up the results. For example, the time played from start:

import numpy as np


def played_from_start(entry):
    entry = str(entry)  # Convert to str first; a missing value (np.nan) is a float.
    if entry == 'nan' or entry == '':
        return 0
    if entry.startswith('Bench'):
        return 0
    if entry == 'Starting':
        return 90
    if entry.startswith('Starting'):
        return int(entry[9:])
    print(f"Warning: Entry '{entry}' not recognized.")
    return np.nan


games = ['Game 1', 'Game 2']

df['Minutes Starting'] = np.sum(np.array([df[game].apply(played_from_start).values
                                          for game in games]),
                                axis=0)
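
The question also asks for a "Minutes Bench" column. A minimal sketch of the counterpart function, following the same pattern and the question's rule that "Bench 74" means 90 - 74 = 16 minutes. The names played_from_bench and MATCH_LENGTH are illustrative and not part of the original answer, and selecting the game columns by their "Game" prefix is an assumption about how the roughly 30 real columns are labeled:

```python
import numpy as np
import pandas as pd

MATCH_LENGTH = 90  # assumed full match length, taken from the question


def played_from_bench(entry):
    """Return the minutes played by a player who came on from the bench."""
    entry = str(entry)  # a missing value (np.nan) becomes the string 'nan'
    if entry == 'nan' or entry == '':
        return 0
    if entry.startswith('Starting'):
        return 0
    if entry == 'Bench':
        return 0  # stayed on the bench for the whole game
    if entry.startswith('Bench'):
        # 'Bench 74' -> came on at minute 74 -> 90 - 74 = 16 minutes
        return MATCH_LENGTH - int(entry[6:])
    print(f"Warning: Entry '{entry}' not recognized.")
    return np.nan


# Game columns as shown in the question's expected output table.
df = pd.DataFrame({
    'Name': ['Player 1', 'Player 2', 'Player 3', 'Player 4', 'Player 5'],
    'Game 1': ['Starting 68', 'Bench 74', 'Starting', 'Bench', np.nan],
    'Game 2': ['Starting', 'Starting 80', 'Bench', 'Bench 50', 'Starting 60'],
})

# Assumption: every game column label starts with 'Game'.
games = [col for col in df.columns if col.startswith('Game')]

# Add up the per-game results row by row.
df['Minutes Bench'] = sum(df[game].apply(played_from_bench) for game in games)
print(df[['Name', 'Minutes Bench']])
```

Collecting the column labels from df.columns avoids typing all the game names by hand, and it raises no KeyError as long as the prefix assumption holds.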


8 Comments

nokvk Over a year ago

I get the idea, but I receive an error: KeyError: 'Game 1'

Arne Over a year ago

Also, I was assuming the name of your dataframe is df. You may need to substitute the actual name.

Arne Over a year ago

That error must result from the case if entry.startswith('Starting'): return int(entry[9:]). My assumption here was that if the entry starts with 'Starting', it is followed by a space and then the number of minutes, as in the example data you provided. The error indicates that in your actual data, at least one entry does not follow this format.

nokvk Over a year ago

Great! That was the cause: there is one exceptional case, a player with 'Starting 66 66' (I don't know why it looks like this, but I will try to fix it). Thanks a lot!

Arne Over a year ago

Note that 'Bench' has three characters fewer than 'Starting', so for the bench case you will need int(entry[6:]) instead of int(entry[9:]).
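
The comments above show that slicing at a fixed position fails on a malformed entry such as 'Starting 66 66', because int('66 66') raises a ValueError. One possible alternative, sketched here with a regular expression rather than the slicing used in the answer, parses both cases at once and marks unrecognized entries with NaN so they can be found and corrected. The names parse_entry, ENTRY_PATTERN, and MATCH_LENGTH are hypothetical, and treating malformed entries as NaN is an assumption, not something the thread settled on:

```python
import re

import numpy as np
import pandas as pd

MATCH_LENGTH = 90  # assumed full match length, taken from the question

# 'Starting' or 'Bench', optionally followed by one whole-number minute.
ENTRY_PATTERN = re.compile(r'(Starting|Bench)(?:\s+(\d+))?')


def parse_entry(entry):
    """Return (minutes_starting, minutes_bench) for one cell.

    Missing values count as 0 minutes. Entries that do not match the
    expected format yield (nan, nan) instead of raising an error.
    """
    if pd.isna(entry) or str(entry).strip() == '':
        return 0, 0
    match = ENTRY_PATTERN.fullmatch(str(entry).strip())
    if match is None:
        return np.nan, np.nan
    role, minute = match.groups()
    if role == 'Starting':
        # No number means the full match; otherwise the minute he left.
        return (MATCH_LENGTH if minute is None else int(minute)), 0
    # Bench: no number means he never came on.
    return 0, (0 if minute is None else MATCH_LENGTH - int(minute))


for text in ['Starting', 'Starting 68', 'Bench', 'Bench 74', np.nan,
             'Starting 66 66']:
    print(repr(text), '->', parse_entry(text))
```

Because fullmatch requires the whole string to fit the pattern, 'Starting 66 66' is rejected rather than silently read as 66 minutes.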
