Create new pandas columns from multiple columns

Question

Here is the dataframe:

   MatchId  EventCodeId  EventCode         Team1        Team2         Team1_Goals  Team2_Goals  xG_Team1            xG_Team2            CurrentPlaytime
0  865314   1029         Goal Home         Northampton  Crawley Town  2            2            2.067663207769023   0.8130662505484256  457040
1  865314   1029         Goal Home         Northampton  Crawley Town  2            2            2.067663207769023   0.8130662505484256  1405394
2  865314   2053         Goal Away         Northampton  Crawley Town  2            2            2.067663207769023   0.8130662505484256  1898705
3  865314   2053         Goal Away         Northampton  Crawley Town  2            2            2.067663207769023   0.8130662505484256  4388278
4  865314   1029         Goal Home         Northampton  Crawley Town  2            2            2.067663207769023   0.8130662505484256  4507898
5  865314   1030         Cancel Goal Home  Northampton  Crawley Town  2            2            2.067663207769023   0.8130662505484256  4517728
6  865314   1029         Goal Home         Northampton  Crawley Town  2            2            2.067663207769023   0.8130662505484256  4956346
7  865314   1030         Cancel Goal Home  Northampton  Crawley Town  2            2            2.067663207769023   0.8130662505484256  4960633
8  865316   2053         Goal Away         Coventry     Bradford      0            0            1.0847662440468118  1.2526705617472387  447858
9  865316   2054         Cancel Goal Away  Coventry     Bradford      0            0            1.0847662440468118  1.2526705617472387  456361

The new columns should be created as follows:

For EventCodeId = 1029 and EventCode = Goal Home:
new_col1 = CurrentPlaytime/3*10**4

For EventCodeId = 2053 and EventCode = Goal Away:
new_col2 = CurrentPlaytime/3*10**4

For every other EventCodeId and EventCode, new_col1 and new_col2 should be 0.

Here is what I have so far, but I couldn't get any further. Please help.

new_col1 = []
new_col2 = []
def timeslot(EventCodeId, EventCode, CurrentPlaytime):
    if x == 1029 and y == 'Goal Home':
        new_col1.append(z/(3*10**4))
    elif x == 2053 and y == 'Goal Away':
        new_col2.append(z/(3*10**4))
    else:
        new_col1.append(0)
        new_col2.append(0)
    return new_col1
    return new_col2



df1['new_col1', 'new_col2'] = df1.apply(lambda x,y,z: timeslot(x['EventCodeId'], y['EventCode'], z['CurrentPlaytime']), axis=1)  

TypeError: ("<lambda>() missing 2 required positional arguments: 'y' and 'z'", 'occurred at index 0')

jpp · Accepted Answer · 2018-06-14 11:52:06Z

You do not need an explicit loop. Use vectorised operations where possible.
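For context, the TypeError in the question comes from how apply calls its function: with axis=1, pandas passes a single argument (the row, as a Series), so a lambda with three parameters cannot be satisfied. If a row-wise approach is wanted anyway, a minimal sketch might look like the following. The three-row sample frame is made up for illustration, and the divisor 3*10**4 is an assumption that follows the question's code rather than its written formula.

```python
import pandas as pd

# Hypothetical three-row sample mirroring the question's columns.
df1 = pd.DataFrame({
    "EventCodeId": [1029, 2053, 1030],
    "EventCode": ["Goal Home", "Goal Away", "Cancel Goal Home"],
    "CurrentPlaytime": [457040, 1898705, 4517728],
})

def timeslot(row):
    """Return the pair (new_col1, new_col2) for a single row."""
    # Assumed divisor: 3 * 10**4, as in the question's code.
    value = row["CurrentPlaytime"] / (3 * 10**4)
    if row["EventCodeId"] == 1029 and row["EventCode"] == "Goal Home":
        return value, 0.0
    if row["EventCodeId"] == 2053 and row["EventCode"] == "Goal Away":
        return 0.0, value
    return 0.0, 0.0

# apply(axis=1) hands each row to timeslot as one argument;
# result_type="expand" turns each returned tuple into two columns.
new_cols = df1.apply(timeslot, axis=1, result_type="expand")
new_cols.columns = ["new_col1", "new_col2"]
df1 = df1.join(new_cols)
print(df1)
```

Row-wise apply calls a Python function once per row, so it tends to be much slower than a vectorised version on large frames.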

Using numpy.where:

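import numpy as np

# / and * have equal precedence, so this is (CurrentPlaytime / 3) * 10**4, as the formula is written in the question.
# Use df1['CurrentPlaytime'] / (3 * 10**4) if the divisor is meant to be 30000, as in the question's code.
s = df1['CurrentPlaytime']/3*10**4

# EventCode holds the full label, e.g. 'Goal Home'.
mask1 = (df1['EventCodeId'] == 1029) & (df1['EventCode'] == 'Goal Home')
mask2 = (df1['EventCodeId'] == 2053) & (df1['EventCode'] == 'Goal Away')

# Take s where the mask is True, 0 elsewhere.
df1['new_col1'] = np.where(mask1, s, 0)
df1['new_col2'] = np.where(mask2, s, 0)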
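As a quick check, the numpy.where approach can be run end to end on a small sample. The rows below are a hand-copied subset of the question's data. The sketch also computes both readings of the formula, since `x/3*10**4` and `x/(3*10**4)` are not the same expression. Which one is intended is not stated in the question, so using the parenthesised form for the new columns is an assumption based on the question's code.

```python
import numpy as np
import pandas as pd

# Subset of the question's rows (only the columns the masks need).
df1 = pd.DataFrame({
    "EventCodeId": [1029, 2053, 1030],
    "EventCode": ["Goal Home", "Goal Away", "Cancel Goal Home"],
    "CurrentPlaytime": [457040, 1898705, 4517728],
})

# Two readings of the formula; they differ by a factor of 10**8.
s_literal = df1["CurrentPlaytime"] / 3 * 10**4   # (x / 3) * 10**4
s_paren = df1["CurrentPlaytime"] / (3 * 10**4)   # x / 30000 (assumed intent)

mask1 = (df1["EventCodeId"] == 1029) & (df1["EventCode"] == "Goal Home")
mask2 = (df1["EventCodeId"] == 2053) & (df1["EventCode"] == "Goal Away")

# np.where picks from s_paren where the mask is True and 0 elsewhere.
df1["new_col1"] = np.where(mask1, s_paren, 0)
df1["new_col2"] = np.where(mask2, s_paren, 0)
print(df1)
```

Only the first row gets a non-zero new_col1 and only the second a non-zero new_col2; the cancelled goal gets 0 in both.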

7 Comments

jezrael Over a year ago

Nice solution :)

A.Abs Over a year ago

@jpp Thank you for taking the time to look at my problem. Your solution looks simple and elegant, but I get the following error: TypeError: unsupported operand type(s) for /: 'str' and 'int'

jpp Over a year ago

@A.Abs, convert the relevant series to numeric, e.g. df1['EventCodeId'] = pd.to_numeric(df1['EventCodeId'], errors='coerce')
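
The conversion suggested above could be sketched as follows, assuming the numeric columns were read in as strings (which is what the `'str' and 'int'` message suggests). The sample frame and the unparseable `"n/a"` entry are hypothetical and only there to show what `errors='coerce'` does.

```python
import pandas as pd

# Hypothetical frame whose numeric columns arrived as strings.
df1 = pd.DataFrame({
    "EventCodeId": ["1029", "2053", "n/a"],
    "CurrentPlaytime": ["457040", "1898705", "4517728"],
})

for col in ["EventCodeId", "CurrentPlaytime"]:
    # errors="coerce" turns unparseable entries into NaN instead of raising.
    df1[col] = pd.to_numeric(df1[col], errors="coerce")

# Division now works because CurrentPlaytime is numeric.
s = df1["CurrentPlaytime"] / (3 * 10**4)
print(df1.dtypes)
```

Coerced NaN values compare as False in the masks, so such rows end up with 0 in both new columns.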

A.Abs Over a year ago

@jpp, for each match, either based on MatchId or xG_Team1 vs xG_Team2 (row connected), how can I create a list of Home_Goal from new_col1 and Away_Goal from new_col2?

jpp Over a year ago

@A.Abs, I suggest you ask as a separate question.
