Apply Lambda Function to Multiple Columns

Question

I'm looking to include multiple columns in my lambda function, but I'm running into errors that I don't think should happen. I want this line to create a new column: if "Decision" appears in "Task", flag the row as a Decision; otherwise, if "Milestone" appears in "Projects", mark it as a Milestone; otherwise, keep the current value of "Type".

today['New_Type'] = today[['Task','Projects','Type']].apply(lambda x,y,z: "Decision" if "Decision" in x else "Milestone" if "Milestone" in y else z)

Any ideas on how to adjust this?

Can you provide some sample (made up or otherwise) data with your example that illustrates the problem? – Grismar, commented Dec 17, 2021 at 0:08
What's currently happening, and how is it different from what you intend? – CrazyChucky, commented Dec 17, 2021 at 0:25
Maybe I am missing something, but what you described does not seem to require a new column. In fact, a new column seems to give you no benefit, since every row of your DataFrame would be assigned a similar flag determined by whether "Decision" is present in other columns. It would help if you could elaborate. – learner, commented Dec 17, 2021 at 1:23
apply always passes a single argument: one value at a time when called on a single column (a Series), or one row at a time when called on a DataFrame with axis=1. So you have to use lambda row: and then access row['Task'], etc. – furas, commented Dec 17, 2021 at 1:53
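Following furas's suggestion, here is a minimal sketch of the one-argument lambda version. The sample data is made up (it reuses the DataFrame from Arne's answer), and the matching is lowercased to mirror that answer; drop the .lower() calls if you want the question's case-sensitive check.

```python
import pandas as pd

# Made-up sample data, same as in Arne's answer
today = pd.DataFrame({
    'Task': ['Make a decision.', 'Do something else.', 'Write a function.'],
    'Projects': ['alpha', 'Milestone 7', 'gamma'],
    'Type': ['old 1', 'old 2', 'old 3'],
})

# With axis=1, apply calls the lambda once per row and passes that row as a
# single Series indexed by column name, so the lambda takes ONE argument.
today['New_Type'] = today.apply(
    lambda row: 'Decision' if 'decision' in row['Task'].lower()
    else 'Milestone' if 'milestone' in row['Projects'].lower()
    else row['Type'],
    axis=1,
)

print(today['New_Type'].tolist())  # ['Decision', 'Milestone', 'old 3']
```

Accessing values by label (row['Task']) rather than by position keeps the lambda working even if the DataFrame has extra columns.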

Arne · Accepted Answer · 2021-12-17 01:45:43Z

This is easier to debug if you use a regular, named function. Be sure to pass axis=1 when you call apply, so that the function is called once per row rather than once per column. The function then takes a single argument, the row (a pandas Series holding the column values), so it is best to unpack the three values immediately for readability:

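import pandas as pd

def task_type(row):
    # apply(axis=1) passes each row as a Series; unpacking it yields the
    # values of the selected columns in order
    task, project, old_type = row
    if 'decision' in task.lower():      # case-insensitive match on Task
        return 'Decision'
    if 'milestone' in project.lower():  # otherwise check Projects
        return 'Milestone'
    return old_type                     # otherwise keep the existing Type


today = pd.DataFrame({'Task': ['Make a decision.',
                               'Do something else.',
                               'Write a function.'],
                      'Projects': ['alpha', 'Milestone 7',
                                   'gamma'],
                      'Type': ['old 1', 'old 2', 'old 3']})

# Select exactly the three columns so the unpacking in task_type always
# receives three values, even after New_Type has been added
today['New_Type'] = today[['Task', 'Projects', 'Type']].apply(task_type, axis=1)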
today

    Task                Projects     Type   New_Type
0   Make a decision.    alpha        old 1  Decision
1   Do something else.  Milestone 7  old 2  Milestone
2   Write a function.   gamma        old 3  old 3
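If you would rather keep a three-argument function, like the question's lambda x, y, z, one option (a different technique from apply, named plainly here) is to zip the three columns together and call the function in a list comprehension. This is a sketch on the same made-up data; the function name classify is hypothetical.

```python
import pandas as pd

today = pd.DataFrame({
    'Task': ['Make a decision.', 'Do something else.', 'Write a function.'],
    'Projects': ['alpha', 'Milestone 7', 'gamma'],
    'Type': ['old 1', 'old 2', 'old 3'],
})

# Hypothetical three-argument function, analogous to the question's lambda x, y, z
def classify(task, project, old_type):
    if 'decision' in task.lower():
        return 'Decision'
    if 'milestone' in project.lower():
        return 'Milestone'
    return old_type

# zip yields one (task, project, type) tuple per row, which is unpacked
# straight into the three parameters; no per-row Series is built.
today['New_Type'] = [
    classify(task, project, old_type)
    for task, project, old_type in zip(today['Task'], today['Projects'], today['Type'])
]

print(today['New_Type'].tolist())  # ['Decision', 'Milestone', 'old 3']
```

Because zip works on plain column values, this usually avoids the overhead of building a Series for every row, while still being a Python-level loop.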

Parfait · Accepted Answer · 2021-12-17 02:06:14Z


Avoid DataFrame.apply with axis=1 (a hidden Python-level loop over rows) and consider vectorized conditional logic with numpy.where or numpy.select:

import numpy as np

# Option 1: nested np.where; the outer condition takes priority
today['New_Type'] = np.where(
    today['Task'].str.contains('Decision', regex=False),
    'Decision',
    np.where(
        today['Projects'].str.contains('Milestone', regex=False),
        'Milestone',
        today['Type']
    )
)

# Option 2: np.select picks the choice for the first condition that is True,
# falling back to the existing Type
today['New_Type'] = np.select(
    condlist=[
        today['Task'].str.contains('Decision', regex=False),
        today['Projects'].str.contains('Milestone', regex=False)
    ],
    choicelist=['Decision', 'Milestone'],
    default=today['Type']
)
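As a quick sanity check, the sketch below (on the same made-up data as Arne's answer) runs the np.select approach case-insensitively, using the case=False option of Series.str.contains to mirror the .lower() calls in the row-wise answer, and compares it with a row-wise apply. Whether you want case-insensitive matching is an assumption about your data.

```python
import numpy as np
import pandas as pd

today = pd.DataFrame({
    'Task': ['Make a decision.', 'Do something else.', 'Write a function.'],
    'Projects': ['alpha', 'Milestone 7', 'gamma'],
    'Type': ['old 1', 'old 2', 'old 3'],
})

# case=False makes the substring match case-insensitive;
# regex=False treats the pattern as a literal string.
is_decision = today['Task'].str.contains('decision', case=False, regex=False)
is_milestone = today['Projects'].str.contains('milestone', case=False, regex=False)

# np.select checks conditions in order, so 'Decision' wins over 'Milestone'
# when both match; all other rows keep their existing Type.
vectorized = np.select(
    [is_decision, is_milestone],
    ['Decision', 'Milestone'],
    default=today['Type'],
)

# Row-wise version for comparison
row_wise = today.apply(
    lambda row: 'Decision' if 'decision' in row['Task'].lower()
    else 'Milestone' if 'milestone' in row['Projects'].lower()
    else row['Type'],
    axis=1,
)

print((vectorized == row_wise.to_numpy()).all())  # True
```

On large frames the vectorized version avoids calling a Python function per row, which is the main reason to prefer it over apply.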
