I'm trying to use multiple columns in a lambda function, but I keep running into key errors that I don't expect. I want this line to create a new column: if "Decision" appears in "Task", flag the row as a Decision. Otherwise, if "Milestone" appears in "Projects", mark it as a Milestone. Otherwise, keep the current "Type".

today['New_Type'] = today[['Task','Projects','Type'].apply(lambda x,y,z: "Decision" if "Decision" in x else "Milestone" if "Milestone" in y else z)

Any ideas how to adjust this?

  • Can you provide some sample (made up or otherwise) data with your example that illustrates the problem? Commented Dec 17, 2021 at 0:08
  • What's currently happening, and how is it different than what you intend? Commented Dec 17, 2021 at 0:25
  • Maybe I am missing something, but what you described does not seem to require a new column. In fact, a new column does not seem to benefit you at all, since every row of your DataFrame would be assigned a similar flag based on whether "Decision" appears in other columns. It would be helpful if you could elaborate. Commented Dec 17, 2021 at 1:23
  • apply always sends one value to the function: a single value when called on one column (a Series), or a single row when called on several columns with axis=1. So you have to use lambda row: and then access row['Task'], etc. Commented Dec 17, 2021 at 1:53

2 Answers

This is easier to debug if you use a regular, named function. Be sure to specify the axis argument (axis=1 for rows) when you call apply. The function takes a single argument, the row as a Series holding the column values (three here), so it is best to unpack them immediately for readability:

import pandas as pd

def task_type(row):
    task, project, old_type = row
    if 'decision' in task.lower():
        return 'Decision'
    if 'milestone' in project.lower():
        return 'Milestone'
    return old_type


today = pd.DataFrame({'Task': ['Make a decision.', 
                               'Do something else.',
                               'Write a function.'],
                      'Projects': ['alpha', 'Milestone 7',
                                   'gamma'],
                      'Type': ['old 1', 'old 2', 'old 3']})

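# Select the three columns explicitly so the unpacking in task_type
# still works if the DataFrame has (or later gains) other columns.
today['New_Type'] = today[['Task', 'Projects', 'Type']].apply(task_type, axis=1)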
today
    Task                Projects     Type   New_Type
0   Make a decision.    alpha        old 1  Decision
1   Do something else.  Milestone 7  old 2  Milestone
2   Write a function.   gamma        old 3  old 3
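
If you would rather keep the lambda from the question, the same idea can be sketched as below. This is a minimal example on made-up sample data, not the only way to write it: with axis=1 the lambda receives one row, and the columns are read by name. Note that this version is case-sensitive, as in the question, so the sample Task text uses a capital "D".

```python
import pandas as pd

# Made-up sample data for illustration only.
today = pd.DataFrame({
    'Task': ['Make a Decision.', 'Do something else.', 'Write a function.'],
    'Projects': ['alpha', 'Milestone 7', 'gamma'],
    'Type': ['old 1', 'old 2', 'old 3'],
})

# With axis=1 the lambda gets a single row (a Series), not three arguments,
# so each column is looked up by its label.
today['New_Type'] = today.apply(
    lambda row: 'Decision' if 'Decision' in row['Task']
    else 'Milestone' if 'Milestone' in row['Projects']
    else row['Type'],
    axis=1,
)
```

The chained conditional expression is evaluated left to right, so the "Decision" check takes priority over the "Milestone" check.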

Avoid DataFrame.apply with axis=1 (a hidden Python loop) and consider a vectorized, conditional-logic approach using numpy.where or numpy.select:

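import numpy as np

# Nested np.where: "Decision" in Task first, then "Milestone" in Projects,
# otherwise fall back to the existing Type.
today['New_Type'] = np.where(
    today['Task'].str.contains('Decision', regex=False),
    'Decision',
    np.where(
        today['Projects'].str.contains('Milestone', regex=False),
        'Milestone',
        today['Type']
    )
)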

# np.select: the first matching condition wins; unmatched rows keep Type.
today['New_Type'] = np.select(
    condlist=[
        today['Task'].str.contains('Decision', regex=False),
        today['Projects'].str.contains('Milestone', regex=False)
    ],
    choicelist=['Decision', 'Milestone'],
    default=today['Type']
)
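
To try the vectorized approach end to end, here is a small self-contained sketch on made-up data. It assumes you want a case-insensitive match and that missing values in "Task" or "Projects" should count as "no match"; both are assumptions added for illustration (via the case and na parameters of str.contains), not requirements from the question.

```python
import numpy as np
import pandas as pd

# Made-up sample data; the None simulates a missing Task value.
today = pd.DataFrame({
    'Task': ['Make a decision.', 'Do something else.', None],
    'Projects': ['alpha', 'Milestone 7', 'gamma'],
    'Type': ['old 1', 'old 2', 'old 3'],
})

# case=False ignores letter case; na=False treats missing text as "no match"
# so the masks stay strictly boolean.
is_decision = today['Task'].str.contains('decision', case=False, na=False)
is_milestone = today['Projects'].str.contains('milestone', case=False, na=False)

# Conditions are checked in order, so "Decision" takes priority.
today['New_Type'] = np.select(
    [is_decision, is_milestone],
    ['Decision', 'Milestone'],
    default=today['Type'],
)
```

Building the masks as named variables first makes it easy to inspect each condition on its own before combining them.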
