I am trying to create new columns in a pandas DataFrame based on the values of other columns, using apply. I get this error and I don't understand why:

File "C:\dev\Anaconda3\lib\site-packages\pandas\core\frame.py", line 2448, in _setitem_array
    raise ValueError('Columns must be same length as key')
ValueError: Columns must be same length as key

Am I misunderstanding the apply function? Can you update/create multiple columns using a single apply call?

Here is my sample data:

import pandas as pd

x = pd.DataFrame({'VP': ['Brian', 'Sarah', 'Sarah', 'Brian', 'Sarah'],
                  'Director': ['Jim', 'Ian', 'Ian', 'Jim', 'Jerry'],
                  'Requester': ['Kelly', 'Dave', 'Jordan', 'Matt', 'Rob'],
                  'VP from Query': ['Jordan', 'Justin', 'Sarah', 'Brian', 'Sarah'],
                  'Director from Query': ['Other', 'Other', 'Ian', 'Jim', 'Jerry'],
                  'Requester from Query': ['Kelly', 'Dave', 'Jordan', 'Matt', 'Rob']
                  })
x = x[['VP', 'Director', 'Requester', 'VP from Query', 'Director from Query', 'Requester from Query']]


def set_suggested_hierarchy(row):
    if row['VP'] != row['VP from Query']:
        return row[['VP', 'Director']]
    else:
        return row[['VP from Query', 'Director from Query']]


x[['Suggested VP', 'Suggested Director']] = x.apply(lambda row: set_suggested_hierarchy(row), axis=1)

Thank you so much

3 Answers

1

I found the answer here: https://datascience.stackexchange.com/questions/29115/pandas-apply-return-must-have-equal-len-keys-and-value-when-setting-with-an-ite

Basically, I needed to change the function (not the lambda) so that it returns a new pd.Series built from the values, instead of a slice of the row:

def set_suggested_hierarchy(row):
    if row['VP'] != row['VP from Query']:
        return pd.Series([row['VP'], row['Director']])
    else:
        return pd.Series([row['VP from Query'], row['Director from Query']])
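
To see why the original version fails and this one works, here is a minimal sketch on a two-row frame. The frame `df` and the function names `pick_labels` and `pick_values` are made up for illustration. When the rows return slices with different labels, apply aligns the results on the union of those labels, so the result has more columns than the two keys being assigned. A pd.Series with the default index has the same two labels for every row.

```python
import pandas as pd

# Hypothetical two-row frame: row 0 has mismatched VPs, row 1 has matching VPs.
df = pd.DataFrame({
    'VP': ['Brian', 'Sarah'],
    'Director': ['Jim', 'Ian'],
    'VP from Query': ['Jordan', 'Sarah'],
    'Director from Query': ['Other', 'Ian'],
})


def pick_labels(row):
    # Original approach: the returned slice keeps its column labels,
    # and those labels differ from row to row.
    if row['VP'] != row['VP from Query']:
        return row[['VP', 'Director']]
    return row[['VP from Query', 'Director from Query']]


def pick_values(row):
    # Fixed approach: a fresh Series with the default index (0, 1),
    # identical for every row.
    if row['VP'] != row['VP from Query']:
        return pd.Series([row['VP'], row['Director']])
    return pd.Series([row['VP from Query'], row['Director from Query']])


wide = df.apply(pick_labels, axis=1)    # columns are the union of all returned labels
narrow = df.apply(pick_values, axis=1)  # exactly two columns: 0 and 1

print(wide.shape, narrow.shape)

# Two columns on the right, two keys on the left, so the assignment works.
df[['Suggested VP', 'Suggested Director']] = narrow
print(df[['Suggested VP', 'Suggested Director']])
```

Passing the function directly to apply, without wrapping it in a lambda, behaves the same as the lambda version in the question.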

1

One solution would be to return the entire row of the dataframe, since you are applying this function to the full dataframe:

def set_suggested_hierarchy(row):

    if row['VP'] != row['VP from Query']:
        row['Suggested VP'] = row['VP']
        row['Suggested Director'] = row['Director']
    else:
        row['Suggested VP'] = row['VP from Query']
        row['Suggested Director'] = row['Director from Query']

    return row

x = x.apply(set_suggested_hierarchy, axis=1)

0

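I think you should get rid of the apply(axis=1) altogether. 'Suggested VP' always ends up equal to VP: when the two VP columns differ you take VP, and when they match either one gives the same value. So only the director needs a conditional, and your logic can be implemented as: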

import numpy as np

x['Suggested VP'] = x.VP
x['Suggested Director'] = np.where(x.VP != x['VP from Query'], 
                                   x.Director, x['Director from Query'])
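
As a quick sanity check that the vectorized version matches the row-wise logic, the sketch below runs both on a small frame and compares the results. The frame `df`, the mask `mismatch`, and the helper `row_wise` are illustrative names, not part of the question.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: rows 0 and 1 have mismatched VPs, row 2 matches.
df = pd.DataFrame({
    'VP': ['Brian', 'Sarah', 'Sarah'],
    'Director': ['Jim', 'Ian', 'Jerry'],
    'VP from Query': ['Jordan', 'Justin', 'Sarah'],
    'Director from Query': ['Other', 'Other', 'Jerry'],
})

# Vectorized version: one boolean mask, no per-row Python calls.
mismatch = df['VP'] != df['VP from Query']
df['Suggested VP'] = df['VP']
df['Suggested Director'] = np.where(mismatch, df['Director'], df['Director from Query'])


def row_wise(row):
    # Row-by-row reference implementation of the same rule.
    if row['VP'] != row['VP from Query']:
        return pd.Series([row['VP'], row['Director']])
    return pd.Series([row['VP from Query'], row['Director from Query']])


reference = df.apply(row_wise, axis=1)

# Both comparisons should hold if the two approaches agree.
vp_matches = (reference[0] == df['Suggested VP']).all()
director_matches = (reference[1] == df['Suggested Director']).all()
print(vp_matches, director_matches)
```

On large frames the np.where version is usually much faster, because apply with axis=1 calls the Python function once per row.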
