Can anyone tell me why this doesn't work and how to fix it?

I'm trying to use a lambda function to choose the value of a column based on a condition on another column.

df = pd.DataFrame({'A': [4, 8, 2, 7, 4],
                   'B': [8, 10, 3, 4, 1],
                   'C': [10, 8, 2, 6, 2]})

df
`df.apply(lambda x: x['B'] if x['A'].isin([1,2,3,4,5]) else x['C'])`
KeyError                                  Traceback (most recent call last)
c:\xxxxxx\xxxxxx\xxx Cellule 19 in <cell line: 1>()
----> 1 df.apply(lambda x: x['B'] if x['A'].isin([1,2,3,4,5]) else x['C'])

File c:\Anaconda\envs\xxxxx\xxxx.py:8839, in DataFrame.apply(self, func, axis, raw, result_type, args, **kwargs)
   8828 from pandas.core.apply import frame_apply
   8830 op = frame_apply(
   8831     self,
   8832     func=func,
   (...)
   8837     kwargs=kwargs,
   8838 )
-> 8839 return op.apply().__finalize__(self, method="apply")

File c:\Anaconda\xxxxxlib\site-packages\pandas\core\apply.py:727, in FrameApply.apply(self)
    724 elif self.raw:
    725     return self.apply_raw()
--> 727 return self.apply_standard()

File c:\Anaconda\envs\xxxx\pandas\core\apply.py:851, in FrameApply.apply_standard(self)
    850 def apply_standard(self):
--> 851     results, res_index = self.apply_series_generator()
    853     # wrap results
    854     return self.wrap_results(results, res_index)
...
    388     self._check_indexing_error(key)
--> 389     raise KeyError(key)
    390 return super().get_loc(key, method=method, tolerance=tolerance)

KeyError: 'A'

2 Answers

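You need to pass axis=1 so that the function is applied to each row; see the DataFrame.apply documentation. With the default axis=0, the lambda receives each column as a Series indexed by row labels, so x['A'] raises KeyError: 'A'.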

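# With axis=1, x is one row and x['A'] is a scalar, so test membership with `in` (a scalar has no .isin)
df.apply(lambda x: x['B'] if x['A'] in [1,2,3,4,5] else x['C'], axis=1)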
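To make the effect of axis concrete, here is a minimal sketch using the question's DataFrame. The variable names (has_a_by_column, has_a_by_row, result) are only illustrative. It checks whether the label 'A' is available inside the lambda in each mode, and it uses Python's `in` for the membership test because each x['A'] is a plain scalar when rows are passed.

```python
import pandas as pd

df = pd.DataFrame({'A': [4, 8, 2, 7, 4],
                   'B': [8, 10, 3, 4, 1],
                   'C': [10, 8, 2, 6, 2]})

# Default axis=0: the lambda receives one column at a time.
# Each column is a Series indexed by the row labels 0..4, so 'A' is not a valid key.
has_a_by_column = df.apply(lambda x: 'A' in x.index)

# axis=1: the lambda receives one row at a time.
# Each row is a Series indexed by the column names, so x['A'] works.
has_a_by_row = df.apply(lambda x: 'A' in x.index, axis=1)

# Row-wise selection: take B when A is in 1..5, otherwise take C.
result = df.apply(lambda x: x['B'] if x['A'] in [1, 2, 3, 4, 5] else x['C'], axis=1)

print(has_a_by_column.tolist())  # [False, False, False]
print(has_a_by_row.tolist())     # [True, True, True, True, True]
print(result.tolist())           # [8, 8, 3, 6, 1]
```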

1 Comment

Thank you very much, it works once .isin is replaced with `in`: df.apply(lambda x: x['B'] if x['A'] in [1,2,3,4,5] else x['C'], axis=1)

Do not use apply here; it wastes pandas' vectorized capabilities.

Use instead:

df['new'] = df['B'].where(df['A'].isin([1,2,3,4,5]), df['C'])

# or
df['new'] = df['B'].where(df['A'].between(1, 5, inclusive='both'), df['C'])

Or with numpy:

import numpy as np
df['new'] = np.where(df['A'].isin([1,2,3,4,5]), df['B'], df['C'])

output:

   A   B   C  new
0  4   8  10    8
1  8  10   8    8
2  2   3   2    3
3  7   4   6    6
4  4   1   2    1
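As a quick sanity check, the sketch below compares the vectorized variants against the row-wise apply on the same data. The names mask, via_where, via_numpy and via_apply are illustrative only, and wrapping the np.where result in a Series is just a convenience for the comparison.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [4, 8, 2, 7, 4],
                   'B': [8, 10, 3, 4, 1],
                   'C': [10, 8, 2, 6, 2]})

# Boolean mask computed once for the whole column.
mask = df['A'].isin([1, 2, 3, 4, 5])

# Series.where keeps B where the mask is True and falls back to C elsewhere.
via_where = df['B'].where(mask, df['C'])

# np.where returns a plain array; wrap it to keep the original index.
via_numpy = pd.Series(np.where(mask, df['B'], df['C']), index=df.index)

# Row-wise apply, for comparison only.
via_apply = df.apply(lambda x: x['B'] if x['A'] in [1, 2, 3, 4, 5] else x['C'], axis=1)

# All three approaches select the same values.
assert via_where.tolist() == via_numpy.tolist() == via_apply.tolist()
print(via_where.tolist())  # [8, 8, 3, 6, 1]
```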

1 Comment

I like this method of avoiding apply. As I asked in another post you replied to before: if the new column should instead return the name of the column holding the second-largest value in each row (a similar topic, returning a value that depends on multiple columns), is that possible with where?
