I'm trying to add a new column to a DataFrame based on the boolean values in another column.

Given a DataFrame like this:

import pandas as pd

snr = pd.DataFrame({'name': ['A', 'B', 'C', 'D', 'E'], 'seniority': [False, False, False, True, False]})

The furthest I've come so far is this:

def refine_seniority(contact):
    contact['refined_seniority'] = 'Senior' if contact['seniority'] else 'Non-Senior'

snr.apply(refine_seniority)

yet I'm getting this error:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-208-0694ebf79a50> in <module>()
      2     contact['refined_seniority'] = 'Senior' if contact['seniority'] else 'Non-Senior'
      3 
----> 4 snr.apply(refine_seniority)
      5 
      6 snr

/usr/lib/python2.7/dist-packages/pandas/core/frame.pyc in apply(self, func, axis, broadcast, raw, args, **kwds )
   4414                     return self._apply_raw(f, axis)
   4415                 else:
-> 4416                     return self._apply_standard(f, axis)
   4417             else:
   4418                 return self._apply_broadcast(f, axis)

/usr/lib/python2.7/dist-packages/pandas/core/frame.pyc in _apply_standard(self, func, axis, ignore_failures)
   4489                     # no k defined yet
   4490                     pass
-> 4491                 raise e
   4492 
   4493 

KeyError: ('seniority', u'occurred at index name')

It feels like I'm missing some fundamental understanding of DataFrames, but I'm stuck.

What's the proper way to add a new column based on boolean values in a different column?

2 Answers

You can create a dict and call map:

In [176]:

temp = {True:'senior', False:'Non-senior'}
snr['refined_seniority'] = snr['seniority'].map(temp)
snr
Out[176]:
  name seniority refined_seniority
0    A     False        Non-senior
1    B     False        Non-senior
2    C     False        Non-senior
3    D      True            senior
4    E     False        Non-senior

As user @Jeff has pointed out, map or apply should be a last resort if a vectorised solution is available.

Or use NumPy's where (with numpy imported as np):

In [178]:

snr['refined_seniority'] = np.where(snr['seniority'], 'senior', 'Non-senior')
snr
Out[178]:
  name seniority refined_seniority
0    A     False        Non-senior
1    B     False        Non-senior
2    C     False        Non-senior
3    D      True            senior
4    E     False        Non-senior
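
For reference, here is a self-contained sketch of the np.where approach, assuming only that pandas and NumPy are installed. Because the column is already boolean, it can be passed to np.where as the condition without comparing it to True.

```python
import numpy as np
import pandas as pd

snr = pd.DataFrame({
    'name': ['A', 'B', 'C', 'D', 'E'],
    'seniority': [False, False, False, True, False],
})

# The boolean column serves directly as the condition:
# 'senior' where it is True, 'Non-senior' where it is False.
snr['refined_seniority'] = np.where(snr['seniority'], 'senior', 'Non-senior')
print(snr)
```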

If you modified your function to this, it would work:

In [187]:

def refine_seniority(contact):
    if contact == True:
        return 'senior'
    else:
        return 'Non-senior'

snr['refined_seniority'] = snr['seniority'].apply(refine_seniority)
snr
Out[187]:
  name seniority refined_seniority
0    A     False        Non-senior
1    B     False        Non-senior
2    C     False        Non-senior
3    D      True            senior
4    E     False        Non-senior

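What you wrote is incorrect: you are calling apply on the whole df, which by default passes each column to the function as a Series, so the label 'seniority' does not exist in what the function receives. See below: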

In [193]:

def refine_seniority(contact):
    print(contact)


snr['refined_seniority'] = snr.apply(refine_seniority)

0    A
1    B
2    C
3    D
4    E
Name: name, dtype: object
0    False
1    False
2    False
3     True
4    False
Name: seniority, dtype: object

Here you can see that the function receives two pandas Series, one per column. Neither has a key named 'seniority', hence the KeyError.
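
If you do want a function that looks up the column by label, one option is to apply it row-wise with axis=1, so that each call receives a single row as a Series. This is only a sketch, and it is slower than the vectorised options above. Note that the function returns the label instead of assigning into the row.

```python
import pandas as pd

snr = pd.DataFrame({
    'name': ['A', 'B', 'C', 'D', 'E'],
    'seniority': [False, False, False, True, False],
})

def refine_seniority(contact):
    # With axis=1, `contact` is one row, so the column label is a valid key.
    return 'senior' if contact['seniority'] else 'Non-senior'

# axis=1 calls the function once per row rather than once per column.
snr['refined_seniority'] = snr.apply(refine_seniority, axis=1)
print(snr)
```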

8 Comments

this is just a simple application of .where
@Jeff what would that look like?
very similar to the np.where soln
@Jeff sorry, I naively tried snr['refined_seniority'] = snr.where(snr['seniority'] == True, 'senior', 'Non-senior') but it gave an error, and a few other variants also failed. The method signature appears to differ from numpy's where: it seems to want an NDFrame as the other param and sometimes an axis arg. What am I missing?
snr['seniority'].where(...); you are indexing a frame with a series, which doesn't make sense
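
The thread never spells out the pandas version, so here is one possible reading of the suggestion, offered as a sketch and not as what @Jeff necessarily had in mind. Series.where keeps the values where the condition is True and substitutes the second argument elsewhere, so it needs a Series of the "true" labels to start from. The name `senior` is a hypothetical helper introduced for this example.

```python
import pandas as pd

snr = pd.DataFrame({
    'name': ['A', 'B', 'C', 'D', 'E'],
    'seniority': [False, False, False, True, False],
})

# Hypothetical helper: a Series holding the "true" label for every row.
senior = pd.Series('senior', index=snr.index)

# Keep 'senior' where the condition is True, use 'Non-senior' elsewhere.
snr['refined_seniority'] = senior.where(snr['seniority'], 'Non-senior')
print(snr)
```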

A shorter variant passes the mapping dict to map inline:

snr['refined_seniority'] = snr['seniority'].map({True: 'senior', False: 'Non-senior'})

