5

I am trying to create a new variable whose value depends on the values of several other variables. I first tried writing this as a nested ifelse() statement in R, but the nesting got too deep and it threw an error, so I'm hoping there is an easier way to do this in Python.

I have a dataframe (called df) that looks roughly like this (although in reality it's much bigger with many more month/year variables) that I've read in as a pandas DataFrame:

   ID  Sept_2015  Oct_2015  Nov_2015  Dec_2015  Jan_2016  Feb_2016  Mar_2016  \
0   1          0         0         0         0         1         1         1   
1   2          0         0         0         0         0         0         0   
2   3          0         0         0         0         1         1         1   
3   4          0         0         0         0         0         0         0   
4   5          1         1         1         1         1         1         1   

   grad_time  
0        240  
1        218  
2        236  
3          0  
4        206 

I'm trying to create a new variable that depends on values from all these variables, but values from "earlier" variables need to take precedence, so the if/elif/else condition would look something like this:

if df['Sept_2015'] > 0 & df['grad_time'] <= 236:
    return 236
elif df['Oct_2015'] > 0 & df['grad_time'] <= 237:
    return 237
elif df['Nov_2015'] > 0 & df['grad_time'] <= 238:
    return 238
elif df['Dec_2015'] > 0 & df['grad_time'] <= 239:
    return 239
elif df['Jan_2016'] > 0 & df['grad_time'] <= 240:
    return 240
elif df['Feb_2016'] > 0 & df['grad_time'] <= 241:
    return 241
elif df['Mar_2016'] > 0 & df['grad_time'] <= 242:
    return 242
else:
    return 0

And based on this, I'd like it to return a new variable that looks like this:

   trisk
0    240
1      0
2    240
3      0
4    236

I've tried writing a function like this:

def test_func(df):
    """ Test Function for generating new value"""
    if df['Sept_2015'] > 0 & df['grad_time'] <= 236:
        return 236
    elif df['Oct_2015'] > 0 & df['grad_time'] <= 237:
        return 237
    ...
    else:
        return 0

and mapping it over the dataframe to create a new variable like this:

new_df = pd.DataFrame(map(test_func, df)) 

However, when I run it, I get the following TypeError:

 Traceback (most recent call last):

  File "<ipython-input-83-19b45bcda45a>", line 1, in <module>
     new_df = pd.DataFrame(map(new_func, test_df))

  File "<ipython-input-82-a2eb6f9d7a3a>", line 3, in new_func
     if df['Sept_2015'] > 0 & df['grad_time'] <= 236:

TypeError: string indices must be integers, not str

So I can see it doesn't accept the column name here. I've tried this a number of other ways and can't get it to work. I also understand that mapping the function might not be the best approach, so I'm open to other ways of generating the trisk variable. Thanks in advance, and apologies if I've left anything out.

2 Answers

3

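Without getting into streamlining your logic (which @piRSquared's answer covers): iterating over a DataFrame yields its column labels, so map(test_func, df) passes strings such as 'ID' to test_func, and indexing a string with a column name raises the TypeError. To run test_func on each row instead, call .apply(test_func, axis=1) on your dataframe.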

import io
import pandas as pd

data = io.StringIO('''\
   ID  Sept_2015  Oct_2015  Nov_2015  Dec_2015  Jan_2016  Feb_2016  Mar_2016  grad_time  
0   1          0         0         0         0         1         1         1        240
1   2          0         0         0         0         0         0         0        218   
2   3          0         0         0         0         1         1         1        236
3   4          0         0         0         0         0         0         0          0
4   5          1         1         1         1         1         1         1        206
''')
# sep=r'\s+' splits on runs of whitespace (delim_whitespace is deprecated in recent pandas)
df = pd.read_csv(data, sep=r'\s+')

def test_func(row):
    """Return the first month threshold a row qualifies for, else 0."""
    # `and` is used instead of `&`: `&` binds tighter than `>` and `<=`,
    # so `x > 0 & y <= 236` would never actually test y.
    if row['Sept_2015'] > 0 and row['grad_time'] <= 236:
        return 236
    elif row['Oct_2015'] > 0 and row['grad_time'] <= 237:
        return 237
    elif row['Nov_2015'] > 0 and row['grad_time'] <= 238:
        return 238
    elif row['Dec_2015'] > 0 and row['grad_time'] <= 239:
        return 239
    elif row['Jan_2016'] > 0 and row['grad_time'] <= 240:
        return 240
    elif row['Feb_2016'] > 0 and row['grad_time'] <= 241:
        return 241
    elif row['Mar_2016'] > 0 and row['grad_time'] <= 242:
        return 242
    else:
        return 0

trisk = df.apply(test_func, axis=1)
trisk.name = 'trisk'
print(trisk)

Output:

0    240
1      0
2    240
3      0
4    236
Name: trisk, dtype: int64
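
Two details behind the original failure can be checked in isolation. The following is a minimal sketch rather than part of the answer's code; the two-column frame and the scalar values are made up for illustration. It shows that iterating over a DataFrame yields column labels (which is why map passed strings to the function), and that `&` binds more tightly than the comparison operators.

```python
import pandas as pd

# Tiny illustrative frame (hypothetical values, same column naming as the question).
df = pd.DataFrame({"Sept_2015": [0, 1], "grad_time": [240, 206]})

# Iterating over a DataFrame yields its column labels, not its rows,
# so map(func, df) calls func once per column name (a plain string).
labels = list(df)
print(labels)  # ['Sept_2015', 'grad_time']

# Indexing that string with another column name raises the TypeError.
label = labels[0]
try:
    label["grad_time"]
except TypeError as exc:
    print("TypeError:", exc)

# `&` binds more tightly than `>` and `<=`, so for scalars
#     sept > 0 & grad_time <= 236
# parses as the chained comparison  sept > (0 & grad_time) <= 236,
# which never really tests grad_time.
sept, grad_time = 1, 300
unparenthesized = sept > 0 & grad_time <= 236
intended = (sept > 0) and (grad_time <= 236)
print(unparenthesized, intended)  # True False
```

On the sample data both forms happen to give the same trisk values, which is why the precedence problem is easy to miss.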

2

Setup

import numpy as np
import pandas as pd

df = pd.DataFrame([[0, 0, 0, 0, 1, 1, 1, 240],
                   [0, 0, 0, 0, 0, 0, 0, 218],
                   [0, 0, 0, 0, 1, 1, 1, 236],
                   [0, 0, 0, 0, 0, 0, 0,   0],
                   [1, 1, 1, 1, 1, 1, 1, 206]],
                  index=pd.Index(range(1, 6), name='ID'),
                  columns=['Sept_2015', 'Oct_2015', 'Nov_2015', 'Dec_2015',
                           'Jan_2016', 'Feb_2016', 'Mar_2016', 'grad_time'])

I mostly used numpy for this:

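# thresholds, one per month column, in calendar order
a = np.array([236, 237, 238, 239, 240, 241, 242])
# month flags as booleans (every column except the last, grad_time)
b = df.values[:, :-1] > 0
# g[i, j] is True when row i's grad_time <= threshold j
g = df.values[:, -1][:, None] <= a

# first threshold whose month flag and grad_time test both hold, else 0
a[(b & g).argmax(1)] * (b & g).any(1)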
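
To see why the expression above works, it may help to look at the intermediate arrays for a single row. This is only an illustrative sketch: `grad_time`, `flags`, `hit`, `first_hit` and `empty` are names introduced here, with values copied from the example data.

```python
import numpy as np

a = np.array([236, 237, 238, 239, 240, 241, 242])  # one threshold per month
grad_time = np.array([240, 218, 236, 0, 206])      # grad_time column from the example

# A (5, 1) column compared against a (7,) vector broadcasts to (5, 7):
# every row's grad_time is tested against every threshold.
g = grad_time[:, None] <= a
print(g.shape)  # (5, 7)

# First example row: the month flags are set from Jan_2016 onwards.
flags = np.array([0, 0, 0, 0, 1, 1, 1]) > 0
hit = flags & g[0]

# argmax on a boolean array returns the position of the first True,
# which is what gives "earlier" months precedence.
first_hit = int(hit.argmax())
print(first_hit, a[first_hit])  # 4 240

# For an all-False row argmax still returns 0, so the result has to be
# masked with any() to turn that spurious index into a 0.
empty = np.zeros(7, dtype=bool)
print(int(empty.argmax()), bool(empty.any()))  # 0 False
```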

Assigning it to new column

df['trisk'] = a[(b & g).argmax(1)] * (b & g).any(1)

df

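
If the argmax trick feels too terse, the same "first matching condition wins" logic can probably be written with `np.select`, which takes a list of conditions and a list of choices and uses the first condition that is True for each row. This is an alternative sketch, not the approach used above; `months`, `thresholds` and `conditions` are names introduced here for illustration.

```python
import numpy as np
import pandas as pd

months = ["Sept_2015", "Oct_2015", "Nov_2015", "Dec_2015",
          "Jan_2016", "Feb_2016", "Mar_2016"]
thresholds = [236, 237, 238, 239, 240, 241, 242]

df = pd.DataFrame(
    [[0, 0, 0, 0, 1, 1, 1, 240],
     [0, 0, 0, 0, 0, 0, 0, 218],
     [0, 0, 0, 0, 1, 1, 1, 236],
     [0, 0, 0, 0, 0, 0, 0, 0],
     [1, 1, 1, 1, 1, 1, 1, 206]],
    columns=months + ["grad_time"],
)

# One boolean array per month, listed in calendar order. Each comparison
# is parenthesized so that `&` combines the two tests as intended.
conditions = [
    ((df[month] > 0) & (df["grad_time"] <= limit)).to_numpy()
    for month, limit in zip(months, thresholds)
]

# np.select picks the choice for the first True condition in each row
# and falls back to `default` when none of them hold.
df["trisk"] = np.select(conditions, thresholds, default=0)
print(df["trisk"].tolist())  # [240, 0, 240, 0, 236]
```

Because the conditions are built from a list of column names, this should scale to many more month/year columns without adding elif branches.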

6 Comments

I don't think you want [dates] at the end of your setup.
@AlbertoGarcia-Raboso that's what ensures I have the correct order to the columns and ensures I am giving "earlier" dates precedence. If I messed it up, I'm happy to change it. But I think that's right.
@AlbertoGarcia-Raboso Agreed, I didn't spend a lot of time on it. But I'll take your advice and fix it up.
You do get the OP's desired output, but there's something wrong with the logic: you don't use the OP's grad_time column!
Nicely done! One small improvement: you don't need z --- you can do a[(b & g).argmax(1)] * (b & g).any(1).