I am trying to create a new variable whose value depends on the values of several other variables. I first tried writing this as a nested ifelse() statement in R, but there were too many nested levels and it threw an error, and I think there should be an easier way to do this in Python.
I have a pandas DataFrame (called df) that looks roughly like this, although the real one is much bigger, with many more month/year variables:
   ID  Sept_2015  Oct_2015  Nov_2015  Dec_2015  Jan_2016  Feb_2016  Mar_2016  grad_time
0   1          0         0         0         0         1         1         1        240
1   2          0         0         0         0         0         0         0        218
2   3          0         0         0         0         1         1         1        236
3   4          0         0         0         0         0         0         0          0
4   5          1         1         1         1         1         1         1        206
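For anyone who wants to reproduce this, the sample above can be rebuilt roughly as follows. This is a minimal sketch: the values are copied from the printout, and the real data has many more month columns.

```python
import pandas as pd

# Sample data copied from the printout above (the real frame is much larger)
df = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5],
    "Sept_2015": [0, 0, 0, 0, 1],
    "Oct_2015": [0, 0, 0, 0, 1],
    "Nov_2015": [0, 0, 0, 0, 1],
    "Dec_2015": [0, 0, 0, 0, 1],
    "Jan_2016": [1, 0, 1, 0, 1],
    "Feb_2016": [1, 0, 1, 0, 1],
    "Mar_2016": [1, 0, 1, 0, 1],
    "grad_time": [240, 218, 236, 0, 206],
})

print(df)
```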
The new variable depends on the values of all these variables, but "earlier" variables need to take precedence, so the if/elif/else logic for each row would look something like this:
if (df['Sept_2015'] > 0) & (df['grad_time'] <= 236):
    return 236
elif (df['Oct_2015'] > 0) & (df['grad_time'] <= 237):
    return 237
elif (df['Nov_2015'] > 0) & (df['grad_time'] <= 238):
    return 238
elif (df['Dec_2015'] > 0) & (df['grad_time'] <= 239):
    return 239
elif (df['Jan_2016'] > 0) & (df['grad_time'] <= 240):
    return 240
elif (df['Feb_2016'] > 0) & (df['grad_time'] <= 241):
    return 241
elif (df['Mar_2016'] > 0) & (df['grad_time'] <= 242):
    return 242
else:
    return 0
And based on this, I'd like it to return a new variable that looks like this:
trisk
0 240
1 0
2 240
3 0
4 236
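The intended precedence can be made concrete with a small sketch on the sample rows. It assumes numpy.select, which takes the first condition that holds for each row and so mirrors an if/elif chain. The name month_cutoffs is purely illustrative, and the real data would need one (column, cutoff) pair per month column.

```python
import numpy as np
import pandas as pd

# Sample rows from the question (ID column omitted for brevity)
df = pd.DataFrame({
    "Sept_2015": [0, 0, 0, 0, 1],
    "Oct_2015": [0, 0, 0, 0, 1],
    "Nov_2015": [0, 0, 0, 0, 1],
    "Dec_2015": [0, 0, 0, 0, 1],
    "Jan_2016": [1, 0, 1, 0, 1],
    "Feb_2016": [1, 0, 1, 0, 1],
    "Mar_2016": [1, 0, 1, 0, 1],
    "grad_time": [240, 218, 236, 0, 206],
})

# Illustrative (column, cutoff) pairs, listed in precedence order
month_cutoffs = [
    ("Sept_2015", 236), ("Oct_2015", 237), ("Nov_2015", 238),
    ("Dec_2015", 239), ("Jan_2016", 240), ("Feb_2016", 241),
    ("Mar_2016", 242),
]

# One boolean Series per branch; parentheses are needed because & binds
# more tightly than the comparison operators
conditions = [
    (df[col] > 0) & (df["grad_time"] <= cutoff)
    for col, cutoff in month_cutoffs
]
choices = [cutoff for _, cutoff in month_cutoffs]

# np.select uses the first matching condition per row, else the default
df["trisk"] = np.select(conditions, choices, default=0)
print(df["trisk"].tolist())  # [240, 0, 240, 0, 236]
```

On the sample rows this reproduces the trisk column shown above.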
I've tried writing a function like this:
def test_func(df):
    """ Test Function for generating new value"""
    if df['Sept_2015'] > 0 & df['grad_time'] <= 236:
        return 236
    elif df['Oct_2015'] > 0 & df['grad_time'] <= 237:
        return 237
    ...
    else:
        return 0
and mapping it over the DataFrame to create the new variable like this:
new_df = pd.DataFrame(map(test_func, df))
However, when I run it, I get the following TypeError:
Traceback (most recent call last):
File "<ipython-input-83-19b45bcda45a>", line 1, in <module>
new_df = pd.DataFrame(map(new_func, test_df))
File "<ipython-input-82-a2eb6f9d7a3a>", line 3, in new_func
if df['Sept_2015'] > 0 & df['grad_time'] <= 236:
TypeError: string indices must be integers, not str
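As far as I can tell, the error arises because iterating over a DataFrame yields its column labels rather than its rows, so map hands the function a string. The sketch below illustrates that on two toy rows and, for contrast, applies a function row by row with apply(axis=1). The helper sept_rule is a hypothetical one-branch version of the rule, not the full function.

```python
import pandas as pd

# Two toy rows, just enough to show the behaviour
df = pd.DataFrame({"Sept_2015": [0, 1], "grad_time": [240, 206]})

# Iterating over a DataFrame yields column labels, so map() passes strings
print(list(df))  # ['Sept_2015', 'grad_time']


def sept_rule(row):
    """Hypothetical single-branch rule applied to one row (a Series)."""
    if row["Sept_2015"] > 0 and row["grad_time"] <= 236:
        return 236
    return 0


# axis=1 calls the function once per row instead of once per column label
result = df.apply(sept_rule, axis=1)
print(result.tolist())  # [0, 236]
```

Inside the function each value is a scalar, so the plain `and` keyword works there; `&` without parentheses would bind before the comparisons.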
So I can see it doesn't accept the column name here, but I've tried this a number of other ways and can't get it to work. I also understand that mapping the function might not be the best way to write this, so I'm open to other ways of generating the trisk variable. Thanks in advance, and apologies if I've left anything out.
