I need help with multiple if-else conditions in pandas. I have a test dataset (Titanic) as follows:
ID  Survived  Pclass  Name                    Sex     Age
1   0         3       Braund                  male    22
2   1         1       Cumings, Mrs.           female  38
3   1         3       Heikkinen, Miss. Laina  female  26
4   1         1       Futrelle, Mrs.          female  35
5   0         3       Allen, Mr.              male    35
6   0         3       Moran, Mr.              male
7   0         1       McCarthy, Mr.           male    54
8   0         3       Palsson, Master         male    2
where ID is the passenger ID. I want to create a new flag column in this DataFrame that follows this rule:
if Sex=="female" or (Pclass==1 and Age <18) then 1 else 0.
I tried a few approaches. This was my first attempt:
import pandas as pd

df = pd.read_csv("data.csv")
for passenger_index, passenger in df.iterrows():
    if passenger['Sex'] == "female" or (passenger['Pclass'] == 1 and passenger['Age'] < 18):
        df['Prediction'] = 1
    else:
        df['Prediction'] = 0
The problem with the above code is that it creates a Prediction column in df, but every value in it is 0.
However, if I use the same logic but write the result to a dictionary instead, it gives the right answer, as shown below:
prediction = {}
df = pd.read_csv("data.csv")
for passenger_index, passenger in df.iterrows():
    if passenger['Sex'] == "female" or (passenger['Pclass'] == 1 and passenger['Age'] < 18):
        prediction[passenger['ID']] = 1
    else:
        prediction[passenger['ID']] = 0
This gives a dict prediction with keys as ID and values as 1 or 0 based on the above logic.
So why does the DataFrame version give the wrong result? I also tried defining a function first and then calling it, and it gave the same result as my first attempt.
So, how can this be done in pandas?
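A likely explanation, sketched under assumptions: `df['Prediction'] = 1` assigns to the whole column on every iteration, so only the last row's outcome survives (in the sample, ID 8 is a male in Pclass 3, hence all zeros). The snippet below writes one cell per row with `df.at` instead. The inline DataFrame is a stand-in for the sample rows, not the real data.csv.

```python
import numpy as np
import pandas as pd

# Stand-in for the sample rows above (assumption: data.csv is not available here).
df = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5, 6, 7, 8],
    "Pclass": [3, 1, 3, 1, 3, 3, 1, 3],
    "Sex": ["male", "female", "female", "female", "male", "male", "male", "male"],
    "Age": [22, 38, 26, 35, 35, np.nan, 54, 2],
})

# Create the column once, before the loop.
df["Prediction"] = 0

for idx, passenger in df.iterrows():
    if passenger["Sex"] == "female" or (passenger["Pclass"] == 1 and passenger["Age"] < 18):
        # df.at[idx, col] writes a single cell;
        # df[col] = 1 would overwrite every row of the column.
        df.at[idx, "Prediction"] = 1
```

This keeps the loop close to the original attempt. A row-by-row loop is usually slower than a vectorized expression, so it is probably best treated as a way to see the bug rather than as the final approach.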
Secondly, I guess the same can be done with nested if-else style expressions. I know about np.where, but it does not seem to allow an 'and' condition. Here is what I was trying:
df['Prediction'] = np.where(df['Sex'] == "female", 1, np.where((df['Pclass'] == 1 and df['Age'] < 18), 1, 0))
The above raised an error because of the 'and' keyword inside np.where.
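For reference, Python's `and` and `or` ask each Series for a single truth value, which pandas rejects with a ValueError saying the truth value of a Series is ambiguous. The element-wise operators `&` and `|` work on whole columns, as long as each comparison is wrapped in parentheses. A minimal sketch, again using a stand-in DataFrame for the sample rows (the names `is_female` and `first_class_child` are just illustrative):

```python
import numpy as np
import pandas as pd

# Stand-in for the sample rows above (assumption: data.csv is not available here).
df = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5, 6, 7, 8],
    "Pclass": [3, 1, 3, 1, 3, 3, 1, 3],
    "Sex": ["male", "female", "female", "female", "male", "male", "male", "male"],
    "Age": [22, 38, 26, 35, 35, np.nan, 54, 2],
})

# Each comparison yields a boolean Series; & and | combine them element-wise.
is_female = df["Sex"] == "female"
first_class_child = (df["Pclass"] == 1) & (df["Age"] < 18)  # NaN < 18 evaluates to False

# A single np.where covers the whole rule, so no nesting is needed.
df["Prediction"] = np.where(is_female | first_class_child, 1, 0)
```

Since the result is just a boolean mask turned into 0/1, `(is_female | first_class_child).astype(int)` should give the same column without np.where.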
So can someone help? Solutions using several approaches would be really appreciated: np.where (as a simple if-else), a function (applymap etc.), or modifications to what I wrote earlier.
Also, how can the same be done using applymap or the apply/map methods of df?
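On the apply question, one possible sketch: `DataFrame.apply` with `axis=1` passes each row to a function as a Series, so the original if-else can be reused almost unchanged. `applymap` works element by element, so it does not fit a rule that reads several columns of the same row. The function name `predict` and the inline DataFrame are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Stand-in for the sample rows above (assumption: data.csv is not available here).
df = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5, 6, 7, 8],
    "Pclass": [3, 1, 3, 1, 3, 3, 1, 3],
    "Sex": ["male", "female", "female", "female", "male", "male", "male", "male"],
    "Age": [22, 38, 26, 35, 35, np.nan, 54, 2],
})


def predict(passenger):
    """Return 1 for females or first-class passengers under 18, else 0."""
    if passenger["Sex"] == "female" or (passenger["Pclass"] == 1 and passenger["Age"] < 18):
        return 1
    return 0


# axis=1 calls predict once per row; the returned values become the new column.
df["Prediction"] = df.apply(predict, axis=1)
```

This reads the most like plain if-else code, at the cost of calling a Python function once per row.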