
I have a DataFrame, loaded from a CSV file, containing 5 columns. I want to create a new column based on conditions across the values in each row. For example, my df is:

col1 col2 col3 col4
1    1    1    1
0    0    1    1
1    1    1    1
nan  nan  nan  nan

Here is my code sample:

m1 = df[['col1','col2','col3','col4']].all(axis=1)
m2 = df[['col1','col2','col3','col4']].isna().any(axis=1)
df['STATUS AUTO'] = np.select([m2, m1], ['ZD', 'FIC'],'PARTIALLY IMMUNIZED')

It never gives me "PARTIALLY IMMUNIZED", although there are many such rows. In the sample above, row1 is FIC, row2 & row3 are "PARTIALLY IMMUNIZED", and row4 is "ZD". Instead, it gives me "ZD" for the rows that should be "PARTIALLY IMMUNIZED". Any help, please. PS: The same code worked for another DF a few months back, but not for this DF.

  • Sorry, row2 is partially immunized, and row1 and row3 are FIC. Commented Jun 9, 2022 at 11:23

1 Answer

The problem seems to be that the columns contain strings instead of numbers:

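import numpy as np

cols = ['col1','col2','col3','col4']

# convert only the four status columns to numeric
df[cols] = df[cols].astype(float)

# m1: every column equals 1; m2: at least one column is missing
m1 = df[cols].eq(1).all(axis=1)
m2 = df[cols].isna().any(axis=1)
df['STATUS AUTO'] = np.select([m2, m1], ['ZD', 'FIC'], 'PARTIALLY IMMUNIZED')

print(df)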
   col1  col2  col3  col4          STATUS AUTO
0   1.0   1.0   1.0   1.0                  FIC
1   0.0   0.0   1.0   1.0  PARTIALLY IMMUNIZED
2   1.0   1.0   1.0   1.0                  FIC
3   NaN   NaN   NaN   NaN                   ZD
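
If the CSV holds blank strings rather than real missing values, astype(float) raises a ValueError on them. A minimal sketch of one alternative, assuming the four columns arrive as strings with blanks for missing entries: pd.to_numeric with errors="coerce" turns anything non-numeric into NaN. The sample frame and its "Name" column are made up for illustration; the masks are the same as in the answer.

```python
import numpy as np
import pandas as pd

# Hypothetical data: status columns read as strings, blanks for missing values.
df = pd.DataFrame({
    "Name": ["a", "b", "c", "d"],  # unrelated string column, left untouched
    "col1": ["1", "0", "1", ""],
    "col2": ["1", "0", "1", ""],
    "col3": ["1", "1", "1", ""],
    "col4": ["1", "1", "1", ""],
})
cols = ["col1", "col2", "col3", "col4"]

# errors="coerce" converts every value that is not a number (blanks included) to NaN
df[cols] = df[cols].apply(pd.to_numeric, errors="coerce")

m1 = df[cols].eq(1).all(axis=1)    # every column equals 1
m2 = df[cols].isna().any(axis=1)   # at least one column is missing
df["STATUS AUTO"] = np.select([m2, m1], ["ZD", "FIC"], default="PARTIALLY IMMUNIZED")

print(df)
```

Because only the columns in cols are converted, other string columns such as "Name" or "District" are not affected.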

7 Comments

Thank you @jezrael, you are always available to help. The code runs fine, but in my case the df has many other columns (like "Name", "District") that are strings; the four columns shown in the question are the last ones. I did what you suggested, but it still gives "ZD" where it should be "Partially Immunized". (I ran your code on the sample data, and it works fine.)
@MuhammadRehan - Is it possible to show a row that failed? What are the values in that row?
@MuhammadRehan - You can convert only the last 4 columns to numeric; the answer was edited.
Thank you @jezrael. I accepted the answer because it works on other datasets, but I don't know why it still gives me the wrong result here. There may be a problem with "nan". How can I convert blanks into NaN? (I already tried this, but the problem remains.) Any other suggestions, please?
@MuhammadRehan - First question: does df[cols] = df[cols].astype(float) work, or does it raise an error?
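
One way to find the failing rows asked about above is to check the dtypes and list the rows where only some of the four columns are missing. This is a sketch on made-up data, and it is only a guess at the cause: the thread does not confirm that such rows exist in the real df. Since m2 uses any(axis=1), a row with a single blank is labelled "ZD". Whether such a row should instead count as "PARTIALLY IMMUNIZED" is not stated in the thread; the all(axis=1) variant below assumes it should.

```python
import numpy as np
import pandas as pd

# Hypothetical frame after numeric conversion; row 2 has exactly one missing value.
df = pd.DataFrame({
    "col1": [1.0, 0.0, 1.0, np.nan],
    "col2": [1.0, 0.0, 1.0, np.nan],
    "col3": [1.0, 1.0, 1.0, np.nan],
    "col4": [1.0, 1.0, np.nan, np.nan],
})
cols = ["col1", "col2", "col3", "col4"]

# Confirm the columns are numeric (object dtype would point to leftover strings)
print(df[cols].dtypes)

# Rows where some, but not all, of the columns are missing
n_missing = df[cols].isna().sum(axis=1)
partly_missing = df[n_missing.between(1, len(cols) - 1)]
print(partly_missing)

# Assumption: "ZD" applies only when all four columns are missing
m_fic = df[cols].eq(1).all(axis=1)
m_zd = df[cols].isna().all(axis=1)
status = np.select([m_zd, m_fic], ["ZD", "FIC"], default="PARTIALLY IMMUNIZED")
print(status)
```

With any(axis=1) in place of all(axis=1), row 2 of this sample would be labelled "ZD" instead.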