1

I have a very large dataframe where I only want to change the values in a small continuous subset of columns. Basically, in those columns the values are either integers or null. All I want is to replace the 0's and nulls with 'No' and everything else with 'Yes' only in those columns

In R, this can be done basically with a one liner:

df <- df %>%
mutate_at(vars(MCI:BNP), ~factor(case_when(. > 0 ~ 'Yes',
                                           TRUE ~ 'No')))

But we're working in Python and I can't quite figure out the equivalent using Pandas. I've been messing around with loc and iloc, which work fine when only changing a single column but I must be missing something when it comes to modifying multiple columns. And the answers I've found on other stackoverflow answers have all mostly been just changing the value in a single column based on some set of conditions

col1 = df.columns.get_loc("MCI")
col2 = df.columns.get_loc("BNP")
df.iloc[:,col1:col2]

Will get me the columns I want, but trying to call loc doesn't work with multidimensional keys. I even tried it with the columns as a list instead of by integer index by creating a an extra variable

binary_var = ['MCI','PVD','CVA','DEMENTIA','CPD','RD','PUD','MLD','DWOC','DWC','HoP','RND','MALIGNANCY','SLD','MST','HIV','AKF',
          'ARMD','ASPHY','DEP','DWLK','DRUGA','DUOULC','FALL','FECAL','FLDELEX','FRAIL','GASTRICULC','GASTROULC','GLAU','HYPERKAL',
          'HYPTEN','HYPOKAL','HYPOTHYR','HYPOXE','IMMUNOS','ISCHRT','LIPIDMETA','LOSWIGT','LOWBAK','MALNUT','OSTEO','PARKIN',
          'PNEUM','RF','SEIZ','SD','TUML','UI','VI','MENTAL','FUROSEMIDE','METOPROLOL','ASPIRIN','OMEPRAZOLE','LISINOPRIL','DIGOXIN',
          'ALDOSTERONE_ANTAGONIST','ACE_INHIBITOR','ANGIOTENSIN_RECEPTOR_BLOCKERS','BETA_BLOCKERSDIURETICHoP','BUN','CREATININE',
          'SODIUM','POTASSIUM','HEMOGLOBIN','WBC_COUNT','CHLORIDE','ALBUMIN','TROPONIN','BNP']
df.loc[df[binary_var] == 0, binary_var]

But then it just can't find the index for those column names at all. I think Pandas also has problems converting columns that were originally integers into No/Yes. I don't need to do this in place, I'm probably just missing something simple that pandas has built in hopefully

In a very psuedo-code description, all I really want is this

if(df.iloc[:,col1:col2] == 0 || df.iloc[:,col1:col2].isnull())
    df ONLY in that subset of column = 'No'
else
    df ONLY in that subset of column = 'Yes'

1 Answer 1

1

Use:

df.loc[:, 'MCI':'BNP'] = np.where(df.loc[:, 'MCI':'BNP'] > 0, 'Yes', 'No')
Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.