Pandas : Fill null in multiple columns only if another column is null

Question

I have a data frame with 86 columns. Some columns share prefixes, such as name_smt1, name_smt2, ..., status_smt1, status_smt2, ..., grade_smt1, grade_smt2, ... and so on. The remaining columns are subjects; there are more than 40 subject columns, each with a different name. I also have a column named grade_t (IP_t in the code below) that I use to decide whether to fill the subject columns with D: if grade_t is null in a row, every subject column that is null in that row should be filled with D.

I tried the line below, but it raises ValueError: shape mismatch: value array of shape (7,4) could not be broadcast to indexing result of shape (7,). Is there another way to do this? I have been working on this for 2 days. I also tried looping over the rows, but that filled all the subjects with D even when IP_t was not null.

merged_df.loc[merged_df['IP_t'].isnull(),matkul] = merged_df[merged_df['IP_t'].isnull()][matkul].fillna(value='D')

Full code

merged_df['TARGET'] = merged_df['TARGET'].fillna(value='TIDAK LULUS')

list_nama_prefix = [col for col in merged_df.columns if 'NAMA_' in col and 'NAMA_smt1' not in col]

# pass columns= by keyword; the positional axis argument is rejected by pandas 2.0+
merged_df = merged_df.drop(columns=list_nama_prefix)
merged_df = merged_df.rename(columns={
    'NAMA_smt1' : 'NAMA'
})

list_ip = [col for col in merged_df.columns if 'IP_' in col]
smt_sebelumnya_cols = [col for col in merged_df.columns if 'STATUS LULUS SMT SEBELUMNYA_' in col]
smt_skrg_cols = [col for col in merged_df.columns if 'STATUS LULUS SMT SEKARANG_' in col]
status_sp_cols = [col for col in merged_df.columns if 'Status SP_' in col]
statuses = smt_sebelumnya_cols+status_sp_cols+smt_skrg_cols
matkul = merged_df.select_dtypes(include=['object']).drop(columns=statuses).columns.tolist()
list_matkul = [i for i in matkul if i not in ('NIM', 'NAMA','TARGET')]

merged_df.loc[merged_df['IP_t'].isnull(),matkul] = merged_df[merged_df['IP_t'].isnull()][matkul].fillna(value='D')

My data if you'd like to see

Dom · Accepted Answer · 2021-06-01 05:18:26Z


So if I understand your problem correctly, given input

   grade_t subject_1 subject_2  other
0      NaN        A+       NaN    1.0
1      1.0        B-         B    NaN
2      NaN       NaN       NaN    NaN
3      1.0       NaN         A    4.0

You want the output to be

   grade_t subject_1 subject_2  other
0      NaN         D         D    1.0
1      1.0        B-         B    NaN
2      NaN         D         D    NaN
3      1.0       NaN         A    4.0

If so I think this is done most easily using the mask method:

mask = data['grade_t'].isna()
subject_columns = ['subject_1', 'subject_2']
data[subject_columns] = data[subject_columns].mask(mask, other='D', axis=0)

df.mask(cond, other) takes a boolean mask cond and value other, and replaces the values of df with other wherever cond is True, and retains the original value from df everywhere else.
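As a runnable sketch of the snippet above, the following rebuilds the example frame from the tables in this answer and applies mask to it. The frame is constructed by hand here purely for illustration; in the question it would be merged_df, with IP_t in place of grade_t.

```python
import numpy as np
import pandas as pd

# Example frame, copied from the input table in this answer.
data = pd.DataFrame({
    "grade_t": [np.nan, 1.0, np.nan, 1.0],
    "subject_1": ["A+", "B-", np.nan, np.nan],
    "subject_2": [np.nan, "B", np.nan, "A"],
    "other": [1.0, np.nan, np.nan, 4.0],
})

subject_columns = ["subject_1", "subject_2"]

# True for the rows where grade_t is missing.
mask = data["grade_t"].isna()

# Replace every subject value in those rows with 'D';
# rows where grade_t is present are left untouched.
data[subject_columns] = data[subject_columns].mask(mask, other="D", axis=0)

print(data)
```

Note that this replaces all subject values in the selected rows, so the existing A+ in row 0 also becomes D, as in the expected output shown earlier.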

In general, cond can be a DataFrame of the same shape as df, or a Series whose index matches the index or columns of df, in which case you should specify the axis argument (as in the snippet above).
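Passing a DataFrame as cond lets the replacement be decided cell by cell. Here is a minimal sketch, assuming the goal is to fill only the subject cells that are themselves null (which is how the question is worded) and to keep any grade that is already present. The names grade_missing and cond are illustrative, and the frame is the hand-built example from this answer.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({
    "grade_t": [np.nan, 1.0, np.nan, 1.0],
    "subject_1": ["A+", "B-", np.nan, np.nan],
    "subject_2": [np.nan, "B", np.nan, "A"],
    "other": [1.0, np.nan, np.nan, 4.0],
})
subject_columns = ["subject_1", "subject_2"]

# Row-level condition: grade_t is missing.
grade_missing = data["grade_t"].isna()

# Cell-level condition with the same shape as the subject block:
# the cell is null AND its row has a missing grade_t.
cond = data[subject_columns].isna().apply(lambda col: col & grade_missing)

# Only cells where cond is True are replaced with 'D'.
data[subject_columns] = data[subject_columns].mask(cond, other="D")

print(data)
```

With this variant the A+ in row 0 is kept, and the null subject_1 in row 3 stays null because grade_t is present in that row.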


1 Comment

user15653864 Over a year ago

Hello, thanks for helping me and for explaining the code so clearly.
