I have a data frame with 86 columns. Some columns share prefixes, such as name_smt1, name_smt2, ..., status_smt1, status_smt2, ..., grade_smt1, grade_smt2, ..., and so on. The remaining columns are subjects; there are more than 40 subject columns with different names. I also have a column named grade_t (IP_t in the code below), which I use to decide whether to fill the subject columns with D. If grade_t is null in a row, then all the subject columns that are null in that row should be filled with D. I tried the line below, but it raises ValueError: shape mismatch: value array of shape (7,4) could not be broadcast to indexing result of shape (7,). Is there a way to do this other than with the code below? I have been working on this for 2 days. I also tried looping over the rows, but that filled all the subjects with D even where IP_t is not null.

merged_df.loc[merged_df['IP_t'].isnull(),matkul] = merged_df[merged_df['IP_t'].isnull()][matkul].fillna(value='D')

Full code

merged_df['TARGET'] = merged_df['TARGET'].fillna(value='TIDAK LULUS')

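list_nama_prefix = [col for col in merged_df.columns if 'NAMA_' in col and 'NAMA_smt1' not in col]

# drop() no longer accepts the axis positionally in pandas 2.0+, so name the columns explicitly
merged_df = merged_df.drop(columns=list_nama_prefix)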
merged_df = merged_df.rename(columns={
    'NAMA_smt1' : 'NAMA'
})

list_ip = [col for col in merged_df.columns if 'IP_' in col]
smt_sebelumnya_cols = [col for col in merged_df.columns if 'STATUS LULUS SMT SEBELUMNYA_' in col]
smt_skrg_cols = [col for col in merged_df.columns if 'STATUS LULUS SMT SEKARANG_' in col]
status_sp_cols = [col for col in merged_df.columns if 'Status SP_' in col]
statuses = smt_sebelumnya_cols+status_sp_cols+smt_skrg_cols
matkul = merged_df.select_dtypes(include=['object']).drop(columns=statuses).columns.tolist()
list_matkul = [i for i in matkul if i not in ('NIM', 'NAMA','TARGET')]

merged_df.loc[merged_df['IP_t'].isnull(),matkul] = merged_df[merged_df['IP_t'].isnull()][matkul].fillna(value='D')

My data if you'd like to see

1 Answer

So if I understand your problem correctly, given input

   grade_t subject_1 subject_2  other
0      NaN        A+       NaN    1.0
1      1.0        B-         B    NaN
2      NaN       NaN       NaN    NaN
3      1.0       NaN         A    4.0

You want the output to be

   grade_t subject_1 subject_2  other
0      NaN         D         D    1.0
1      1.0        B-         B    NaN
2      NaN         D         D    NaN
3      1.0       NaN         A    4.0

If so, I think this is most easily done with the mask method:

mask = data['grade_t'].isna()
subject_columns = ['subject_1', 'subject_2']
data[subject_columns] = data[subject_columns].mask(mask, other='D', axis=0)

df.mask(cond, other) takes a boolean mask cond and value other, and replaces the values of df with other wherever cond is True, and retains the original value from df everywhere else.
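For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the small example frame from above and applies the same mask call. The sample values come from the tables in this answer; the names row_mask and result are only illustrative.

```python
import numpy as np
import pandas as pd

# Rebuild the example input shown earlier in this answer.
data = pd.DataFrame({
    "grade_t": [np.nan, 1.0, np.nan, 1.0],
    "subject_1": ["A+", "B-", np.nan, np.nan],
    "subject_2": [np.nan, "B", np.nan, "A"],
    "other": [1.0, np.nan, np.nan, 4.0],
})
subject_columns = ["subject_1", "subject_2"]

# True for the rows where grade_t is missing.
row_mask = data["grade_t"].isna()

# Work on a copy so the input stays available for comparison.
result = data.copy()

# Every subject cell in a flagged row becomes "D"; other rows are untouched.
result[subject_columns] = data[subject_columns].mask(row_mask, other="D", axis=0)

print(result)
```

Note that this overwrites existing grades in the flagged rows too (the A+ in row 0 becomes D), which matches the expected output shown above.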

In general, cond can be a DataFrame of the same shape as df, or a Series whose index matches the index or columns of df, in which case you should specify the axis argument (as in the snippet above).
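The question can also be read as wanting D only in the subject cells that are already null, leaving existing grades alone. That is a variation on the snippet above, not something confirmed by the asker. A rough sketch of it, passing a DataFrame as cond, might look like this (the names flagged_rows, fill_here and filled are made up for illustration):

```python
import numpy as np
import pandas as pd

# Same sample frame as in the example tables of this answer.
frame = pd.DataFrame({
    "grade_t": [np.nan, 1.0, np.nan, 1.0],
    "subject_1": ["A+", "B-", np.nan, np.nan],
    "subject_2": [np.nan, "B", np.nan, "A"],
    "other": [1.0, np.nan, np.nan, 4.0],
})
subject_columns = ["subject_1", "subject_2"]

# Rows where grade_t is missing.
flagged_rows = frame["grade_t"].isna()

# Boolean DataFrame with the same shape as the subject block:
# True only where the cell is null AND the row is flagged.
fill_here = frame[subject_columns].isna().apply(lambda col: col & flagged_rows)

filled = frame.copy()
filled[subject_columns] = frame[subject_columns].mask(fill_here, other="D")

print(filled)
```

With this version the A+ in row 0 is kept, and the null in row 3 stays null because grade_t is present there.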

1 Comment

hello, thanks for helping me also explaining the code clearly
