I have the following data:
attr1_A attr1_B attr1_C attr1_D attr2_A attr2_B attr2_C
1 0 0 1 1 0 0
0 1 1 0 0 0 1
0 0 0 0 0 1 0
1 1 1 0 1 1 0
I want to retain attr1_A, attr1_B and combine attr1_C and attr1_D into attr1_others. As long as attr1_C and/or attr1_D is 1, then attr1_others will be 1. Similarly, I want to keep attr2_A but combine the remaining attr2_* into attr2_others. Like this:
attr1_A attr1_B attr1_others attr2_A attr2_others
1 0 1 1 0
0 1 1 0 1
0 0 0 0 1
1 1 1 1 1
In other words, for any group of attr, I want to retain a few known columns and combine the remaining ones (I don't know in advance how many other attr columns each group has).
I am thinking of doing each group separately: processing all attr1_*, and then attr2_* because there are a limited number of groups in my dataset, but many attr under each group.
What I can think of right now is to retrieve the other columns like this:
# for group 1
df[[x for x in df.columns if "A" not in x and "B" not in x and "attr1_" in x]]
# for group 2
df[[x for x in df.columns if "A" not in x and "attr2_" in x]]
And to combine, I am thinking of using the any function, but I can't come up with the syntax. Could you help?
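For reference, one possible way to do the combining step is a row-wise any over the "other" columns. The sketch below is only an illustration under some assumptions: the helper name combine_others and its parameters are hypothetical, the sample frame is rebuilt from the data above, and the columns to keep are matched by exact name rather than by substring.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame(
    [[1, 0, 0, 1, 1, 0, 0],
     [0, 1, 1, 0, 0, 0, 1],
     [0, 0, 0, 0, 0, 1, 0],
     [1, 1, 1, 0, 1, 1, 0]],
    columns=["attr1_A", "attr1_B", "attr1_C", "attr1_D",
             "attr2_A", "attr2_B", "attr2_C"],
)


def combine_others(frame, prefix, keep):
    """Hypothetical helper: merge every `prefix` column not in `keep`
    into a single 0/1 column named `<prefix>others`."""
    others = [c for c in frame.columns
              if c.startswith(prefix) and c not in keep]
    # drop() returns a new frame, so the input is left unchanged.
    out = frame.drop(columns=others)
    # any(axis=1) is True for a row when at least one "other" column is non-zero.
    out[prefix + "others"] = frame[others].any(axis=1).astype(int)
    return out


result = combine_others(df, "attr1_", keep=["attr1_A", "attr1_B"])
result = combine_others(result, "attr2_", keep=["attr2_A"])

# Reorder the columns to match the desired layout.
result = result[["attr1_A", "attr1_B", "attr1_others",
                 "attr2_A", "attr2_others"]]
print(result)
```

Processing one group per call matches the idea of handling attr1_* and attr2_* separately, and the number of "other" columns in a group does not need to be known in advance.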
Updated attempt:
I tried this:
# for group 1
df['attr1_others'] = df[df[[x for x in list(df.columns)
if "attr1_" in x
and "A" not in x
and "B" not in x]].any(axis = 'column')]
but got the below error:
ValueError: No axis named column for object type <class 'pandas.core.frame.DataFrame'>
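The error seems to come from the axis name: pandas accepts axis=1 or axis='columns' (plural), not 'column'. Separately, the outer df[...] would use the boolean result to filter rows and return a DataFrame rather than a single column, so it probably needs to go as well. A minimal sketch of the attempt with those two changes, keeping the same substring filter and assuming the sample data from above:

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "attr1_A": [1, 0, 0, 1],
    "attr1_B": [0, 1, 0, 1],
    "attr1_C": [0, 1, 0, 1],
    "attr1_D": [1, 0, 0, 0],
    "attr2_A": [1, 0, 0, 1],
    "attr2_B": [0, 0, 1, 1],
    "attr2_C": [0, 1, 0, 0],
})

# Same filter as in the attempt: group 1 columns other than A and B.
other_cols = [x for x in df.columns
              if "attr1_" in x
              and "A" not in x
              and "B" not in x]

# axis="columns" (or axis=1) checks across the columns within each row;
# astype(int) turns the True/False result back into 1/0.
df["attr1_others"] = df[other_cols].any(axis="columns").astype(int)

print(df["attr1_others"].tolist())  # [1, 1, 0, 1]
```

Note that the substring checks ("A" not in x) would also exclude any column whose name merely contains a capital A or B, so matching on exact column names may be safer on the real dataset.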