I have a DataFrame of IDs that are members of different groups (each column is a separate group). The full data:
df
ID G_01 G_02 G_03 G_04
0 A_01 1.0 NaN NaN NaN
1 A_02 1.0 NaN NaN NaN
2 A_03 1.0 1.0 NaN NaN
3 A_04 NaN 1.0 NaN NaN
4 A_05 NaN 1.0 NaN NaN
5 A_06 NaN NaN NaN 1.0
6 A_07 NaN NaN 1.0 1.0
7 A_08 NaN NaN 1.0 NaN
8 A_09 NaN NaN 1.0 NaN
9 A_10 NaN NaN 1.0 NaN
10 A_11 NaN NaN 1.0 1.0
11 A_12 NaN NaN NaN 1.0
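For reproducibility, the frame above can be rebuilt with something like the snippet below. This is only a sketch: the `membership` dict and the `ids` list are helper names I made up to spell out the table, and I'm assuming the 1.0/NaN values are plain floats.

```python
import numpy as np
import pandas as pd

# IDs A_01 .. A_12, as in the printed frame.
ids = [f"A_{i:02d}" for i in range(1, 13)]

# Which IDs belong to which group (read off the table above).
membership = {
    "G_01": ["A_01", "A_02", "A_03"],
    "G_02": ["A_03", "A_04", "A_05"],
    "G_03": ["A_07", "A_08", "A_09", "A_10", "A_11"],
    "G_04": ["A_06", "A_07", "A_11", "A_12"],
}

df = pd.DataFrame({"ID": ids})
for group, members in membership.items():
    # 1.0 marks membership, NaN marks absence.
    df[group] = np.where(df["ID"].isin(members), 1.0, np.nan)

print(df)
```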
As you can see, some IDs are members of more than one group. G_01 and G_02 share A_03, so they should be merged into one cluster. G_03 and G_04 share A_07 and A_11 but share no ID with G_01 or G_02, so they should form cluster 2, like below:
ID G_01 G_02 G_03 G_04 Cluster
0 A_01 1.0 NaN NaN NaN 1
1 A_02 1.0 NaN NaN NaN 1
2 A_03 1.0 1.0 NaN NaN 1
3 A_04 NaN 1.0 NaN NaN 1
4 A_05 NaN 1.0 NaN NaN 1
5 A_06 NaN NaN NaN 1.0 2
6 A_07 NaN NaN 1.0 1.0 2
7 A_08 NaN NaN 1.0 NaN 2
8 A_09 NaN NaN 1.0 NaN 2
9 A_10 NaN NaN 1.0 NaN 2
10 A_11 NaN NaN 1.0 1.0 2
11 A_12 NaN NaN NaN 1.0 2
The number of IDs and groups isn't constant, and I don't know it in advance. Do you have any idea how to achieve this clustering?
EDIT
The order of the columns should not matter. If I change it to G_02, G_03, G_01, G_04, I'd like to get the same result as with G_01, G_02, G_03, G_04.
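One way to frame what I'm after, as a rough sketch and not a definitive answer: treat each group as a node, link two groups whenever some ID belongs to both, and label every row with the connected component of its groups. The version below uses a small hand-written union-find. The names `add_clusters` and `id_col` are hypothetical, and it assumes every non-ID column is a group column in which any non-NaN value means membership.

```python
import numpy as np
import pandas as pd


def add_clusters(df, id_col="ID"):
    """Label each row with the connected component its groups belong to."""
    group_cols = [c for c in df.columns if c != id_col]
    parent = {g: g for g in group_cols}  # union-find forest over group names

    def find(g):
        # Walk up to the root, halving the path along the way.
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    member = df[group_cols].notna().to_numpy()
    rows = [[g for g, flag in zip(group_cols, row) if flag] for row in member]

    # An ID listed in several groups links those groups together.
    for present in rows:
        for other in present[1:]:
            parent[find(other)] = find(present[0])

    # Number components by first appearance in row order, so the labels
    # do not depend on how the columns are arranged.
    labels = {}
    clusters = []
    for present in rows:
        if not present:
            clusters.append(None)  # ID in no group: left unlabeled
        else:
            clusters.append(labels.setdefault(find(present[0]), len(labels) + 1))

    out = df.copy()
    out["Cluster"] = clusters
    return out


# Sample data from the question.
ids = [f"A_{i:02d}" for i in range(1, 13)]
membership = {
    "G_01": ["A_01", "A_02", "A_03"],
    "G_02": ["A_03", "A_04", "A_05"],
    "G_03": ["A_07", "A_08", "A_09", "A_10", "A_11"],
    "G_04": ["A_06", "A_07", "A_11", "A_12"],
}
df = pd.DataFrame({"ID": ids})
for group, members in membership.items():
    df[group] = np.where(df["ID"].isin(members), 1.0, np.nan)

result = add_clusters(df)
reordered = add_clusters(df[["ID", "G_02", "G_03", "G_01", "G_04"]])
print(result)
```

On the sample data this gives cluster 1 for A_01 to A_05 and cluster 2 for A_06 to A_12, and the reordered frame gets the same labels. Because clusters are numbered by the first row in which they appear, changing the row order could change which cluster is called 1 or 2, though not which IDs end up together.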