I am working on a rather large binary (n-hot encoded) dataframe whose structure is similar to this toy example:
import pandas as pd
data = {
'A' : [0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0],
'B' : [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
'C' : [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1],
'D' : [1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0],
'E' : [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
}
df = pd.DataFrame.from_dict(data)
df
A B C D E
0 0 0 0 1 0
1 0 0 0 1 0
2 0 0 0 1 0
3 1 0 0 0 0
4 1 0 0 0 0
5 1 0 0 0 0
6 1 0 0 0 0
7 1 0 0 0 0
8 0 0 0 1 0
9 0 0 0 1 0
10 0 0 1 0 0
11 0 0 1 0 0
12 0 0 1 0 0
13 0 0 1 0 0
To make some extractions and searches more efficient, I would like to generate some additional index levels alongside the default one, like this:
A B C D E
0 1 D 0 0 0 1 0
1 1 D 0 0 0 1 0
2 1 D 0 0 0 1 0
3 1 A 1 0 0 0 0
4 1 A 1 0 0 0 0
5 1 A 1 0 0 0 0
6 1 A 1 0 0 0 0
7 1 A 1 0 0 0 0
8 2 D 0 0 0 1 0
9 2 D 0 0 0 1 0
10 1 C 0 0 1 0 0
11 1 C 0 0 1 0 0
12 1 C 0 0 1 0 0
13 1 C 0 0 1 0 0
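The two new index levels show which column contains the 1 and which appearance of that column's block of consecutive rows it is (for example, the second block of D rows gets 2). Even better would be to have one more level that numbers the rows within each block: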
A B C D E
0 1 1 D 0 0 0 1 0
1 1 2 D 0 0 0 1 0
2 1 3 D 0 0 0 1 0
3 1 1 A 1 0 0 0 0
4 1 2 A 1 0 0 0 0
5 1 3 A 1 0 0 0 0
6 1 4 A 1 0 0 0 0
7 1 5 A 1 0 0 0 0
8 2 1 D 0 0 0 1 0
9 2 2 D 0 0 0 1 0
10 1 1 C 0 0 1 0 0
11 1 2 C 0 0 1 0 0
12 1 3 C 0 0 1 0 0
13 1 4 C 0 0 1 0 0
What would be the most efficient way to generate these indexes?
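For reference, here is a minimal sketch of one vectorized way to build these levels. It assumes every row contains exactly one 1 (a row of all zeros would be labelled with the first column by `idxmax`, and rows with several 1s would need different handling). The names `col`, `appearance`, `position` and `indexed` are only illustrative.

```python
import pandas as pd

data = {
    'A': [0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0],
    'B': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    'C': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1],
    'D': [1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0],
    'E': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
}
df = pd.DataFrame(data)

# Label of the column holding the 1 in each row
# (assumption: exactly one 1 per row).
col = df.idxmax(axis=1).rename("column")

# True on the first row of every block of consecutive identical labels.
block_start = col.ne(col.shift())

# Running block number over the whole frame: 1, 1, 1, 2, 2, ...
block_id = block_start.cumsum()

# Which appearance of this label's block it is: count block starts per label.
appearance = block_start.astype(int).groupby(col).cumsum().rename("appearance")

# 1-based position of the row inside its block.
position = (col.groupby(block_id).cumcount() + 1).rename("position")

# Append the three new levels to the existing index.
indexed = df.set_index([appearance, position, col], append=True)
print(indexed)
```

With the levels in place, a selection such as `indexed.xs('D', level='column')` returns all D rows. Whether this is the most efficient option for a very large frame is something I have not measured; it avoids Python-level loops, but `idxmax(axis=1)` may still be the slow part.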