One-hot encoding in Python for array values in a DataFrame

Question

I am trying to one-hot encode these clustered DataFrames (grouped by length). I have been trying to use sklearn's encoder, but it seems to treat each row's whole list as a single category instead of as multiple categories.

Example input:

 ID                    trace  length
 3              [A, B, C, C]       4
 4           [A, B, C, C, D]       5
 5        [A, B, C, C, D, E]       6
 24             [A, B, C, C]       4
 25          [A, B, C, C, D]       5
 ...                     ...     ...

Expected output :

ID     A  B  C  D  E    length
3      1  1  1  0  0         4
4      1  1  1  1  0         5
5      1  1  1  1  1         6
24     1  1  1  0  0         4
25     1  1  1  1  0         5
...

Is trace a list or a string? Can you provide the input as a DataFrame?
– mozway, Commented Mar 8, 2022 at 20:46
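
For reference, the example input could be reproduced roughly as follows. This is only a sketch: it assumes each trace cell holds a Python list of string labels, which the question does not confirm. If the cells were strings such as "[A, B, C, C]", they would need to be parsed into lists first.

```python
import pandas as pd

# Assumption: `trace` holds Python lists of string labels (not confirmed by the question).
df = pd.DataFrame({
    "ID": [3, 4, 5, 24, 25],
    "trace": [
        ["A", "B", "C", "C"],
        ["A", "B", "C", "C", "D"],
        ["A", "B", "C", "C", "D", "E"],
        ["A", "B", "C", "C"],
        ["A", "B", "C", "C", "D"],
    ],
})

# Derive `length` from the lists instead of typing it by hand.
df["length"] = df["trace"].map(len)
```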

mozway · Accepted Answer · 2022-03-08 21:06:10Z


IIUC, and if trace contains lists, you could do:

(df.drop(columns='trace')
   .join(df['trace']
         .apply('|'.join)
         .str.get_dummies()
        )
 )

Or, to modify df in place (pop removes the trace column from df):

df = (df.join(df.pop('trace')
              .apply('|'.join)
              .str.get_dummies())
      )

Or using explode and pivot_table:

(df.explode('trace')
   .assign(x=1)
   .pivot_table(index=['ID', 'length'], columns='trace', values='x', aggfunc='first')
   .fillna(0).astype(int)
   .reset_index()
 )

Output:

   ID  length  A  B  C  D  E
0   3       4  1  1  1  0  0
1   4       5  1  1  1  1  0
2   5       6  1  1  1  1  1
3  24       4  1  1  1  0  0
4  25       5  1  1  1  1  0
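
A self-contained sketch of the first approach, for anyone who wants to run it end to end. The input frame is rebuilt here under the assumption that trace holds lists of strings, and the names dummies and result are illustrative only. Note that str.get_dummies splits on "|" by default, so this relies on no label containing that character; a different separator can be passed to both the join and sep if needed.

```python
import pandas as pd

# Assumed input: `trace` holds lists of string labels.
df = pd.DataFrame({
    "ID": [3, 4, 5, 24, 25],
    "trace": [list("ABCC"), list("ABCCD"), list("ABCCDE"),
              list("ABCC"), list("ABCCD")],
    "length": [4, 5, 6, 4, 5],
})

SEP = "|"  # must not occur inside any label

# Join each list into one string such as "A|B|C|C", then let
# str.get_dummies split on the separator and build 0/1 indicator columns.
# Repeated labels in a row (the two C's) still yield a single 1.
dummies = df["trace"].apply(SEP.join).str.get_dummies(sep=SEP)

# Replace the list column with the indicator columns.
result = df.drop(columns="trace").join(dummies)
print(result)
```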

edited Mar 8, 2022 at 21:06

answered Mar 8, 2022 at 20:56

mozway
