
How can I binarize a dataset by its index, so that each idUser appears only once? For example, this is the output I want:

                   A          B          C
idUser                                 
3                  1          1          1
2                  0          1          0
4                  1          0          0

I have tried using pd.get_dummies, but the result is only close to what I need: it keeps one row per original record instead of one row per user.

import pandas as pd

dictio = {'idUser': [3, 3, 3, 2, 4], 'artist': ['A', 'B', 'C', 'B', 'A']}
df = pd.DataFrame(dictio)
df = df.set_index('idUser')
df_binary = pd.get_dummies(df, columns=['artist'])
print(df_binary)
        artist_A  artist_B  artist_C
idUser
3              1         0         0
3              0         1         0
3              0         0         1
2              0         1         0
4              1         0         0
  • Which dataframe is your example and which is your expected output? Does df_binary.groupby('idUser', sort=False).max() solve your problem? Commented Dec 20, 2021 at 14:29
  • It does! Thank you. It also keeps other columns in the dataset (e.g. country and sex) that I needed. Commented Dec 20, 2021 at 14:45
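A minimal sketch of the groupby(...).max() suggestion from the comments, assuming a hypothetical per-user `country` column to show that non-dummy columns are carried along. `dtype=int` is passed to get_dummies because recent pandas versions produce bool dummies by default; `sort=False` keeps users in first-appearance order (3, 2, 4), as in the desired output.

```python
import pandas as pd

# Question's data plus a hypothetical 'country' column (constant per user)
df = pd.DataFrame({
    'idUser': [3, 3, 3, 2, 4],
    'artist': ['A', 'B', 'C', 'B', 'A'],
    'country': ['IT', 'IT', 'IT', 'FR', 'DE'],
}).set_index('idUser')

# One 0/1 column per artist; 'country' is left untouched
df_binary = pd.get_dummies(df, columns=['artist'], dtype=int)

# Collapse to one row per user; max() turns any 1 in the group into a 1
result = df_binary.groupby('idUser', sort=False).max()
print(result)
```

Note that max() on a string column such as `country` returns the lexicographically largest value, so it only makes sense when that column is constant within each user.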

1 Answer

Group the dummy rows by the index and check whether any row in each group is set:
In [27]: df_binary.groupby(level=0).any().astype(int)
Out[27]:
        artist_A  artist_B  artist_C
idUser
2              0         1         0
3              1         1         1
4              1         0         0
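Unlike summing, any() caps each cell at 1, so the result stays binary even if a user/artist pair occurs more than once. A sketch assuming a hypothetical duplicated (3, 'A') row added to the question's data:

```python
import pandas as pd

# Question's data with a hypothetical repeated (3, 'A') pair
df = pd.DataFrame({'idUser': [3, 3, 3, 3, 2, 4],
                   'artist': ['A', 'A', 'B', 'C', 'B', 'A']}).set_index('idUser')
df_binary = pd.get_dummies(df, columns=['artist'], dtype=int)

# sum() counts repeats, so user 3 gets 2 in artist_A
counts = df_binary.groupby(level=0).sum()

# any() only asks "is there at least one?", then astype(int) maps True/False to 1/0
binary = df_binary.groupby(level=0).any().astype(int)
print(counts)
print(binary)
```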

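Alternatively, starting from your df before the .set_index() call (note that aggfunc='size' counts rows, so if a user/artist pair can repeat, append .clip(upper=1) to keep the result binary):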

In [33]: df.pivot_table(index='idUser', columns='artist', aggfunc='size', fill_value=0).rename_axis(columns=None)
Out[33]:
        A  B  C
idUser
2       0  1  0
3       1  1  1
4       1  0  0
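A sketch of the pivot on data with a hypothetical repeated (3, 'A') row, showing the count of 2 and the clip(upper=1) fix. It also uses pd.crosstab, a swapped-in alternative that is not part of the original answer, to build the same contingency table directly.

```python
import pandas as pd

# Question's data (before set_index) with a hypothetical repeated (3, 'A') pair
df = pd.DataFrame({'idUser': [3, 3, 3, 3, 2, 4],
                   'artist': ['A', 'A', 'B', 'C', 'B', 'A']})

# aggfunc='size' counts rows per (user, artist): the repeated pair yields 2
counts = df.pivot_table(index='idUser', columns='artist', aggfunc='size', fill_value=0)

# clip(upper=1) turns every positive count into 1
binary = counts.clip(upper=1).rename_axis(columns=None)

# pd.crosstab builds the same count table from two Series
binary_ct = pd.crosstab(df['idUser'], df['artist']).clip(upper=1).rename_axis(columns=None)
print(binary)
```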