4

I have the following dataframe:

+------------+------------------+
| item       | categories       |
+------------+------------------+
| blue_shirt | ['red', 'white'] |
+------------+------------------+
| red_skirt  | ['blue', 'red']  |
+------------+------------------+

and I want to get this instead:

+------------+-----+-------+------+
| item       | red | white | blue |
+------------+-----+-------+------+
| blue_shirt | 1   | 1     | 0    |
+------------+-----+-------+------+
| red_skirt  | 1   | 0     | 1    |
+------------+-----+-------+------+

here is what I tried:

orders = orders.join(pd.get_dummies(orders['Categories'].explode()))

it creates the right columns however it creates (a lot) of additional rows too. I want one row in the end for each item like in the example above.

3 Answers 3

4

You can also solve this problem with a one-liner using .str method of pandas

df['categories'].str.join('|').str.get_dummies()

The format within each cell of 'categories' column needs to be a list. If it's a string of something else, you can make it a list with an .apply function. For example, if content of 'categories' column is a list saved as a string:

df['categories'].apply(lambda x: eval(x)).str.join('|').str.get_dummies()
Sign up to request clarification or add additional context in comments.

Comments

0

You can explode the categories and the pivot the dataframe:

print(
    df.explode("categories")
    .pivot_table(
        index="item", columns="categories", aggfunc="size", fill_value=0
    )
    .reset_index()
)

Prints:

categories        item  blue  red  white
0           blue_shirt     0    1      1
1            red_skirt     1    1      0

5 Comments

this doesn't seem to work, as it reduces the number of rows.. I don't know why? It also makes it so each item has 'red' set to more than 1 but I want binary columns (1 or 0)
@mokiliiiLo Can you post dataframe you're working with? I used the one you have in question.
i can't as it's too big and confidential data.
@mokiliiiLo Just reduce it to a minimal sample that doesn't work with explode/pivot_table
This does not work if you don't specify an index. If you do and there are empty lists in the column, the final index won't match with the original data frame.
0

I was able to get to your desired outcome with the below approach.

Step 1: creating a list of columns that will be used in your resulting dataset.

>>> cols = list(set(df['categories'].explode())) #set makes sure we keep unique columns
>>> cols.insert(0,df.columns[0]) 

cols
Out[359]: ['item', 'red', 'white', 'blue']

OPTIONAL: You can make sure explode() works by converting your column to list type: taken from here

from ast import literal_eval
df['categories'] = df['categories'].apply(literal_eval) # convert to list type

Step 2: use your code with pd.get_dummies(), and create the 1/0 dataset. To avoid getting more rows, I added an additional step using groupby.index:

>>> temp = pd.get_dummies(df['categories'].explode())
>>> temp_res = temp.groupby(temp.index).sum()

Out[365]: 
   blue  red  white
0     0    1      1
1     1    1      0

Step 3:, I concat your 'item' column, with the above result, and used the column list created in the first step to get your desired result:

>>> out = pd.concat([df['item'],temp_res],axis=1,ignore_index=False)
>>> out = out[cols]

Out[368]: 
         item  red  white  blue
0  blue_shirt    1      1     0
1   red_skirt    1      0     1

All the code in 1 block:

from ast import literal_eval
df['categories'] = df['categories'].apply(literal_eval) #convert to list type

cols = list(set(df['categories'].explode()))
cols.insert(0,df.columns[0]) 

temp = pd.get_dummies(df['categories'].explode())
temp_res = temp.groupby(temp.index).sum()

out = pd.concat([df['item'],temp_res],axis=1)
out = out[cols]

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.