4

I have a two-column data set that I would like to reshape.
Here is a fake df:

import pandas as pd

df = pd.DataFrame([
    ['Alex', 'Apple'], ['Bob', 'Banana'], ['Clark', 'Citrus'], ['Diana', 'Banana'],
    ['Elisa', 'Apple'], ['Frida', 'Citrus'], ['George', 'Citrus'], ['Hanna', 'Banana'],
], columns=['Name', 'Fruit'])

I would like to have four columns: Name, Apple, Banana and Citrus, where the latter three are booleans (True/False).
I've looked into unstack, but it's really not what I am looking for.

3 Answers

5

I think this should be a good use case for get_dummies:

df.set_index('Name')['Fruit'].str.get_dummies().astype(bool).reset_index()

     Name  Apple  Banana  Citrus
0    Alex   True   False   False
1     Bob  False    True   False
2   Clark  False   False    True
3   Diana  False    True   False
4   Elisa   True   False   False
5   Frida  False   False    True
6  George  False   False    True
7   Hanna  False    True   False
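For readers coming from R, it may help to see what get_dummies is doing conceptually: it builds one indicator column per distinct value. The sketch below is a hand-rolled equivalent, not how pandas implements it internally; it uses Series.eq to compare every row against each fruit and then checks that the result matches the get_dummies version. The names fruits, manual and reference are just illustrative.

```python
import pandas as pd

df = pd.DataFrame([
    ['Alex', 'Apple'], ['Bob', 'Banana'], ['Clark', 'Citrus'], ['Diana', 'Banana'],
    ['Elisa', 'Apple'], ['Frida', 'Citrus'], ['George', 'Citrus'], ['Hanna', 'Banana'],
], columns=['Name', 'Fruit'])

# One indicator column per distinct fruit, sorted alphabetically to match
# the column order that str.get_dummies produces
fruits = sorted(df['Fruit'].unique())

# Series.eq(fruit) is True exactly on the rows whose Fruit equals that fruit
manual = df[['Name']].assign(**{fruit: df['Fruit'].eq(fruit) for fruit in fruits})

reference = df.set_index('Name')['Fruit'].str.get_dummies().astype(bool).reset_index()

# Compare column order and cell values with the get_dummies version
same_columns = manual.columns.tolist() == reference.columns.tolist()
same_values = manual.to_dict('list') == reference.to_dict('list')
print(same_columns, same_values)
```

get_dummies is still the more convenient choice; the manual version just makes the "one boolean column per category" idea explicit.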

In a similar vein:

pd.concat([df['Name'], df['Fruit'].str.get_dummies().astype(bool)], axis=1)

     Name  Apple  Banana  Citrus
0    Alex   True   False   False
1     Bob  False    True   False
2   Clark  False   False    True
3   Diana  False    True   False
4   Elisa   True   False   False
5   Frida  False   False    True
6  George  False   False    True
7   Hanna  False    True   False

6 Comments

Great, thank you! I'm still new to Python (I'm an R girl). Do you happen to know how to create a matrix from the new df where True/False is 1/0?
@Mactilda Just remove the astype(bool) from my code everywhere. I assumed you wanted True/False since you mentioned booleans, but representing the result as 0/1s is more straightforward.
Thanks! I know how to drop the first column; is there any way I can drop the column headers as well, to make it a matrix?
@Mactilda Do you want an array or a DataFrame without column names? If the former, you can use anky_91's suggestion. Otherwise, do df.columns = range(len(df.columns))
@Mactilda Alternatively, I have an answer here that explains how to convert a DataFrame to a matrix.
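A minimal sketch of what the comments above describe: dropping astype(bool) keeps the indicators as 0/1 integers, and from there you can either take a bare array or keep a DataFrame with positional column labels. Using DataFrame.to_numpy() for the array is my own choice here (the suggestion attributed to anky_91 is not shown in this thread), and the names dummies, matrix and unlabeled are illustrative.

```python
import pandas as pd

df = pd.DataFrame([
    ['Alex', 'Apple'], ['Bob', 'Banana'], ['Clark', 'Citrus'], ['Diana', 'Banana'],
    ['Elisa', 'Apple'], ['Frida', 'Citrus'], ['George', 'Citrus'], ['Hanna', 'Banana'],
], columns=['Name', 'Fruit'])

# Without .astype(bool), str.get_dummies returns integer 0/1 indicators;
# Name is moved into the index so it is not part of the values
dummies = df.set_index('Name')['Fruit'].str.get_dummies()

# Option 1: a plain NumPy array, with no index and no column labels
matrix = dummies.to_numpy()

# Option 2: stay a DataFrame, but replace the labels with positions,
# following the df.columns = range(len(df.columns)) suggestion
unlabeled = dummies.reset_index(drop=True)
unlabeled.columns = range(len(unlabeled.columns))

print(matrix)
```

Each row of matrix corresponds to one person in the original order, and each column to Apple, Banana and Citrus respectively.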
4

You can use the following:

df[['Name']].join(pd.get_dummies(df.Fruit).astype(bool))

     Name  Apple  Banana  Citrus
0    Alex   True   False   False
1     Bob  False    True   False
2   Clark  False   False    True
3   Diana  False    True   False
4   Elisa   True   False   False
5   Frida  False   False    True
6  George  False   False    True
7   Hanna  False    True   False
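A small variation on this answer: pd.get_dummies accepts a dtype argument, so the .astype(bool) step can be folded into the call. In pandas 2.0 and later the default dtype is already bool, but passing it explicitly keeps the result the same on older versions. This is a sketch of that variant, not a change to the answer's logic.

```python
import pandas as pd

df = pd.DataFrame([
    ['Alex', 'Apple'], ['Bob', 'Banana'], ['Clark', 'Citrus'], ['Diana', 'Banana'],
    ['Elisa', 'Apple'], ['Frida', 'Citrus'], ['George', 'Citrus'], ['Hanna', 'Banana'],
], columns=['Name', 'Fruit'])

# dtype=bool makes get_dummies emit boolean indicator columns directly,
# replacing the separate .astype(bool) call; join aligns on the row index
result = df[['Name']].join(pd.get_dummies(df.Fruit, dtype=bool))
print(result)
```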

4 Comments

I see we had the same idea... +1
@coldspeed yes. :D +1d yours too
Thank you! You were both very fast :)
@Mactilda no problem. Cheers..!!
4

It seems like crosstab works fine here:

pd.crosstab(df.Name,df.Fruit).astype(bool).reset_index()
Fruit    Name  Apple  Banana  Citrus
0        Alex   True   False   False
1         Bob  False    True   False
2       Clark  False   False    True
3       Diana  False    True   False
4       Elisa   True   False   False
5       Frida  False   False    True
6      George  False   False    True
7       Hanna  False    True   False
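Note the extra "Fruit" in the header above: crosstab leaves that name on the column axis. A minimal sketch of clearing it with rename_axis(columns=None) and checking the result against the get_dummies approach follows. As I understand it, crosstab groups by Name (one row per distinct name, sorted), whereas get_dummies keeps one row per original row, so the two only line up here because every name appears once and the names are already in alphabetical order. The variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame([
    ['Alex', 'Apple'], ['Bob', 'Banana'], ['Clark', 'Citrus'], ['Diana', 'Banana'],
    ['Elisa', 'Apple'], ['Frida', 'Citrus'], ['George', 'Citrus'], ['Hanna', 'Banana'],
], columns=['Name', 'Fruit'])

via_crosstab = (
    pd.crosstab(df.Name, df.Fruit)   # counts of each (Name, Fruit) pair
    .astype(bool)                    # any count > 0 becomes True
    .reset_index()
    .rename_axis(columns=None)       # drop the leftover "Fruit" column-axis name
)

via_dummies = pd.concat([df['Name'], df['Fruit'].str.get_dummies().astype(bool)], axis=1)

# Compare labels and cell values of the two tables
same_labels = via_crosstab.columns.tolist() == via_dummies.columns.tolist()
same_values = bool((via_crosstab.to_numpy() == via_dummies.to_numpy()).all())
print(same_labels, same_values)
```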
