Create pandas dataframe with multiple dataframes

Question

I have a CSV file like this:

Fruit_Type;Fruit_Color;Fruit_Description
Apple;Green,Red,Yellow;Just an apple
Banana;Green,Yellow;Just a Banana
Orange;Red,Yellow;Just an Orange
Grape;;Just a Grape

(Note: there are commas inside a cell, and the number of colors varies, with a maximum of three different colors.)

My desired result is:

Fruit_Type;Fruit_Color;Fruit_Description

Apple;Green;0;0;Just an apple
Apple;0;Red;0;Just an apple
Apple;0;0;Yellow;Just an apple
Banana;Green;0;0;Just a Banana
Banana;0;Red;0;Just a Banana
Banana;0;0;Yellow;Just a Banana
Orange;Green;0;0;Just an Orange
Orange;0;Red;0;Just an Orange
Orange;0;0;Yellow;Just an Orange
Grape;0;0;0;Just a Grape
Grape;0;0;0;Just a Grape
Grape;0;0;0;Just a Grape

I want to split the dataframe's Fruit_Color column into 3 columns, with a 0 value for the colors that aren't present.

I've tried to split the dataframe into separate dataframes like this, to get the rows that contain a given string:

test.py

import pandas as pd

# load the CSV data into a dataframe
data = pd.read_csv(open('test.py','rb'),delimiter=';',encoding='utf-8')

# select the rows that contain each color (na=False skips rows with no color)
Green = data.loc[data['Fruit_Color'].str.contains('Green', case=True, na=False)]
Red = data.loc[data['Fruit_Color'].str.contains('Red', case=True, na=False)]
Yellow = data.loc[data['Fruit_Color'].str.contains('Yellow', case=True, na=False)]

With that I have the rows that contain a specific color, but I don't know how to build the joined dataframe from those dataframes, or how to find the rows that don't have any color, like the Grape.

Thanks in advance.

jezrael · Accepted Answer · 2018-04-09 09:18:29Z

1

I suggest using str.get_dummies:

df = df.join(df.pop('Fruit_Color').str.get_dummies(','))
print (df)
  Fruit_Type Fruit_Description  Green  Red  Yellow
0      Apple     Just an apple      1    1       1
1     Banana     Just a Banana      1    0       1
2     Orange    Just an Orange      0    1       1
3      Grape      Just a Grape      0    0       0
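For a self-contained check, here is a minimal sketch that rebuilds the question's sample data in memory and then turns the 0/1 indicators back into colour names, which is closer to the layout the question asks for. The `io.StringIO` source and the names `csv_text`, `dummies` and `labelled` are assumptions made for illustration, not part of the original answer.

```python
import io

import pandas as pd

# Sample data from the question, held in memory instead of a file (assumption).
csv_text = """Fruit_Type;Fruit_Color;Fruit_Description
Apple;Green,Red,Yellow;Just an apple
Banana;Green,Yellow;Just a Banana
Orange;Red,Yellow;Just an Orange
Grape;;Just a Grape
"""

df = pd.read_csv(io.StringIO(csv_text), sep=";")

# One 0/1 indicator column per colour; a missing value yields all zeros.
dummies = df.pop("Fruit_Color").str.get_dummies(",")
df = df.join(dummies)

# Optional: show the colour name instead of 1, keeping 0 for "not present".
labelled = df.copy()
for color in dummies.columns:
    labelled[color] = labelled[color].map({1: color, 0: 0})

print(labelled)
```

Keeping the numeric `df` is probably more convenient for filtering and aggregation; `labelled` is only a display-oriented variant.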


1 Comment

jezrael Over a year ago

@EliasCortAguelo - numeric values are better here than string values. You are welcome!

Guybrush · Accepted Answer · 2018-04-09 09:16:43Z

0

You can create the columns using assign:

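df.assign(
   # the column is named 'Fruit_Color'; na=False treats missing colours as "no match"
   green=lambda d: d['Fruit_Color'].str.contains('Green', case=True, na=False),
   red=lambda d: d['Fruit_Color'].str.contains('Red', case=True, na=False),
   yellow=lambda d: d['Fruit_Color'].str.contains('Yellow', case=True, na=False),
)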

This results in a new dataframe with three additional columns of Booleans, namely "green", "red" and "yellow".

To detect a row with no known colour, you can also assign other_color=lambda d: ~(d['green'] | d['red'] | d['yellow']).
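As a rough illustration of the flag columns plus the "no known colour" check, the sketch below uses a small hand-built frame that mirrors the question's data. The frame itself, the `flags` name, the `na=False` argument and the choice to chain a second `assign` call (so the new columns certainly exist before they are combined) are assumptions for this example.

```python
import pandas as pd

# Sample frame mirroring the question's data; None stands for the empty cell.
df = pd.DataFrame({
    "Fruit_Type": ["Apple", "Banana", "Orange", "Grape"],
    "Fruit_Color": ["Green,Red,Yellow", "Green,Yellow", "Red,Yellow", None],
})

flags = df.assign(
    # na=False makes a missing colour count as "does not contain".
    green=lambda d: d["Fruit_Color"].str.contains("Green", na=False),
    red=lambda d: d["Fruit_Color"].str.contains("Red", na=False),
    yellow=lambda d: d["Fruit_Color"].str.contains("Yellow", na=False),
).assign(
    # True only when none of the three known colours matched.
    other_color=lambda d: ~(d["green"] | d["red"] | d["yellow"]),
)

# Rows without any known colour (Grape in this sample).
print(flags[flags["other_color"]])
```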

Another possibility is to use pandas.concat to concatenate multiple dataframes, but it's less elegant than the above solution.
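For comparison, a sketch of the `pandas.concat` route might look like the following. It is one possible reading of the desired layout (one row per fruit and colour present, plus a single all-zero row for fruits with no colour), and the sample frame and the names `colors`, `parts`, `no_color` and `result` are assumptions made for illustration.

```python
import pandas as pd

# Sample frame mirroring the question's data; None stands for the empty cell.
df = pd.DataFrame({
    "Fruit_Type": ["Apple", "Banana", "Orange", "Grape"],
    "Fruit_Color": ["Green,Red,Yellow", "Green,Yellow", "Red,Yellow", None],
    "Fruit_Description": ["Just an apple", "Just a Banana",
                          "Just an Orange", "Just a Grape"],
})

colors = ["Green", "Red", "Yellow"]
keep = ["Fruit_Type", "Fruit_Description"]
parts = []

for color in colors:
    # Rows whose colour list mentions this colour.
    mask = df["Fruit_Color"].str.contains(color, na=False)
    part = df.loc[mask, keep].copy()
    # Fill every colour column with 0 except the one being processed.
    for other in colors:
        part[other] = color if other == color else 0
    parts.append(part)

# Rows with no colour at all (such as Grape) are kept once, with all zeros.
no_color = df.loc[df["Fruit_Color"].isna(), keep].copy()
for other in colors:
    no_color[other] = 0
parts.append(no_color)

# Stable sort on the original index groups the rows of each fruit together.
result = pd.concat(parts).sort_index(kind="mergesort").reset_index(drop=True)
print(result)
```

The loop and the separate handling of colourless rows show why this route takes more code than `assign` or `str.get_dummies`.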

