1

I've a csv file like this:

Fruit_Type;Fruit_Color;Fruit_Description
Apple;Green,Red,Yellow;Just an apple
Banana;Green,Yellow;Just a Banana
Orange;Red,Yellow;Just an Orange
Grape;;Just a Grape

( Note: There're commas inside of a cell and the colors type number is variable with a maximum of three different colors )

My desired result is:

Fruit_Type;Fruit_Color;Fruit_Description

Apple;Green;0;0;Just an apple
Apple;0;Red;0;Just an apple
Apple;0;0;Yellow;Just an apple
Banana;Green;0;0;Just a Banana
Banana;0;Red;0;Just a Banana
Banana;0;0;Yellow;Just a Banana
Orange;Green;0;0;Just an Orange
Orange;0;Red;0;Just an Orange
Orange;0;0;Yellow;Just an Orange
Grape;0;0;0;Just a Grape
Grape;0;0;0;Just a Grape
Grape;0;0;0;Just a Grape

I want to split the dataframe Fruit_Color column into 3 columns with a 0 value on those colors what aren't present.

I've tryed to convert the dataframe info dataframes like this to get the lines what contais some string:

test.py

#load the csv data into dataframe
data = pd.read_csv(open('test.py','rb'),delimiter=';',encoding='utf-8')

#detect the rows where're the color
Green = data.loc[data['Fruit_Color'].str.contains('Green', case=True)]
Red = data.loc[data['Fruit_Color'].str.contains('Red', case=True)]
Yellow = data.loc[data['Fruit_Color'].str.contains('Yellow', case=True)]

With that i've the rows what contains specific color but i dont know how i can make the joined dataframe with those dataframes and also how can i know those rows what doesn't have any color like the Grape ?

Thanks in advance.

2 Answers 2

1

I suggest use str.get_dummies:

df = df.join(df.pop('Fruit_Color').str.get_dummies(','))
print (df)
  Fruit_Type Fruit_Description  Green  Red  Yellow
0      Apple     Just an apple      1    1       1
1     Banana     Just a Banana      1    0       1
2     Orange    Just an Orange      0    1       1
3      Grape      Just a Grape      0    0       0
Sign up to request clarification or add additional context in comments.

1 Comment

@EliasCortAguelo - it is better as numeric with strings values. You are welcome!
0

You can create the columns using assign:

df.assign(
   green=lambda d: d['Fruit_color'].str.contains('Green', case=True),
   red=lambda d: d['Fruit_color'].str.contains('Red', case=True),
   yellow=lambda d: d['Fruit_color'].str.contains('Yellow', case=True),
)

This results in a new dataframe with three additional columns of Booleans, namely "green", "red" and "yellow".

To detect a row with no known colour, you can also assign other_color=lambda d: ~(d['green'] | d['red'] | d['yellow']).

Another possibility is to use pandas.concat to concatenate multiple dataframes, but it's less elegant than the above solution.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.