I need to somehow hash the strings of the dataframe's fields.

I have this df:

import numpy as np
import pandas as pd

cars = ['Tesla', 'Renault', 'Tesla', 'Fiat', 'Audi', 'Tesla', 'Mercedes', 'Mercedes']
included_colors = ['red', 'green', np.nan, np.nan, 'yellow', 'black', np.nan, 'orange']
data = {'Cars': cars, 'Included Colors': included_colors}
df = pd.DataFrame(data, columns=['Cars', 'Included Colors'])

And it looks like this:

       Cars Included Colors
0     Tesla             red
1   Renault           green
2     Tesla             NaN
3      Fiat             NaN
4      Audi          yellow
5     Tesla           black
6  Mercedes             NaN
7  Mercedes          orange

I am trying to create a dictionary, or another data structure that would be useful here, so that each car is matched with all of its associated colors, like in this example:

Tesla - red, black
Renault - green
Fiat - np.nan
Audi - yellow
Mercedes - orange

I tried this code but I don't know how to continue...:

all_cars = df['Cars'].tolist() # extract all the cars from the df in a list
all_cars = list(dict.fromkeys(all_cars)) # make them unique

dis = {}
for car in all_cars:
    mask = (df['Cars'] == car)
    dis[df.loc[mask, 'Cars']] = df.loc[mask, 'Included Colors']
    

It does not have to be a dictionary; it could be anything, as long as all these key-value pairs are matched. I just thought that this data structure would fit.

How can I make this work? Thanks a lot!

  • Pandas is not helping you here. It would be easier (and rather straightforward) to create your dictionary from the original lists. Commented Apr 15, 2021 at 20:19
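A minimal sketch of what this comment suggests, building the mapping straight from the two original lists without pandas. The name colors_by_car is made up for illustration, and math.nan stands in for np.nan so that the snippet only needs the standard library. Note one assumption: a car with no known color (Fiat) ends up with an empty list rather than np.nan.

```python
import math
from collections import defaultdict

cars = ['Tesla', 'Renault', 'Tesla', 'Fiat', 'Audi', 'Tesla', 'Mercedes', 'Mercedes']
# math.nan is a stand-in for np.nan (both are float NaN); assumption for this sketch
nan = math.nan
included_colors = ['red', 'green', nan, nan, 'yellow', 'black', nan, 'orange']

colors_by_car = defaultdict(list)
for car, color in zip(cars, included_colors):
    bucket = colors_by_car[car]   # creates the key even if the color is missing
    if isinstance(color, str):    # skip NaN entries, which are floats
        bucket.append(color)

colors_by_car = dict(colors_by_car)
print(colors_by_car)
# {'Tesla': ['red', 'black'], 'Renault': ['green'], 'Fiat': [], 'Audi': ['yellow'], 'Mercedes': ['orange']}
```

Since dictionaries keep insertion order, the keys come out in the order the cars first appear in the list.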

1 Answer

You can use groupby() and aggregate each group into a list, then build the output dictionary from the result:

x = df.groupby("Cars", as_index=False).agg(list)
out = dict(zip(x.Cars, x["Included Colors"]))
print(out)

Prints:

{'Audi': ['yellow'], 'Fiat': [nan], 'Mercedes': [nan, 'orange'], 'Renault': ['green'], 'Tesla': ['red', nan, 'black']}

Thanks to @QuangHoang, a shorter version:

print(df.groupby("Cars")['Included Colors'].agg(list).to_dict())
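The lists above still contain nan, while the desired output in the question lists only the real colors (Tesla - red, black). A possible variation, as a sketch: drop the missing values inside each group before turning it into a list. Two assumptions here: sort=False is used only to keep the cars in their original order, and a car with no color at all (Fiat) gets an empty list instead of np.nan.

```python
import numpy as np
import pandas as pd

cars = ['Tesla', 'Renault', 'Tesla', 'Fiat', 'Audi', 'Tesla', 'Mercedes', 'Mercedes']
included_colors = ['red', 'green', np.nan, np.nan, 'yellow', 'black', np.nan, 'orange']
df = pd.DataFrame({'Cars': cars, 'Included Colors': included_colors})

out = (
    df.groupby('Cars', sort=False)['Included Colors']
      .agg(lambda s: s.dropna().tolist())  # remove NaN within each group, then listify
      .to_dict()
)
print(out)
```

If np.nan is really needed for cars without any color, the empty lists can be replaced afterwards, for example with {k: v or np.nan for k, v in out.items()}.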

2 Comments

Thanks a lot! This is great! I will accept the answer in 6 min
df.groupby("Cars")['Included Colors'].agg(list).to_dict().
