How to hash the strings of a dataframe in Python?

Question

I need to somehow hash the strings of the dataframe's fields.

I have this df:

import numpy as np
import pandas as pd

cars = ['Tesla', 'Renault', 'Tesla', 'Fiat', 'Audi', 'Tesla', 'Mercedes', 'Mercedes']
included_colors = ['red', 'green', np.nan, np.nan, 'yellow', 'black', np.nan, 'orange']
data = {'Cars': cars, 'Included Colors': included_colors}
df = pd.DataFrame(data, columns=['Cars', 'Included Colors'])

And it looks like this:

       Cars Included Colors
0     Tesla             red
1   Renault           green
2     Tesla             NaN
3      Fiat             NaN
4      Audi          yellow
5     Tesla           black
6  Mercedes             NaN
7  Mercedes          orange

I am trying to build a dictionary (or another suitable data structure) that maps each car to all of its associated colors, like in this example:

Tesla - red, black
Renault - green
Fiat - np.nan
Audi - yellow
Mercedes - orange

I tried this code, but I don't know how to continue:

all_cars = df['Cars'].tolist() # extract all the cars from the df in a list
all_cars = list(dict.fromkeys(all_cars)) # make them unique

dis = {}
for car in all_cars:
    mask = (df['Cars'] == car)
    dis[df.loc[mask, 'Cars']] = df.loc[mask, 'Included Colors']

It does not have to be a dictionary; any structure that matches each key to its values would do. A dictionary just seemed like a good fit.

How can I make this work? Thanks a lot!

Pandas is not helping you here. It would be easier (and rather straightforward) to create your dictionary from the original lists.
– Tim Roberts, Commented Apr 15, 2021 at 20:19
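For illustration, the plain-lists approach suggested in the comment above might look like the sketch below. It is only one possible reading of that suggestion: the helper `is_missing` and the name `colors_by_car` are made up for this example, `float('nan')` stands in for `np.nan` so the snippet runs without NumPy, and a car with no colors gets an empty list rather than NaN. As a side note, the loop in the question fails because `df.loc[mask, 'Cars']` is a Series, and a Series cannot be used as a dictionary key.

```python
import math

cars = ['Tesla', 'Renault', 'Tesla', 'Fiat', 'Audi', 'Tesla', 'Mercedes', 'Mercedes']
nan = float('nan')  # stand-in for np.nan, which is also a plain Python float
included_colors = ['red', 'green', nan, nan, 'yellow', 'black', nan, 'orange']


def is_missing(value):
    """Hypothetical helper: True only for float NaN values."""
    return isinstance(value, float) and math.isnan(value)


colors_by_car = {}
for car, color in zip(cars, included_colors):
    # Create the list on first sight of a car, so cars with no colors still get a key.
    bucket = colors_by_car.setdefault(car, [])
    if not is_missing(color):
        bucket.append(color)

print(colors_by_car)
```

Because a regular dict preserves insertion order, the keys come out in the order the cars first appear in the list.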

Andrej Kesely · Accepted Answer · 2021-04-15 20:27:19Z

You can use groupby() and aggregate each group's colors into a list, then build the output dictionary:

x = df.groupby("Cars", as_index=False).agg(list)
out = dict(zip(x.Cars, x["Included Colors"]))
print(out)

Prints:

{'Audi': ['yellow'], 'Fiat': [nan], 'Mercedes': [nan, 'orange'], 'Renault': ['green'], 'Tesla': ['red', nan, 'black']}

Thanks to @QuangHoang a shorter answer:

print(df.groupby("Cars")['Included Colors'].agg(list).to_dict())
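Both versions above keep the NaN entries inside the lists. If they are unwanted, as the desired output in the question suggests, one possible variant is to drop them per group while building the dictionary. This is a sketch rather than part of the original answer: it assumes an empty list is an acceptable stand-in for "no colors" (the question shows NaN for Fiat), and it passes `sort=False` so the keys keep the order in which the cars first appear.

```python
import numpy as np
import pandas as pd

cars = ['Tesla', 'Renault', 'Tesla', 'Fiat', 'Audi', 'Tesla', 'Mercedes', 'Mercedes']
included_colors = ['red', 'green', np.nan, np.nan, 'yellow', 'black', np.nan, 'orange']
df = pd.DataFrame({'Cars': cars, 'Included Colors': included_colors})

# Iterating over the grouped column yields (car, Series of that car's colors) pairs.
out = {
    car: colors.dropna().tolist()  # drop NaN so only real colors remain
    for car, colors in df.groupby("Cars", sort=False)["Included Colors"]
}
print(out)
```

A car whose colors are all missing, such as Fiat, ends up with an empty list; swapping that for `[np.nan]` would be a one-line change if the NaN marker is preferred.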

edited Apr 15, 2021 at 20:27

answered Apr 15, 2021 at 20:21

2 Comments

Andrew Tulip Over a year ago

Thanks a lot! This is great! I will accept the answer in 6 min

Quang Hoang Over a year ago

df.groupby("Cars")['Included Colors'].agg(list).to_dict().
