I need to build a hash map (dictionary) from the string values in a DataFrame's columns.
I have this df:
import numpy as np
import pandas as pd

cars = ['Tesla', 'Renault', 'Tesla', 'Fiat', 'Audi', 'Tesla', 'Mercedes', 'Mercedes']
included_colors = ['red', 'green', np.nan, np.nan, 'yellow', 'black', np.nan, 'orange']
data = {'Cars': cars, 'Included Colors': included_colors}
df = pd.DataFrame(data, columns=['Cars', 'Included Colors'])
And it looks like this:
       Cars Included Colors
0     Tesla             red
1   Renault           green
2     Tesla             NaN
3      Fiat             NaN
4      Audi          yellow
5     Tesla           black
6  Mercedes             NaN
7  Mercedes          orange
I am trying to build a dictionary, or any other suitable data structure, that matches each car with all of its associated colors, like this:
Tesla - red, black
Renault - green
Fiat - np.nan
Audi - yellow
Mercedes - orange
I tried the following code, but I don't know how to continue:
all_cars = df['Cars'].tolist()  # extract all the cars from the df in a list
all_cars = list(dict.fromkeys(all_cars))  # make them unique
dis = {}
for car in all_cars:
    mask = (df['Cars'] == car)
    dis[df.loc[mask, 'Cars']] = df.loc[mask, 'Included Colors']
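The last line of that loop raises a TypeError, because `df.loc[mask, 'Cars']` is a Series, and a Series is unhashable, so it cannot be used as a dictionary key. A minimal sketch of one possible fix, assuming the key should be the car name itself, that missing colors should be dropped, and that a car with no colors at all should map to `[np.nan]` (my reading of the desired output, not a given):

```python
import numpy as np
import pandas as pd

cars = ['Tesla', 'Renault', 'Tesla', 'Fiat', 'Audi', 'Tesla', 'Mercedes', 'Mercedes']
included_colors = ['red', 'green', np.nan, np.nan, 'yellow', 'black', np.nan, 'orange']
df = pd.DataFrame({'Cars': cars, 'Included Colors': included_colors})

dis = {}
for car in df['Cars'].unique():  # unique car names, in order of first appearance
    mask = df['Cars'] == car
    # Key by the plain string; drop NaN so only real colors remain
    colors = df.loc[mask, 'Included Colors'].dropna().tolist()
    # Assumption: a car with no colors at all keeps a single np.nan placeholder
    dis[car] = colors if colors else [np.nan]

print(dis)
```

Each value here is a plain Python list, so `dis['Tesla']` can be iterated or joined directly.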
It does not have to be a dictionary; any structure that matches each key to its values would do. I just thought a dictionary would fit.
How can I make this work? Thanks a lot!
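For what it's worth, one direction that might work without an explicit loop is `groupby`. This is only a sketch under my own assumptions: the name `colors_by_car` is made up for illustration, missing colors are dropped, and a car with no colors ends up with an empty list rather than `np.nan`.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Cars': ['Tesla', 'Renault', 'Tesla', 'Fiat', 'Audi', 'Tesla', 'Mercedes', 'Mercedes'],
    'Included Colors': ['red', 'green', np.nan, np.nan, 'yellow', 'black', np.nan, 'orange'],
})

# Group the colors by car; sort=False keeps cars in order of first appearance.
# Each group's NaN entries are dropped before collecting the colors into a list.
colors_by_car = (
    df.groupby('Cars', sort=False)['Included Colors']
      .agg(lambda s: s.dropna().tolist())
      .to_dict()
)

print(colors_by_car)
```

An empty list for a car like Fiat seems easier to work with than `np.nan`, since every value then has the same type, but that is a design choice rather than a requirement.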