Based on this post on stack i tried the value counts function like this
df2 = df1.join(df1.genres.str.split(",").apply(pd.value_counts).fillna(0))
and it works fine apart from the fact that although my data has 22 unique genres and after the split i get 42 values, which of course are not unique. Data example:
Action Adventure Casual Design & Illustration Early Access Education Free to Play Indie Massively Multiplayer Photo Editing RPG Racing Simulation Software Training Sports Strategy Utilities Video Production Web Publishing Accounting Action Adventure Animation & Modeling Audio Production Casual Design & Illustration Early Access Education Free to Play Indie Massively Multiplayer Photo Editing RPG Racing Simulation Software Training Sports Strategy Utilities Video Production Web Publishing nan
0 nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan 1.0 nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan
(i have pasted the head and the first row only)
I have a feeling that the problem is caused from my original data.Well, my column (genres) was a list of lists which contained brackets
example :[Action,Indie]
so when python was reading it, it would read [Action and Action and Action] as different values and the output was 303 different values.
So what i did is that:
for i in df1['genres'].tolist():
if str(i) != 'nan':
i = i[1:-1]
new.append(i)
else:
new.append('nan')
if str(i).notnull():df1, 5 - 6 rows?print df['genres'].str.get_dummies(sep=',')strip()