Python pandas - value_counts not working properly

Question

Based on a Stack Overflow post, I tried the value_counts function like this:

df2 = df1.join(df1.genres.str.split(",").apply(pd.value_counts).fillna(0))

It works, except that my data has 22 unique genres, yet after the split I get 42 columns, so some genres appear more than once. Data example:

     Action  Adventure   Casual  Design & Illustration   Early Access    Education   Free to Play    Indie   Massively Multiplayer   Photo Editing   RPG     Racing  Simulation  Software Training   Sports  Strategy    Utilities   Video Production    Web Publishing Accounting  Action  Adventure   Animation & Modeling    Audio Production    Casual  Design & Illustration   Early Access    Education   Free to Play    Indie   Massively Multiplayer   Photo Editing   RPG Racing  Simulation  Software Training   Sports  Strategy    Utilities   Video Production    Web Publishing  nan
0   nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan 1.0 nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan

(I have pasted only the header and the first row.)

I suspect the problem comes from my original data. My genres column held list-like strings that included the brackets, for example [Action,Indie]. When pandas read it, [Action, Action, and Action] were treated as different values, and the output had 303 different values. So this is what I did:

new = []
for i in df1['genres'].tolist():
    if str(i) != 'nan':
        # drop the leading '[' and the trailing ']'
        i = i[1:-1]
        new.append(i)
    else:
        new.append('nan')

I think you can use: print(df['genres'].str.get_dummies(sep=','))
– jezrael, Commented Dec 4, 2015 at 14:02
OK, I have found the problem, but I am not sure how to solve it. My header data, meaning the genres, has issues with spaces: Action appears as [space]Action, Action, and Action[space].
– Thodoris P, Commented Dec 5, 2015 at 15:57
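
The whitespace issue described in the comment above can be reproduced with a small sketch. The sample rows below are made up for illustration and are not taken from the asker's file; the point is only that labels differing by a leading or trailing space count as distinct values until they are stripped.

```python
import pandas as pd

# Hypothetical sample data, not the asker's actual file.
df = pd.DataFrame({"genres": ["Action, Indie", "Indie,Action ", "Action"]})

# Split on commas and flatten to one label per row.
labels = df["genres"].str.split(",").explode()

# Without stripping, ' Indie' and 'Action ' are separate labels.
raw_unique = sorted(labels.unique())

# After stripping surrounding whitespace only the real genres remain.
clean_unique = sorted(labels.str.strip().unique())

print(raw_unique)
print(clean_unique)
```

Each extra variant of a label would become its own column after value_counts, which is consistent with seeing more columns than unique genres.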

jezrael · Accepted Answer · 2015-12-05 17:18:45Z


You have to remove the leading and trailing brackets [] from the genres column with str.strip, and then remove the spaces by replacing them with an empty string using str.replace:

import pandas as pd

df = pd.read_csv('test/Copy of AppCrawler.csv', sep="\t")


df['genres'] = df['genres'].str.strip('[]')
df['genres'] = df['genres'].str.replace(' ', '')

df = df.join(df.genres.str.split(",").apply(pd.value_counts).fillna(0))

# temporarily display 30 rows and 60 columns
with pd.option_context('display.max_rows', 30, 'display.max_columns', 60):
    print(df)
    # output removed for clarity
print(df.columns)
Index([u'Unnamed: 0', u'appid', u'currency', u'final_price', u'genres',
       u'initial_price', u'is_free', u'metacritic', u'release_date',
       u'Accounting', u'Action', u'Adventure', u'Animation&Modeling',
       u'AudioProduction', u'Casual', u'Design&Illustration', u'EarlyAccess',
       u'Education', u'FreetoPlay', u'Indie', u'MassivelyMultiplayer',
       u'PhotoEditing', u'RPG', u'Racing', u'Simulation', u'SoftwareTraining',
       u'Sports', u'Strategy', u'Utilities', u'VideoProduction',
       u'WebPublishing'],
      dtype='object')
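
For comparison, here is a minimal sketch of the str.get_dummies route mentioned in the comments on the question, run on made-up data rather than the AppCrawler file. The sample rows and the names cleaned and dummies are assumptions for illustration; the cleaning steps mirror the strip and replace calls above.

```python
import pandas as pd

# Hypothetical sample data; the real file has many more columns.
df = pd.DataFrame({"genres": ["[Action, Indie]", "[Indie]", None]})

# Same cleaning as above: drop the brackets, then remove all spaces.
cleaned = df["genres"].str.strip("[]").str.replace(" ", "", regex=False)

# One indicator column per genre; the missing row gets zeros everywhere.
dummies = cleaned.str.get_dummies(sep=",")

df = df.join(dummies)
print(df)
```

Note that get_dummies does not strip whitespace on its own, so the cleaning step is still needed before calling it.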


2 Comments

Thodoris P Over a year ago

Just what I needed! I don't understand what you are doing with the "with" statement. Couldn't you just print df?

jezrael Over a year ago

Maybe In 19 better explains.
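
As a rough illustration of what the "with" statement is doing: pd.option_context changes the display options only inside the block and restores the previous values afterwards, so a plain print(df) outside it would still use the default limits. The tiny DataFrame below is hypothetical.

```python
import pandas as pd

# Hypothetical frame, just to have something to print.
df = pd.DataFrame({"a": range(3)})

# Remember the current setting so it can be compared later.
default_cols = pd.get_option("display.max_columns")

with pd.option_context("display.max_rows", 30, "display.max_columns", 60):
    # Inside the block the temporary limits are in effect.
    inside_cols = pd.get_option("display.max_columns")
    print(df)

# After the block the previous setting is back.
after_cols = pd.get_option("display.max_columns")
```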
