I have rows that look like this:

zipcode   room_type
2011      bed
2012      sofa

Every row represents one Airbnb listing. I want to aggregate the data so that I count the occurrences of each unique room_type value. Every unique value gets its own column, and the data is grouped by zipcode. So the result would look something like this:

zipcode   bed   sofa    ground
1011      200   36      20
1012      720   45      89

How can I get this result with pandas?

3 Answers

I've accomplished this using indexes and reshaping:

from pandas import DataFrame, Series

df = DataFrame({'zipcode':[20110,20110,20111,20111,20111], 'room_type': ['bed','sofa', 'bed','bed','sofa']})
df.set_index(['zipcode', 'room_type'], inplace=True)
df

zipcode room_type
  20110       bed
             sofa
  20111       bed
              bed
             sofa

# count the values and generate a new dataframe
df2 = DataFrame(df.index.value_counts(), columns=['count'])
df2.reset_index(inplace=True)
df2

            index   count
0    (20111, bed)       2
1    (20110, bed)       1
2   (20111, sofa)       1
3   (20110, sofa)       1

# split the tuple into new columns
df2[['zipcode', 'room_type']] = df2['index'].apply(Series)
df2.drop('index', axis=1, inplace=True)

# reshape 
df2.pivot(index='zipcode', columns='room_type', values='count') 

room_type   bed sofa
zipcode     
  20110       1    1
  20111       2    1
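
The count-then-reshape idea above can also be sketched without the tuple-splitting step. This is a different route, not the code shown above: it groups on both columns, takes the group sizes, and unstacks room_type into columns. The sample data is the same as above, and fill_value=0 is my own addition so that a zipcode with no rows for some room type shows 0 instead of NaN.

```python
import pandas as pd

# Same sample data as in the answer above.
df = pd.DataFrame({
    "zipcode": [20110, 20110, 20111, 20111, 20111],
    "room_type": ["bed", "sofa", "bed", "bed", "sofa"],
})

# Count rows per (zipcode, room_type) pair, then move room_type into columns.
# fill_value=0 covers pairs that never occur in the data.
counts = df.groupby(["zipcode", "room_type"]).size().unstack(fill_value=0)
print(counts)
```

The result is indexed by zipcode with one column per room type; call reset_index() on it if zipcode should be a regular column.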

First, apply groupby on the columns 'zipcode' and 'room_type' to get the corresponding counts:

In [4]: df = df.groupby(['zipcode','room_type'])['room_type'].agg(['count']).reset_index()

In [5]: df
Out[5]: 
   zipcode room_type  count
0    20110       bed      1
1    20110      sofa      1
2    20111       bed      2
3    20111      sofa      1

Now use 'pivot_table' to obtain the desired result:

In [6]: df = df.pivot_table(values='count', columns='room_type', index='zipcode')

In [7]: df
Out[7]: 
room_type  bed  sofa
zipcode             
20110        1     1
20111        2     1

Remove the columns' name:

In [8]: df.columns.name = None

In [9]: df
Out[9]: 
         bed  sofa
zipcode           
20110      1     1
20111      2     1

Finally, reset the index:

In [10]: df = df.reset_index()

In [11]: df
Out[11]: 
   zipcode  bed  sofa
0    20110    1     1
1    20111    2     1
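
For reference, the four steps above can be chained into one expression. This is only a sketch under a few assumptions: the extra 20112 row is made-up sample data added to show a zipcode that lacks a room type, fill_value=0 is my addition so that such gaps show 0 rather than NaN, and rename_axis(None, axis=1) is used in place of assigning df.columns.name so that the chain is not interrupted.

```python
import pandas as pd

# Sample data; the 20112 row is hypothetical and has no 'sofa' listing.
df = pd.DataFrame({
    "zipcode": [20110, 20110, 20111, 20111, 20111, 20112],
    "room_type": ["bed", "sofa", "bed", "bed", "sofa", "bed"],
})

result = (
    df.groupby(["zipcode", "room_type"])["room_type"]
      .agg(["count"])                      # step 1: counts per pair
      .reset_index()
      .pivot_table(values="count", index="zipcode",
                   columns="room_type", fill_value=0)  # step 2: reshape
      .rename_axis(None, axis=1)           # step 3: drop the columns' name
      .reset_index()                       # step 4: zipcode back to a column
)
print(result)
```

Depending on the pandas version, the counts may come back as floats after the pivot; appending .astype(int) before the final reset_index() is one way to force integers.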

The crosstab way, which I find easy to implement:

pd.crosstab(df.zipcode, df.room_type).reset_index()

This will do the job.
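
A self-contained version of the crosstab call, as a minimal sketch on made-up sample data. Clearing the columns' name is optional and is my addition; it only removes the 'room_type' label from the header.

```python
import pandas as pd

# Hypothetical sample data for illustration.
df = pd.DataFrame({
    "zipcode": [20110, 20110, 20111, 20111, 20111],
    "room_type": ["bed", "sofa", "bed", "bed", "sofa"],
})

# crosstab counts how often each (zipcode, room_type) pair occurs;
# pairs that never occur are filled with 0.
table = pd.crosstab(df["zipcode"], df["room_type"])
table.columns.name = None    # optional: drop the 'room_type' header label
table = table.reset_index()  # make zipcode a regular column
print(table)
```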
