I'm trying to count the number of storms in each category for each unique x and y combination. For example, my dataframe looks like:

x   y  year  Category
1   1  1988     3
2   1  1977     1
2   1  1999     2
3   2  1990     4

I want to create a dataframe that looks like:

x   y   Category 1   Category 2   Category 3  Category 4
1   1        0           0            1           0
2   1        1           1            0           0
3   2        0           0            0           1

I have tried various combinations of .groupby() and .count(), but I am still not getting the desired result. The closest thing I could get is:

df[['x','y','Category']].groupby(['Category']).count()

However, the result counts for all x and y, not the unique pairs:

Cat       x           y     
1       3773         3773
2       1230         1230
3       604          604
4       266          266
5       50           50
NA      27620        27620
TS      16884        16884

Does anyone know how to do a count operation on one column based on the uniqueness of two other columns in a dataframe?

4 Answers

pivot_table sounds like what you want. A bit of a hack is to add a column of 1's to use for counting. Summing that column lets pivot_table add 1 for each occurrence of a particular x-y and Category combination, so each cell ends up holding the number of matching rows. Set this new column as the values parameter in pivot_table and set the aggfunc parameter to np.sum. You'll probably want to set fill_value to 0 as well, so that combinations that never occur show 0 instead of NaN:

import numpy as np

df['count'] = 1  # helper column: each row contributes 1 to its (x, y, Category) cell
result = df.pivot_table(
    index=['x', 'y'], columns='Category', values='count',
    fill_value=0, aggfunc=np.sum
)

result:

Category  1  2  3  4
x y                 
1 1       0  0  1  0
2 1       1  1  0  0
3 2       0  0  0  1

If you're interested in keeping x and y as columns and having the other column names as Category X, you can rename the columns and use reset_index:

result.columns = [f'Category {x}' for x in result.columns]
result = result.reset_index()
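
For readers who want to run the whole thing end to end, here is a minimal sketch of the steps above on the four sample rows from the question. The sample dataframe is rebuilt by hand, and aggfunc is passed as the string "sum" rather than np.sum, which is a substitution on my part so the snippet needs only pandas. It also shows why the column of 1's works: summing it within each (x, y, Category) cell is the same as counting the rows in that cell.

```python
import pandas as pd

# Sample rows copied from the question.
df = pd.DataFrame({
    "x": [1, 2, 2, 3],
    "y": [1, 1, 1, 2],
    "year": [1988, 1977, 1999, 1990],
    "Category": [3, 1, 2, 4],
})

# Helper column: every row is worth 1, so a sum per cell is a row count.
df["count"] = 1

result = df.pivot_table(
    index=["x", "y"],      # one output row per unique (x, y) pair
    columns="Category",    # one output column per category
    values="count",
    fill_value=0,          # combinations that never occur become 0
    aggfunc="sum",
)

# Flatten to ordinary columns named "Category N".
result.columns = [f"Category {c}" for c in result.columns]
result = result.reset_index()
print(result)
```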

3 Comments

That count column and aggfunc is pretty smart.
Thanks, I was kind of hoping someone would point out a cleaner way to do it though!
It's clean. But just explain why you used count = 1. Because other readers might get confused.

You can use groupby first (the output below assumes df already has a helper count column of 1's):

df_new = df.groupby(['x', 'y', 'Category']).count()
df_new
                  year  count
x   y   Category        
1   1      3       1    1
2   1      1       1    1
           2       1    1
3   2      4       1    1

Then use pivot_table to spread Category across the columns:

df_new = df_new.pivot_table(index=['x', 'y'], columns='Category', values='count', fill_value=0)
df_new
Category    1   2   3   4
x   y               
1   1       0   0   1   0
2   1       1   1   0   0
3   2       0   0   0   1
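
As a variation on this group-then-reshape idea (not the code from this answer), the sketch below counts rows with size() and reshapes with unstack(), so no helper column is needed. The sample dataframe is rebuilt by hand from the question, and the variable names counts and wide are only illustrative.

```python
import pandas as pd

# Sample rows copied from the question.
df = pd.DataFrame({
    "x": [1, 2, 2, 3],
    "y": [1, 1, 1, 2],
    "year": [1988, 1977, 1999, 1990],
    "Category": [3, 1, 2, 4],
})

# Step 1: number of rows for each (x, y, Category) combination.
counts = df.groupby(["x", "y", "Category"]).size()

# Step 2: move the Category level into the columns; missing combinations become 0.
wide = counts.unstack("Category", fill_value=0)
print(wide)
```

The result is indexed by (x, y); calling reset_index() on it would turn x and y back into ordinary columns.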

You can use pd.get_dummies after setting the index with set_index, then group by the index levels and sum to collapse the rows (sum(level=...) was removed in pandas 2.0, so groupby(level=...) is used here):

pd.get_dummies(df.set_index(['x','y'])['Category'].astype(str),
               prefix='Category ',
               prefix_sep='')\
  .groupby(level=[0,1]).sum()\
  .reset_index()

Output:

   x  y  Category 1  Category 2  Category 3  Category 4
0  1  1           0           0           1           0
1  2  1           1           1           0           0
2  3  2           0           0           0           1

Or use groupby twice, with some additional steps such as get_dummies inside apply:

>>> (df.join(df.groupby(['x','y'], group_keys=False)['Category']
            .apply(lambda x: x.astype(str).str.get_dummies().add_prefix('Category ')))
       .groupby(['x','y']).sum().fillna(0).drop(['year','Category'], axis=1).reset_index())
   x  y  Category 1  Category 2  Category 3  Category 4
0  1  1         0.0         0.0         1.0         0.0
1  2  1         1.0         1.0         0.0         0.0
2  3  2         0.0         0.0         0.0         1.0
>>> 

1 Comment

@MohitMotwani Done it.
