Python/Pandas, count a specific values based on multiple values

Question

Using Python/Pandas and given a name-value (Bob), how do I value_count by job and state, and list by month. Initial df:

The output for Bob should be something like this::

I am working with many thousands of records and I need some way to retrieve the number of Jobs codes and states for a specific name for each month.

I can get this information individually using .loc and value_counts(), but not all at once. Grouby stats is not specific. Any ideas???

Possible duplicate of Get statistics for each group (such as count, mean, etc) using pandas GroupBy? — ndclt
– ndclt, Commented Sep 28, 2019 at 20:01
Did not know about images. I won't do it again. As far as getting statistics for each group, the link did not discuss getting statistics for a specific value (name). If I do group statistics and groupby I get results for 400+ names. I am only interested in one name. — Marty
– Marty, Commented Sep 28, 2019 at 22:50
I added my eventual answer below. Thanks for pointing me in the right direction. — Marty
– Marty, Commented Sep 29, 2019 at 17:13

Valdi_Bo · Accepted Answer · 2019-09-28 21:28:45Z

1

I added one row (for June) to your test DataFrame:

   Name State Job Code Month
0   Bob    CA        A   APR
1   Joe    CA        B   APR
2  Mary    AZ        C   MAY
3   Bob    CA        D   MAY
4   Bob    NV        C   MAY
5   Bob    CA        D   MAY
6   Bob    CA        D   JUN

The reason is to show that the way I provided "calendar" sort on Month (instead of alphabetical) works OK.

To provide the mentioned "calendar" sort, start with:

df.Month = pd.Categorical(df.Month, ['JAN', 'FEB', 'MAR', 'APR',
    'MAY', 'JUN', 'JUL', 'AUG', 'SEP', 'OCT', 'NOV', 'DEC'])

Then, almost all processing can be performed in the following instruction:

res = df.query('Name == "Bob"').drop(columns=['Name'])\
    .set_index('Month').stack().reset_index(name='val')\
    .groupby(['Month', 'level_1', 'val']).size()\
    .sort_index(ascending=[True, False, True]).droplevel(1)

This instruction is quite long (involves multiple steps), so to understand it run each step separately and see the intermediate results.

I wrote "almost all" before, because for now the MultiIndex has unnecessary column names. To clear them, run:

res.index.names = ['', '']

And the final result is:

APR  CA    1
     A     1
MAY  CA    2
     NV    1
     C     1
     D     2
JUN  CA    1
     D     1
dtype: int64

Note that data for JUN is after MAY, just as it should be.

edited Sep 28, 2019 at 21:28

answered Sep 28, 2019 at 21:22

Valdi_Bo

31.1k4 gold badges29 silver badges45 bronze badges

Sign up to request clarification or add additional context in comments.

1 Comment

Marty Over a year ago

I tried this code but I couldn't get past an error related to the expected index. Thanks anyway. I will keep the code in my back pocket to maybe use for something else.

Marty · Accepted Answer · 2019-09-29 17:09:10Z

0

I was hoping to get one table but I was able to get close enough using groupby and two .count() methods. I figured out I needed to filter the dataframe first to only the value I wanted (Bob). This may seem obvious but I am new. Thanks for everyone's help.

df1 = df[df['Name'] == 'Bob']
grp1 = df1.groupby(['Month', 'Job Code']) 
grp1.count()

grp2 = df1.groupby(['Month', 'State']) 
grp2.count()

answered Sep 29, 2019 at 17:09

Marty

531 silver badge9 bronze badges

Collectives™ on Stack Overflow

Python/Pandas, count a specific values based on multiple values

2 Answers 2

1 Comment

Comments

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

1 Comment

Comments

Your Answer

Sign up or log in

Post as a guest

Linked

Related