0

Using Python/Pandas and given a name-value (Bob), how do I value_count by job and state, and list by month. Initial df:

enter image description here

The output for Bob should be something like this::

enter image description here

I am working with many thousands of records and I need some way to retrieve the number of Jobs codes and states for a specific name for each month.

I can get this information individually using .loc and value_counts(), but not all at once. Grouby stats is not specific. Any ideas???

4
  • Possible duplicate of Get statistics for each group (such as count, mean, etc) using pandas GroupBy? Commented Sep 28, 2019 at 20:01
  • Dont post pictures of data/code Commented Sep 28, 2019 at 20:30
  • Did not know about images. I won't do it again. As far as getting statistics for each group, the link did not discuss getting statistics for a specific value (name). If I do group statistics and groupby I get results for 400+ names. I am only interested in one name. Commented Sep 28, 2019 at 22:50
  • I added my eventual answer below. Thanks for pointing me in the right direction. Commented Sep 29, 2019 at 17:13

2 Answers 2

1

I added one row (for June) to your test DataFrame:

   Name State Job Code Month
0   Bob    CA        A   APR
1   Joe    CA        B   APR
2  Mary    AZ        C   MAY
3   Bob    CA        D   MAY
4   Bob    NV        C   MAY
5   Bob    CA        D   MAY
6   Bob    CA        D   JUN

The reason is to show that the way I provided "calendar" sort on Month (instead of alphabetical) works OK.

To provide the mentioned "calendar" sort, start with:

df.Month = pd.Categorical(df.Month, ['JAN', 'FEB', 'MAR', 'APR',
    'MAY', 'JUN', 'JUL', 'AUG', 'SEP', 'OCT', 'NOV', 'DEC'])

Then, almost all processing can be performed in the following instruction:

res = df.query('Name == "Bob"').drop(columns=['Name'])\
    .set_index('Month').stack().reset_index(name='val')\
    .groupby(['Month', 'level_1', 'val']).size()\
    .sort_index(ascending=[True, False, True]).droplevel(1)

This instruction is quite long (involves multiple steps), so to understand it run each step separately and see the intermediate results.

I wrote "almost all" before, because for now the MultiIndex has unnecessary column names. To clear them, run:

res.index.names = ['', '']

And the final result is:

APR  CA    1
     A     1
MAY  CA    2
     NV    1
     C     1
     D     2
JUN  CA    1
     D     1
dtype: int64

Note that data for JUN is after MAY, just as it should be.

Sign up to request clarification or add additional context in comments.

1 Comment

I tried this code but I couldn't get past an error related to the expected index. Thanks anyway. I will keep the code in my back pocket to maybe use for something else.
0

I was hoping to get one table but I was able to get close enough using groupby and two .count() methods. I figured out I needed to filter the dataframe first to only the value I wanted (Bob). This may seem obvious but I am new. Thanks for everyone's help.

df1 = df[df['Name'] == 'Bob']
grp1 = df1.groupby(['Month', 'Job Code']) 
grp1.count()

grp2 = df1.groupby(['Month', 'State']) 
grp2.count()

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.