
I have the following sample data:

import pandas as pd

df1 = [[52, '1', '10'], [54, '1', '4'],
       [55, '2', '3'], [52, '1', '10'],
       [55, '2', '10'], [52, '1', '4']]

df = pd.DataFrame(df1, columns=['Cow', 'Lact', 'Procedure'])

df2 = [['52', '1'], ['53', '9'],
       ['54', '2'], ['55', '2']]

df2 = pd.DataFrame(df2, columns=['Cow', 'Lact'])

The tables look like:

df:

    Cow Lact    Procedure
0   52  1        10
1   54  1        4
2   55  2        3
3   52  1        10
4   55  2        10
5   52  1        4

df2:

    Cow Lact
0   52  1
1   53  9
2   54  2
3   55  2

I would like to count the number of rows with Procedure = 10 for each Cow-Lact combination in df2, and then add a column to df2 called Tproc that contains that count.

The output I am looking for is:

    Cow Lact Tproc
0   52  1     2
1   53  9     0
2   54  2     0
3   55  2     1

The following filter does not work:

filt = [(df['Cow']==df2['Cow'])&(df['Lact']==df2['Lact'])&(df['Procedure']==10)]

My plan was then to use .len to get the count:

df2['Tproc'] = df2.loc[filt].len

How can you filter a DataFrame based on values in another DataFrame to count the number of rows that satisfy the condition?
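For reference, the filter above fails because comparing a column of df with a column of df2 is element-wise and aligned on index labels: df has 6 rows and df2 has 4, so pandas raises a ValueError instead of matching rows by value. DataFrames also have no .len attribute; len() or the sum of a boolean mask gives a count. A minimal sketch that keeps the original idea (one mask per Cow/Lact pair of df2, summed), assuming all three columns are integers; with the question's mixed int/str columns every comparison would come out False:

```python
import pandas as pd

# Assumption: integer columns throughout (the question's frames mix int and str).
df = pd.DataFrame({'Cow': [52, 54, 55, 52, 55, 52],
                   'Lact': [1, 1, 2, 1, 2, 1],
                   'Procedure': [10, 4, 3, 10, 10, 4]})
df2 = pd.DataFrame({'Cow': [52, 53, 54, 55], 'Lact': [1, 9, 2, 2]})

# Column-vs-column comparison aligns on index labels; df (6 rows) and
# df2 (4 rows) are not identically labeled, so pandas raises ValueError.
try:
    df['Cow'] == df2['Cow']
    comparison_failed = False
except ValueError:
    comparison_failed = True

# Build one boolean mask over df per (Cow, Lact) pair in df2 and sum it:
# True counts as 1 and False as 0.
df2['Tproc'] = [
    int(((df['Cow'] == cow) & (df['Lact'] == lact) & (df['Procedure'] == 10)).sum())
    for cow, lact in zip(df2['Cow'], df2['Lact'])
]
print(df2)
```

This scans df once per row of df2 in a Python loop, so the vectorised answers below scale better on large frames.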

  • Your data types are not the same between columns of the same name. Commented Mar 3, 2022 at 6:16
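A short sketch of the mismatch this comment points out, using the question's own frames: Cow holds integers in df but strings in df2, and Lact and Procedure hold strings, so the integer 52 never equals the string '52'. Casting every column to int once, up front, is one way to remove the problem; this assumes the real values are all whole numbers.

```python
import pandas as pd

# The frames exactly as built in the question.
df = pd.DataFrame([[52, '1', '10'], [54, '1', '4'], [55, '2', '3'],
                   [52, '1', '10'], [55, '2', '10'], [52, '1', '4']],
                  columns=['Cow', 'Lact', 'Procedure'])
df2 = pd.DataFrame([['52', '1'], ['53', '9'], ['54', '2'], ['55', '2']],
                   columns=['Cow', 'Lact'])

# Inspect the column types: Cow is an integer column in df but text in df2.
print(df.dtypes)
print(df2.dtypes)

# Cast once so every later comparison or merge key is int vs int.
df = df.astype({'Cow': int, 'Lact': int, 'Procedure': int})
df2 = df2.astype({'Cow': int, 'Lact': int})
```

After this cast the filters in the answers can compare Procedure with 10 instead of '10', and no astype(str) is needed before merging.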

3 Answers


You can use value_counts on the filtered first DataFrame, then reindex the counts with the Cow/Lact pairs of df2 (filling missing pairs with 0) before assigning the values to the Tproc column:

df2['Tproc'] = (
    df1[df1['Procedure'] == '10'].value_counts(['Cow', 'Lact'])
       .reindex(pd.MultiIndex.from_frame(df2[['Cow', 'Lact']]), fill_value=0).values
)
print(df2)

# Output
  Cow Lact  Tproc
0  52    1      2
1  53    9      0
2  54    2      0
3  55    2      1

Setup:

import pandas as pd

df1 = pd.DataFrame({'Cow': [52, 54, 55, 52, 55, 52],
                    'Lact': [1, 1, 2, 1, 2, 1],
                    'Procedure': ['10', '4', '3', '10', '10', '4']})

df2 = pd.DataFrame({'Cow': [52, 53, 54, 55], 'Lact': [1, 9, 2, 2]})
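The TypeError quoted in the comments is what Python raises when a plain list is indexed with a string, for example df1['Procedure'] while df1 is still the list of lists from the question rather than a DataFrame. A self-contained sketch of the same value_counts + reindex approach on all-integer data, assuming Procedure is also stored as an integer (if it is text, compare with '10' instead):

```python
import pandas as pd

# Assumption: Cow, Lact and Procedure are all integer columns.
df1 = pd.DataFrame({'Cow': [52, 54, 55, 52, 55, 52],
                    'Lact': [1, 1, 2, 1, 2, 1],
                    'Procedure': [10, 4, 3, 10, 10, 4]})
df2 = pd.DataFrame({'Cow': [52, 53, 54, 55], 'Lact': [1, 9, 2, 2]})

# Count rows with Procedure == 10 per (Cow, Lact); the result is a Series
# indexed by a (Cow, Lact) MultiIndex.
counts = df1[df1['Procedure'] == 10].value_counts(['Cow', 'Lact'])

# Look the counts up in df2's row order; pairs never seen get 0.
keys = pd.MultiIndex.from_frame(df2[['Cow', 'Lact']])
df2['Tproc'] = counts.reindex(keys, fill_value=0).to_numpy()
print(df2)
```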

2 Comments

Corralien I like your code. Thanks for the heads-up about the data types. In the real dataset the Cow and Lact variables are integers. When I run the code you wrote with this example it works great. But when I run it with the integer variables and remove the df1.astype({'Cow': str}), like below: df2['Tproc'] = ([df1['Procedure'] == '10'].value_counts(['Cow', 'Lact']).reindex(df2[['Cow', 'Lact']], fill_value=0).values) I get TypeError: list indices must be integers or slices, not str. I also get an error if I leave in df['Cow'].
@JohnH. I updated my answer according to your comment. Take care, it's not .value_counts(['Cow', 'Lact']) but .value_counts([['Cow', 'Lact']]).

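You could merge + groupby + sum: left-merge df2 with df (cast to str so the key types match df2), then sum the Procedure == '10' mask for each Cow/Lact pair: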

tmp = df2.merge(df.astype(str), on=['Cow','Lact'], how='left')
out = tmp['Procedure'].eq('10').groupby([tmp['Cow'], tmp['Lact']]).sum().reset_index(name='Tproc')

Output:

  Cow Lact  Tproc
0  52    1      2
1  53    9      0
2  54    2      0
3  55    2      1
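The output above comes back sorted by Cow and Lact because groupby sorts its keys by default. If df2 is not already sorted and its row order matters, passing sort=False keeps the groups in order of first appearance, which after a left merge is df2's own order. A sketch with all-integer data (so no astype(str) is needed) and a hypothetical, deliberately shuffled df2:

```python
import pandas as pd

df = pd.DataFrame({'Cow': [52, 54, 55, 52, 55, 52],
                   'Lact': [1, 1, 2, 1, 2, 1],
                   'Procedure': [10, 4, 3, 10, 10, 4]})
# Hypothetical: the same pairs as the question's df2, but not sorted.
df2 = pd.DataFrame({'Cow': [55, 53, 52, 54], 'Lact': [2, 9, 1, 2]})

# A left merge keeps df2's key order; unmatched pairs get NaN in Procedure.
tmp = df2.merge(df, on=['Cow', 'Lact'], how='left')

# NaN never equals 10, so unmatched pairs sum to 0; sort=False keeps
# the groups in df2's order instead of sorting by Cow and Lact.
out = (tmp['Procedure'].eq(10)
          .groupby([tmp['Cow'], tmp['Lact']], sort=False)
          .sum()
          .reset_index(name='Tproc'))
print(out)
```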



Use groupby() + size(), then merge():

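out = df2.merge(
    df[df['Procedure'] == '10'].groupby(['Cow', 'Lact']).size()
      .reset_index(name='Tproc')
      .astype({'Cow': str, 'Lact': str}),  # cast only the keys to match df2's str columns
    how='left',
    on=['Cow', 'Lact']
).fillna({'Tproc': 0}).astype({'Tproc': int})  # unmatched pairs -> 0, keep Tproc integer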

output:

  Cow Lact  Tproc
0  52    1      2
1  53    9      0
2  54    2      0
3  55    2      1

