Count instances of string in separate dataframe, not exact match, using Python, pandas

Question

I have a list of names in df1 and I need to see if they match anywhere in df2. I know I probably need to use str.contains on each item and add one to a count, but I haven't figured out how to do this successfully.

for e in df2['People_separate']:
count = df1['People'].str.contains(e)
if count == True:
    count += 1
return count

example: df1:

| People    | 
| --------  | 
| A B / E F | 
| A B / C D | 
| E F       |

df2 (looking to populate the 'counts' column:

| People_separate | Counts |
| --------------- | -------------|
| A B             | 2            |
| C D             | 1            |
| E F             | 2            |

perl · Accepted Answer · 2021-03-09 22:00:18Z

1

You can split the rows by ' / ' with split, then explode to convert lists into rows, and then count values with value_counts:

df['People'].str.split(' / ').explode().value_counts()

Output:

A B    2
E F    2
C D    1

answered Mar 9, 2021 at 22:00

perl

9,9811 gold badge14 silver badges23 bronze badges

Sign up to request clarification or add additional context in comments.

Comments

ALollz · Accepted Answer · 2021-03-09 22:32:16Z

1

If the "not exact match" is really a requirement, then we form a search pattern to use with Series.str.extractall, and take the value_counts of that extraction. This way if your search word is 'foo' a word like 'foobar' will still count as a match (because it contains 'foo').

The reindex ensures the resulting Series also shows 0s for words that never matched.

import pandas as pd

df1 = pd.DataFrame({"People": ['A B / E F', 'A B / C D', 'E F']})
df2 = pd.DataFrame({"People_separate": ['A B', 'C D', 'E F', 'banana']})

pat = '(' + '|'.join(df2['People_separate']) + ')'
#(A B|C D|E F)

(df1['People'].str.extractall(pat)[0]
   .value_counts()
   .reindex(df2['People_separate'], fill_value=0))

People_separate
A B       2
C D       1
E F       2
banana    0
Name: 0, dtype: int64

edited Mar 9, 2021 at 22:32

answered Mar 9, 2021 at 22:18

ALollz

59.7k7 gold badges73 silver badges97 bronze badges

Collectives™ on Stack Overflow

Count instances of string in separate dataframe, not exact match, using Python, pandas

2 Answers 2

Comments

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Related