I have a list of names in df1 and I need to see if they match anywhere in df2. I know I probably need to use str.contains on each item and add one to a count, but I haven't figured out how to do this successfully.

for e in df2['People_separate']:
    count = df1['People'].str.contains(e)
    if count == True:
        count += 1
return count

example: df1:

| People    | 
| --------  | 
| A B / E F | 
| A B / C D | 
| E F       |

df2 (looking to populate the 'Counts' column):

| People_separate | Counts |
| --------------- | -------------|
| A B             | 2            |
| C D             | 1            |
| E F             | 2            |

2 Answers

You can split the rows by ' / ' with split, then explode to convert lists into rows, and then count values with value_counts:

df1['People'].str.split(' / ').explode().value_counts()

Output:

A B    2
E F    2
C D    1
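
To get those counts into the 'Counts' column of df2, one option is to map the value_counts result onto 'People_separate'. This is only a minimal sketch, assuming the names in df1 are always separated by exactly ' / ' and must match the names in df2 exactly. The fillna(0) step is an addition of mine so that names that never appear get 0 instead of NaN.

```python
import pandas as pd

df1 = pd.DataFrame({"People": ["A B / E F", "A B / C D", "E F"]})
df2 = pd.DataFrame({"People_separate": ["A B", "C D", "E F"]})

# One row per individual name, then count how often each name occurs
counts = df1["People"].str.split(" / ").explode().value_counts()

# Look up each name in the counts; names that never occur become 0
df2["Counts"] = df2["People_separate"].map(counts).fillna(0).astype(int)

print(df2)
```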

If the "not exact match" is really a requirement, then we form a search pattern to use with Series.str.extractall, and take the value_counts of that extraction. This way if your search word is 'foo' a word like 'foobar' will still count as a match (because it contains 'foo').

The reindex ensures the resulting Series also shows 0s for words that never matched.

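import re

import pandas as pd

df1 = pd.DataFrame({"People": ['A B / E F', 'A B / C D', 'E F']})
df2 = pd.DataFrame({"People_separate": ['A B', 'C D', 'E F', 'banana']})

# Escape each name so any regex metacharacters in it are matched literally
pat = '(' + '|'.join(map(re.escape, df2['People_separate'])) + ')'
# matches the same text as (A B|C D|E F|banana)

# Extract every match, count them, and show 0 for names that never matched
(df1['People'].str.extractall(pat)[0]
   .value_counts()
   .reindex(df2['People_separate'], fill_value=0))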

People_separate
A B       2
C D       1
E F       2
banana    0
Name: 0, dtype: int64
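
One caveat with a single alternation pattern: regex alternation takes the first alternative that matches, so if one search name is a prefix of another (say 'foo' and 'foobar'), the longer name may never be counted. A possible workaround, sketched below with made-up data, is to count each name separately with Series.str.count. The 'foo'/'foobar' frames are hypothetical and only illustrate the substring behaviour; this is not necessarily the fastest approach for a large df2.

```python
import re

import pandas as pd

# Hypothetical data where one search name is a prefix of another
df1 = pd.DataFrame({"People": ["foo / foobar", "foobar", "baz"]})
df2 = pd.DataFrame({"People_separate": ["foo", "foobar", "banana"]})

# Count occurrences of each name on its own, so overlapping names
# do not compete inside a single alternation pattern
df2["Counts"] = [
    int(df1["People"].str.count(re.escape(name)).sum())
    for name in df2["People_separate"]
]

print(df2)
```

Here 'foo' is counted 3 times (it also occurs inside both 'foobar' entries), 'foobar' twice, and 'banana' zero times.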
