Is there an efficient way to match two string columns in pandas?

Question

I want to match two string columns and count the number of exact matches. For example, given two columns like these:

index   col1   col2
0       aa     ji
1       bs     aa
2       qe     bs
3       gd     aa

col1 consists of unique ids. I want to count how many times each element of col1 occurs in col2. In other words, I would like to get an output like:

col3
2
1
0
0

for the example above.

I tried this using pandas str.contains() in a for loop, but with a large number of observations it is too slow. My code is below.

num_replies = []
for i in range(len(col1)):
    # str.contains does a substring/regex match and scans all of col2 for each id
    count = col2.str.contains(col1[i]).sum()
    num_replies.append(count)

Is there a time-efficient way to do this work?
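
A side note on the loop above: str.contains() tests for substrings, so it can overcount when the goal is exact matches. The following is a minimal sketch of the difference, using made-up values that are not from the question's data:

```python
import pandas as pd

# Hypothetical values chosen only to show the difference
col2 = pd.Series(["aab", "aa", "xaa", "bs"])

# Substring test: every value that contains "aa" anywhere counts
substring_hits = col2.str.contains("aa", regex=False).sum()

# Exact test: only values equal to "aa" count
exact_hits = (col2 == "aa").sum()

print(substring_hits, exact_hits)
```

With these values the substring test counts three rows, while the equality test counts one.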

MaxU - stand with Ukraine · Accepted Answer · 2019-07-15 17:00:25Z

2

Use map and value_counts:

df['col3'] = df['col1'].map(df['col2'].value_counts()).fillna(0)

Output:

   index col1 col2  col3
0      0   aa   ji   2.0
1      1   bs   aa   1.0
2      2   qe   bs   0.0
3      3   gd   aa   0.0
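
For reference, here is a self-contained sketch of the same approach, rebuilding the question's example frame. The astype(int) step is an addition not in the answer above; it assumes integer counts are preferred over the floats that fillna(0) leaves behind.

```python
import pandas as pd

# Example data from the question
df = pd.DataFrame({
    "col1": ["aa", "bs", "qe", "gd"],
    "col2": ["ji", "aa", "bs", "aa"],
})

# Count each distinct value of col2 once
counts = df["col2"].value_counts()

# Look up each col1 id in those counts; ids absent from col2 become NaN,
# which fillna(0) turns into 0 before the cast to int
df["col3"] = df["col1"].map(counts).fillna(0).astype(int)

print(df)
```

value_counts() makes a single pass over col2 and map() does one lookup per row of col1, so col2 is not rescanned for every id.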

edited Jul 15, 2019 at 17:00

MaxU - stand with Ukraine


answered Jul 15, 2019 at 16:46

Scott Boston



1 Comment

Matthew Son Over a year ago

This works way faster than Chaudhary's suggestion. Thanks.

Yash Thenuan · Accepted Answer · 2019-07-15 16:48:01Z

2

Try this:

df['counts'] = df.col1.apply(lambda x: list(df.col2.values).count(x))
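
Note that the one-liner above rebuilds the list and rescans it for every row of col1, so the work grows roughly with len(col1) * len(col2). Below is a sketch of a variant that tallies col2 only once. It swaps in collections.Counter from the standard library, which is not part of the answer above.

```python
from collections import Counter

import pandas as pd

# Example data from the question
df = pd.DataFrame({
    "col1": ["aa", "bs", "qe", "gd"],
    "col2": ["ji", "aa", "bs", "aa"],
})

# Tally col2 a single time; a Counter returns 0 for keys it has not seen
col2_counts = Counter(df["col2"])

# One dictionary lookup per row of col1 instead of one full scan of col2
df["counts"] = df["col1"].map(lambda x: col2_counts[x])

print(df)
```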

answered Jul 15, 2019 at 16:48

Yash Thenuan

