I have a dataframe like this:

Id  Seq Event
1     2    A 
1     3    B 
1     5    c 
1     6    A 
2     1    A 
2     2    B 
2     4    A 
2     6    B

I want to count how many times a specific pattern, say "AB", appears in the Event sequence of each Id. The output should be:

Id  Pattern_Count
1    1
2    2 

I tried concatenating Event with Event.shift() and searching for the specific pattern. That becomes tedious for a longer pattern like "ABCDE", because I would have to shift four times. Is there an alternative way to do this?

2 Answers

4

You can do this with groupby, agg, and str.count:

(df.groupby('Id')['Event']
   .agg(''.join)
   .str.count('AB')
   .reset_index(name='Pattern_Count'))

   Id  Pattern_Count
0   1              1
1   2              2

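Note that str.count interprets its argument as a regular expression and counts non-overlapping matches. A plain pattern like 'AB' works as is, but a pattern containing regex metacharacters (such as . or +) should be escaped with re.escape if you want a literal substring match. This also assumes the rows of each Id are already in the intended order (for example, sorted by Seq).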
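For reference, here is a self-contained sketch of the same approach using the sample data from the question. It assumes the events should be ordered by Seq within each Id and that the pattern is meant as a literal substring, so it is passed through re.escape. The helper name count_pattern is hypothetical and only used for illustration.

```python
import re

import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Id": [1, 1, 1, 1, 2, 2, 2, 2],
    "Seq": [2, 3, 5, 6, 1, 2, 4, 6],
    "Event": ["A", "B", "c", "A", "A", "B", "A", "B"],
})


def count_pattern(frame, pattern):
    """Count non-overlapping occurrences of `pattern` per Id (hypothetical helper)."""
    # Assumption: events must be in Seq order within each Id before joining.
    joined = (frame.sort_values(["Id", "Seq"])
                   .groupby("Id")["Event"]
                   .agg("".join))
    # re.escape makes any regex metacharacters in the pattern match literally.
    return joined.str.count(re.escape(pattern)).reset_index(name="Pattern_Count")


result = count_pattern(df, "AB")
print(result)
```

A longer pattern such as "ABCDE" can be passed the same way, with no shifting involved.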

2 Comments

I've been studying your answer. It amazes me that so many things can be strung together in this dense fashion. I would not have thought of using .join as a function to .agg. It does not show up in the documentation I have read. How did you know you could use that in .agg? I tried dir(groupby.agg) to list all functions, but it did not work.
@R.Wayne You can pass any function to agg, as long as it reduces each group to a single, aggregated value. In this case, I used str.join. See this post on string concatenation. agg(''.join) can also be written as agg(lambda x: ''.join(x)), but the lambda form is more verbose and less performant.
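To illustrate the point about agg accepting any reducing function, here is a small sketch using the question's sample data. The function first_and_last is a made-up example, not something from the answer; it is there only to show that a user-defined function works the same way as ''.join.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Id": [1, 1, 1, 1, 2, 2, 2, 2],
    "Seq": [2, 3, 5, 6, 1, 2, 4, 6],
    "Event": ["A", "B", "c", "A", "A", "B", "A", "B"],
})

events = df.groupby("Id")["Event"]

# The bound method ''.join is an ordinary callable: it takes the group's
# values and returns one string.
by_method = events.agg("".join)

# Equivalent, but wrapped in a lambda.
by_lambda = events.agg(lambda s: "".join(s))


def first_and_last(s):
    """Hypothetical reducer: concatenate the first and last event of a group."""
    return s.iloc[0] + s.iloc[-1]


custom = events.agg(first_and_last)

print(by_method.tolist())
print(custom.tolist())
```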
2

You can use groupby to isolate your groups, then concatenate the strings in each group with sum and count the occurrences of your substring.

result = df.groupby('Id')['Event'].sum().str.count('AB')

2 Comments

Concatenating strings with sum is not very efficient: strings are immutable, so each addition builds a new string.
Good point. Seemed more intuitive to use sum, but join should be the faster approach.
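As a quick check, the sketch below runs both variants on the question's sample data and confirms they give the same counts. It only demonstrates equivalence; it does not measure the speed difference discussed in the comments.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Id": [1, 1, 1, 1, 2, 2, 2, 2],
    "Seq": [2, 3, 5, 6, 1, 2, 4, 6],
    "Event": ["A", "B", "c", "A", "A", "B", "A", "B"],
})

events = df.groupby("Id")["Event"]

# Variant from this answer: concatenate each group's strings with sum.
counts_sum = events.sum().str.count("AB")

# Variant from the other answer: concatenate each group's strings with ''.join.
counts_join = events.agg("".join).str.count("AB")

print(counts_sum.tolist() == counts_join.tolist())
```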
