Counting longest occurrence of repeated sequence in Python

Question

What's the easiest way to count the longest consecutive repeat of a certain character in a string? For example, the longest consecutive repeat of "b" in the following string:

my_str = "abcdefgfaabbbffbbbbbbfgbb"

would be 6, since other consecutive repeats are shorter (3 and 2, respectively.) How can I do this in Python?

The answer by @interjay is the best one because it is easily altered to work for any character repeat: max(len(list(y)) for (c,y) in itertools.groupby(my_str) — rjurney
– rjurney, Commented Jun 30, 2022 at 11:30

Angad Gill · Accepted Answer · 2016-07-22 09:46:32Z

14

How about a regex example:

import re
my_str = "abcdefgfaabbbffbbbbbbfgbb"
len(max(re.compile("(b+b)*").findall(my_str))) #changed the regex from (b+b) to (b+b)*
# max([len(i) for i in re.compile("(b+b)").findall(my_str)]) also works

Edit, Mine vs. interjays

x=timeit.Timer(stmt='import itertools;my_str = "abcdefgfaabbbffbbbbbbfgbb";max(len(list(y)) for (c,y) in itertools.groupby(my_str) if c=="b")')
x.timeit()
22.759046077728271

x=timeit.Timer(stmt='import re;my_str = "abcdefgfaabbbffbbbbbbfgbb";len(max(re.compile("(b+b)").findall(my_str)))')
x.timeit()
8.4770550727844238

edited Jul 22, 2016 at 9:46

Angad Gill

34 bronze badges

answered Apr 18, 2010 at 22:15

Mark

109k20 gold badges180 silver badges238 bronze badges

Sign up to request clarification or add additional context in comments.

1 Comment

doug Over a year ago

+1 for helping to restore partially rehabilitate the value of regexp on this Site--very brave.

interjay · Accepted Answer · 2010-04-18 22:10:51Z

13

Here is a one-liner:

max(len(list(y)) for (c,y) in itertools.groupby(my_str) if c=='b')

Explanation:

itertools.groupby will return groups of consecutive identical characters, along with an iterator for all items in that group. For each such iterator, len(list(y)) will give the number of items in the group. Taking the maximum of that (for the given character) will give the required result.

answered Apr 18, 2010 at 22:10

interjay

111k22 gold badges278 silver badges264 bronze badges

1 Comment

rjurney Over a year ago

And this works for ANY repeated character if you just remove if c=='b': max(len(list(y)) for (c,y) in itertools.groupby(my_str)

Ignacio Vazquez-Abrams · Accepted Answer · 2010-04-18 22:14:15Z

6

Here's my really boring, inefficient, straightforward counting method (interjay's is much better). Note, I wrote this in this little text field, which doesn't have an interpreter, so I haven't tested it, and I may have made a really dumb mistake that a proof-read didn't catch.

my_str = "abcdefgfaabbbffbbbbbbfgbb"
last_char = ""
current_seq_len = 0
max_seq_len = 0

for c in mystr:
    if c == last_char:
        current_seq_len += 1
        if current_seq_len > max_seq_len:
            max_seq_len = current_seq_len
    else:
        current_seq_len = 1
        last_char = c

print(max_seq_len)

edited Apr 18, 2010 at 22:14

Ignacio Vazquez-Abrams

804k160 gold badges1.4k silver badges1.4k bronze badges

answered Apr 18, 2010 at 22:10

Josh Wright

2,51516 silver badges20 bronze badges

4 Comments

Edurne Pascual Over a year ago

You may need to update last_char somewhere in the loop; other than that, +1 for providing the really easiest way: it's the approach that less concepts/skills requires from the programmer. BTW, it's not "innefficient": any solution will need to look at all characters on the string to provide the correct result, so it's cost will be at least O(n): your approach has a time cost of O(n), so it's decently efficient. A slight efficiency improvement would be to update max_seq_len on the else: block, so it's updated once per sequence rather than once per character.

Edurne Pascual Over a year ago

Ok, ignore my point about updating last_char, Ignacio just fixed it ;)

Josh Wright Over a year ago

Thanks Ignacio ;) (I only meant inefficient in the terms of how much typing you have to do)

James Antill Over a year ago

I would probably use max() instead of the if, and it doesn't count the longest sequence of "b"'s but of any character (maybe that broke in the ignacio fix though?). Anyway this is much better than the oneliners, IMNSHO.

doug · Accepted Answer · 2010-04-18 22:15:31Z

2

Using run-length encoding:

import numpy as NP

signal = NP.array([4,5,6,7,3,4,3,5,5,5,5,3,4,2,8,9,0,1,2,8,8,8,0,9,1,3])

px, = NP.where(NP.ediff1d(signal) != 0)
px = NP.r_[(0, px+1, [len(signal)])]
# collect the run-lengths for each unique item in the signal
rx = [ (m, n, signal[m]) for (m, n) in zip(px[:-1], px[1:]) if (n - m) > 1 ]

# get longest:
rx2 = [ (b-a, c) for (a, b, c) in rx ]
rx2.sort(reverse=True)

# returns: [(4, 5), (3, 8)], ie, '5' occurs 4 times consecutively, '8' occurs 3 times consecutively

answered Apr 18, 2010 at 22:15

doug

70.2k26 gold badges171 silver badges201 bronze badges

2 Comments

carlo_hamalainen Over a year ago

Shouldn't "if (n - m) > 1" be "if (n - m) >= 1" to detect a run of length 1?

doug Over a year ago

@carlo_hamalainen -- no. not really interested in detecting "run lengths" of 1.

Martin Tournoij · Accepted Answer · 2014-11-05 11:04:32Z

1

Here is my code, Not that efficient but seems to work:

def LongCons(mystring):
    dictionary = {}
    CurrentCount = 0
    latestchar = ''

    for i in mystring:
        if i == latestchar:
            CurrentCount += 1
            if dictionary.has_key(i):
                if CurrentCount > dictionary[i]:
                    dictionary[i]=CurrentCount
        else:
            CurrentCount = 1
            dictionary.update({i: CurrentCount})
            latestchar = i
    k = max(dictionary, key=dictionary.get)
    print(k, dictionary[k])
    return

edited Nov 5, 2014 at 11:04

Martin Tournoij

28k24 gold badges110 silver badges158 bronze badges

answered Apr 29, 2014 at 23:20

gogada

111 bronze badge

Collectives™ on Stack Overflow

Counting longest occurrence of repeated sequence in Python

5 Answers 5

1 Comment

1 Comment

4 Comments

2 Comments

Comments

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

5 Answers 5

1 Comment

1 Comment

4 Comments

2 Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Linked

Related