13

What's the easiest way to count the longest consecutive repeat of a certain character in a string? For example, the longest consecutive repeat of "b" in the following string:

my_str = "abcdefgfaabbbffbbbbbbfgbb"

would be 6, since other consecutive repeats are shorter (3 and 2, respectively.) How can I do this in Python?

1
  • The answer by @interjay is the best one because it is easily altered to work for any character repeat: max(len(list(y)) for (c,y) in itertools.groupby(my_str) Commented Jun 30, 2022 at 11:30

5 Answers 5

14

How about a regex example:

import re
my_str = "abcdefgfaabbbffbbbbbbfgbb"
len(max(re.compile("(b+b)*").findall(my_str))) #changed the regex from (b+b) to (b+b)*
# max([len(i) for i in re.compile("(b+b)").findall(my_str)]) also works

Edit, Mine vs. interjays

x=timeit.Timer(stmt='import itertools;my_str = "abcdefgfaabbbffbbbbbbfgbb";max(len(list(y)) for (c,y) in itertools.groupby(my_str) if c=="b")')
x.timeit()
22.759046077728271

x=timeit.Timer(stmt='import re;my_str = "abcdefgfaabbbffbbbbbbfgbb";len(max(re.compile("(b+b)").findall(my_str)))')
x.timeit()
8.4770550727844238
Sign up to request clarification or add additional context in comments.

1 Comment

+1 for helping to restore partially rehabilitate the value of regexp on this Site--very brave.
13

Here is a one-liner:

max(len(list(y)) for (c,y) in itertools.groupby(my_str) if c=='b')

Explanation:

itertools.groupby will return groups of consecutive identical characters, along with an iterator for all items in that group. For each such iterator, len(list(y)) will give the number of items in the group. Taking the maximum of that (for the given character) will give the required result.

1 Comment

And this works for ANY repeated character if you just remove if c=='b': max(len(list(y)) for (c,y) in itertools.groupby(my_str)
6

Here's my really boring, inefficient, straightforward counting method (interjay's is much better). Note, I wrote this in this little text field, which doesn't have an interpreter, so I haven't tested it, and I may have made a really dumb mistake that a proof-read didn't catch.

my_str = "abcdefgfaabbbffbbbbbbfgbb"
last_char = ""
current_seq_len = 0
max_seq_len = 0

for c in mystr:
    if c == last_char:
        current_seq_len += 1
        if current_seq_len > max_seq_len:
            max_seq_len = current_seq_len
    else:
        current_seq_len = 1
        last_char = c

print(max_seq_len)

4 Comments

You may need to update last_char somewhere in the loop; other than that, +1 for providing the really easiest way: it's the approach that less concepts/skills requires from the programmer. BTW, it's not "innefficient": any solution will need to look at all characters on the string to provide the correct result, so it's cost will be at least O(n): your approach has a time cost of O(n), so it's decently efficient. A slight efficiency improvement would be to update max_seq_len on the else: block, so it's updated once per sequence rather than once per character.
Ok, ignore my point about updating last_char, Ignacio just fixed it ;)
Thanks Ignacio ;) (I only meant inefficient in the terms of how much typing you have to do)
I would probably use max() instead of the if, and it doesn't count the longest sequence of "b"'s but of any character (maybe that broke in the ignacio fix though?). Anyway this is much better than the oneliners, IMNSHO.
2

Using run-length encoding:

import numpy as NP

signal = NP.array([4,5,6,7,3,4,3,5,5,5,5,3,4,2,8,9,0,1,2,8,8,8,0,9,1,3])

px, = NP.where(NP.ediff1d(signal) != 0)
px = NP.r_[(0, px+1, [len(signal)])]
# collect the run-lengths for each unique item in the signal
rx = [ (m, n, signal[m]) for (m, n) in zip(px[:-1], px[1:]) if (n - m) > 1 ]

# get longest:
rx2 = [ (b-a, c) for (a, b, c) in rx ]
rx2.sort(reverse=True)

# returns: [(4, 5), (3, 8)], ie, '5' occurs 4 times consecutively, '8' occurs 3 times consecutively 

2 Comments

Shouldn't "if (n - m) > 1" be "if (n - m) >= 1" to detect a run of length 1?
@carlo_hamalainen -- no. not really interested in detecting "run lengths" of 1.
1

Here is my code, Not that efficient but seems to work:

def LongCons(mystring):
    dictionary = {}
    CurrentCount = 0
    latestchar = ''

    for i in mystring:
        if i == latestchar:
            CurrentCount += 1
            if dictionary.has_key(i):
                if CurrentCount > dictionary[i]:
                    dictionary[i]=CurrentCount
        else:
            CurrentCount = 1
            dictionary.update({i: CurrentCount})
            latestchar = i
    k = max(dictionary, key=dictionary.get)
    print(k, dictionary[k])
    return

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.