How to split repeated strings into list

Question

I have strings each of which is one or more copies of some string. For example:

L = "hellohellohello"
M = "good"
N = "wherewhere"
O = "antant"

I would like to split such strings into a list so that each element just has the part that was repeated. For example:

splitstring(L) ---> ["hello", "hello", "hello"]
splitstring(M) ---> ["good"]
splitstring(N) ---> ["where", "where"]
splitstring(O) ---> ["ant", "ant"]

As each string is about 1000 characters long, it would be great if this were reasonably fast as well.

Note that in my case the repetitions all start at the beginning of the string and have no gaps between them, so it's much simpler than the general problem of finding maximal repetitions in a string.
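Because every copy starts at index 0 with no gaps, the task reduces to finding the shortest unit u with s == u * k. One well-known trick (not taken from any of the answers) is that the first position i >= 1 at which s occurs inside s + s is exactly len(u); if s is not a repetition of anything shorter, that position is len(s) itself. A minimal sketch reusing the question's splitstring name; it relies on CPython's str.find, which is usually fast for strings of this size, though its worst-case behaviour depends on the Python version:

```python
def splitstring(s):
    """Split s into back-to-back copies of its shortest repeating unit."""
    if not s:
        return []
    # First index >= 1 where s reappears inside s + s is the unit length
    # (it is len(s) when s is not made of shorter copies).
    unit_len = (s + s).find(s, 1)
    return [s[:unit_len]] * (len(s) // unit_len)


print(splitstring("hellohellohello"))  # ['hello', 'hello', 'hello']
print(splitstring("good"))             # ['good']
print(splitstring("wherewhere"))       # ['where', 'where']
print(splitstring("antant"))           # ['ant', 'ant']
```

The trick works because a rotation of s by i positions equals s exactly when s is built from copies of s[:i].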

How can one do this?

Take a look at this question; I think you're looking for something similar. Also, the complexity given for that method is O(n), so it should be fast enough for your requirement. – Mridul Kashyap, Commented Jul 27, 2016 at 8:44
@MridulKashyap My question is much simpler, as my repetitions start at the beginning of the string and don't have any gaps between them. – Simd, Commented Jul 27, 2016 at 8:45
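The linked method isn't reproduced here. One standard way to get a guaranteed O(n) bound for this simpler case is the Knuth-Morris-Pratt prefix (failure) function: if pi is the length of the longest proper prefix of s that is also a suffix of s, then p = len(s) - pi is the smallest period of s, and s is made of copies of s[:p] exactly when p divides len(s). A sketch under that approach (the helper name prefix_function is illustrative):

```python
def prefix_function(s):
    """pi[i] = length of the longest proper prefix of s[:i+1] that is also its suffix."""
    pi = [0] * len(s)
    k = 0
    for i in range(1, len(s)):
        # fall back along shorter borders until the next character matches
        while k > 0 and s[i] != s[k]:
            k = pi[k - 1]
        if s[i] == s[k]:
            k += 1
        pi[i] = k
    return pi


def splitstring(s):
    if not s:
        return []
    # smallest period of the whole string
    p = len(s) - prefix_function(s)[-1]
    if len(s) % p != 0:
        # the period does not tile s evenly, so s is its own single copy
        p = len(s)
    return [s[:p]] * (len(s) // p)


print(splitstring("wherewhere"))  # ['where', 'where']
```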

Aran-Fey · Accepted Answer · 2016-07-27 09:35:22Z

4

Using regex to find the repeating word, then simply creating a list of the appropriate length:

import re

def splitstring(string):
    # The lazy group finds the shortest prefix whose repetitions span the whole string
    match = re.match(r'(.*?)(?:\1)*$', string)
    word = match.group(1)
    return [word] * (len(string) // len(word))
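A variant of the same regex idea, offered as an assumption rather than the answerer's code: re.fullmatch anchors both ends (so a trailing newline can't slip past $), (.+?) refuses the empty unit (so the empty string no longer leads to a division by zero), and re.DOTALL lets "." match newlines too:

```python
import re

# Shortest non-empty unit, repeated one or more times, covering the whole string
UNIT = re.compile(r'(.+?)\1*', re.DOTALL)

def splitstring(string):
    match = UNIT.fullmatch(string)
    if match is None:  # only the empty string fails to match
        return []
    word = match.group(1)
    return [word] * (len(string) // len(word))


for s in ("hellohellohello", "good", "wherewhere", "antant", "a\nba\nb"):
    print(repr(s), splitstring(s))
```

Backtracking makes this quadratic in the worst case, which is usually acceptable for strings of around 1000 characters.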

edited Jul 27, 2016 at 9:35

answered Jul 27, 2016 at 8:55

Aran-Fey


1 Comment

alex Over a year ago

Nice idea, I thought about doing something like that too.

AdrienW · Accepted Answer · 2016-07-27 09:17:38Z

1

Try this. Instead of cutting up the string, it concentrates on finding the shortest pattern, then just creates a new list by repeating this pattern the appropriate number of times.

def splitstring(s):
    if not s:
        return []
    # grow the candidate pattern one character at a time until repeating it
    # reproduces the whole string (at worst the pattern is s itself)
    proposed_pattern = ''
    for c in s:
        proposed_pattern += c
        n = len(proposed_pattern)
        if len(s) % n == 0 and proposed_pattern * (len(s) // n) == s:
            # found it
            break
    # generating the list
    return [proposed_pattern] * (len(s) // n)


if __name__ == '__main__':
    L = 'hellohellohellohello'
    print(splitstring(L))  # prints ['hello', 'hello', 'hello', 'hello']
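The shortest-pattern search can skip most candidates: a unit that repeats at least twice is at most half the string, and its length must divide len(s). For such a length, s is made of copies of s[:size] exactly when shifting s by size characters leaves it unchanged, i.e. s[size:] == s[:-size]. A sketch of that check, assuming Python 3:

```python
def splitstring(s):
    n = len(s)
    # only lengths up to n // 2 can repeat at least twice
    for size in range(1, n // 2 + 1):
        # with size dividing n, s is built from copies of s[:size]
        # exactly when s shifted by size characters equals itself
        if n % size == 0 and s[size:] == s[:-size]:
            return [s[:size]] * (n // size)
    # no shorter unit: the whole string is the single copy
    return [s] if s else []


print(splitstring("hellohellohello"))  # ['hello', 'hello', 'hello']
print(splitstring("good"))             # ['good']
```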

edited Jul 27, 2016 at 9:17

answered Jul 27, 2016 at 8:53

AdrienW


1 Comment

AdrienW Over a year ago

I didn't know any of these 3 things, thank you, sir. I will test this and edit.

Gábor Erdős · Accepted Answer · 2016-07-27 08:48:42Z

0

The approach I would use:

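import re

L = "hellohellohello"
M = "good"
N = "wherewhere"

for s in (L, M, N):
    cnt = 0
    result = ''
    for i in range(1, len(s) + 1):
        # count non-overlapping occurrences of the prefix s[0:i];
        # re.escape makes the prefix match literally
        found = re.findall(re.escape(s[0:i]), s)
        # keep the longest prefix whose occurrence count has not dropped
        if cnt <= len(found):
            cnt = len(found)
            result = found[0]
    print(result)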

It gives the following output for each of the strings:

hello
good
where
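The counting loop returns the longest prefix whose number of non-overlapping occurrences has not dropped below that of the first character. That prefix doesn't always tile the string: for "aabaab", "a" occurs 4 times but "aa" only twice, so the loop returns "a". A small check (the name is_repetition is hypothetical) that a candidate really is the repeated unit:

```python
def is_repetition(s, unit):
    """Return True if s is one or more back-to-back copies of a non-empty unit."""
    if not s or not unit or len(s) % len(unit) != 0:
        return False
    return unit * (len(s) // len(unit)) == s


print(is_repetition("wherewhere", "where"))  # True
print(is_repetition("aabaab", "a"))          # False
print(is_repetition("aabaab", "aab"))        # True
```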

answered Jul 27, 2016 at 8:48

Gábor Erdős


pawelty · Accepted Answer · 2016-07-27 09:14:57Z

0

Assuming that the length of the repeated word is longer than 1, this would work:

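a = "hellohellohello"

def splitstring(string):
    for number in range(1, len(string)):
        # the prefix must tile the whole string; comparing it only with the
        # next chunk (string[number:number+number]) is fooled by e.g. "aabaab"
        if len(string) % number == 0 and string[:number] * (len(string) // number) == string:
            return [string[:number]] * (len(string) // number)
    # in case there is no repetition
    return [string]

print(splitstring(a))  # ['hello', 'hello', 'hello']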

edited Jul 27, 2016 at 9:14

answered Jul 27, 2016 at 8:58

pawelty


1 Comment

cogitovita Over a year ago

It fails on "aabaab".
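The failure comes from comparing the prefix only with the chunk that immediately follows it: in "aabaab" the one-character prefix "a" equals the next character, so "a" is accepted even though "a" * 6 is not the string. A sketch contrasting the two checks (both function names are hypothetical):

```python
def next_chunk_check(s, number):
    # compare the prefix only with the chunk that immediately follows it
    return s[:number] == s[number:number + number]


def full_check(s, number):
    # require the prefix to tile the whole string
    return len(s) % number == 0 and s[:number] * (len(s) // number) == s


s = "aabaab"
print([n for n in range(1, len(s)) if next_chunk_check(s, n)])  # [1, 3]
print([n for n in range(1, len(s)) if full_check(s, n)])        # [3]
```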

Fan.Dog · Accepted Answer · 2016-07-27 09:40:35Z

0

# -*- coding: utf-8 -*-
import re
'''
based on the code in Gábor Erdős's answer
'''

N = "wherewhere"
cnt = 0
result = ''
countN = 0
showresult = []

for i in range(1, len(N) + 1):
    # re.escape makes the prefix match literally
    found = re.findall(re.escape(N[0:i]), N)
    if cnt <= len(found):
        cnt = len(found)
        result = found[0]
        countN = len(N) // len(result)  # integer division, as range() needs an int
for i in range(0, countN):
    showresult.append(result)
print(showresult)
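In Python 3, len(N)/len(result) is a float and range() rejects it, so the count needs integer division. The final append loop can also be replaced by list multiplication, as in the first two answers. A sketch with the unit hardcoded for illustration (in the code it comes from the counting loop):

```python
N = "wherewhere"
result = "where"                 # hardcoded here; normally found by the counting loop
countN = len(N) // len(result)   # integer division: list * int needs an int
showresult = [result] * countN
print(showresult)  # ['where', 'where']
```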

answered Jul 27, 2016 at 9:40

Fan.Dog


1 Comment

Nander Speerstra Over a year ago

Please add an explanation of why your code is different from Gábor Erdős's post.
