8

I have strings that look like this example: "AAABBBCDEEEEBBBAA"

Any character is possible in the string.

I want to split it to a list like: ['AAA','BBB','C','D','EEEE','BBB','AA']

so that every continuous run of the same character goes into a separate element of the list.

I know that I can iterate over the characters in the string and check whether each pair i and i-1 contains the same character, etc., but is there a simpler solution?

5 Answers

15

We could use a regular expression:

>>> import re
>>> r = re.compile(r'(.)\1*')
>>> [m.group() for m in r.finditer('AAABBBCDEEEEBBBAA')]
['AAA', 'BBB', 'C', 'D', 'EEEE', 'BBB', 'AA']
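The question says any character is possible, but by default `.` does not match a newline, so the pattern above silently skips runs of `\n`. A minimal variant, assuming you want newlines kept as runs too, compiles the same pattern with `re.DOTALL` (`split_runs` is just an illustrative helper name):

```python
import re

# re.DOTALL lets '.' match '\n' as well; \1* repeats whatever
# single character group 1 captured.
runs_re = re.compile(r'(.)\1*', re.DOTALL)

def split_runs(s):
    """Split s into maximal runs of identical characters (illustrative helper)."""
    return [m.group() for m in runs_re.finditer(s)]

print(split_runs('AA\n\nB'))                                    # ['AA', '\n\n', 'B']
print([m.group() for m in re.finditer(r'(.)\1*', 'AA\n\nB')])  # ['AA', 'B'] (newlines dropped)
```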

Alternatively, we could use itertools.groupby.

>>> import itertools
>>> [''.join(g) for k, g in itertools.groupby('AAABBBCDEEEEBBBAA')]
['AAA', 'BBB', 'C', 'D', 'EEEE', 'BBB', 'AA']

timeit shows the regex version is faster for this particular string (tested on Python 2.6 and Python 3.1). That is not very surprising: regular expressions are specialized for strings, while groupby is a generic function that works on any iterable.
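The timing claim is easy to re-check on your own Python version and inputs. A rough sketch with the standard `timeit` module follows; the helper names are illustrative, and the numbers it prints will differ by machine, Python version, and string length:

```python
import itertools
import re
import timeit

s = 'AAABBBCDEEEEBBBAA'
pattern = re.compile(r'(.)\1*')  # the pattern from the answer above

def with_regex():
    return [m.group() for m in pattern.finditer(s)]

def with_groupby():
    return [''.join(g) for _, g in itertools.groupby(s)]

# Both approaches must agree before comparing their speed means anything.
assert with_regex() == with_groupby()

for func in (with_regex, with_groupby):
    # Total seconds for 10000 calls of each function.
    print(func.__name__, timeit.timeit(func, number=10000))
```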


1 Comment

Wow, thanks! The regex solution is cool, and groupby too. How is it possible that I spent so much time on this problem before posting the question to Stack Overflow, and then got the answer in 5 minutes? ;-)

9
>>> from itertools import groupby
>>> [''.join(g) for k, g in groupby('AAAABBBCCD')]
['AAAA', 'BBB', 'CC', 'D']

And with plain string manipulation:

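>>> s = 'AAABBBCDEEEEBBBAA'
>>> a = []; S = ""; p = ""
>>> for c in s:
...     if c != p:          # character changed: close the current run
...         a.append(S)
...         S = ""
...     S = S + c
...     p = c
...
>>> a.append(S)             # flush the last run
>>> a
['', 'AAA', 'BBB', 'C', 'D', 'EEEE', 'BBB', 'AA']
>>> list(filter(None, a))   # drop the empty first entry (list() needed on Python 3)
['AAA', 'BBB', 'C', 'D', 'EEEE', 'BBB', 'AA']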
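The same loop can be written so that the empty first entry never appears, which removes the need for `filter`. A sketch, comparing each character with the last character of the run being built (`runs_by_loop` is a hypothetical name):

```python
def runs_by_loop(s):
    """Group consecutive equal characters without itertools or re."""
    runs = []
    for c in s:
        # Extend the last run if c repeats its character, else start a new run.
        if runs and runs[-1][-1] == c:
            runs[-1] += c
        else:
            runs.append(c)
    return runs

print(runs_by_loop('AAABBBCDEEEEBBBAA'))  # ['AAA', 'BBB', 'C', 'D', 'EEEE', 'BBB', 'AA']
```

The `runs and` check also makes the empty string return `[]` instead of raising an IndexError.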


3
import itertools

s = "AAABBBCDEEEEBBBAA"
# groupby yields one (key, group) pair per run of equal characters
print(["".join(chars) for _, chars in itertools.groupby(s)])
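The `_` discards the key that `groupby` yields with each group. Printing the keys alongside the run lengths shows why this works: `groupby` starts a new group every time the character changes, so the two separate runs of `B` (and of `A`) stay separate instead of being merged:

```python
import itertools

s = "AAABBBCDEEEEBBBAA"

# Each pair is (character, length of that consecutive run).
pairs = [(key, len(list(group))) for key, group in itertools.groupby(s)]
print(pairs)
# [('A', 3), ('B', 3), ('C', 1), ('D', 1), ('E', 4), ('B', 3), ('A', 2)]
```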


0

Just another way of solving your problem:

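#!/usr/bin/python

string = 'AAABBBCDEEEEBBBAA'
memory = str()   # the run currently being collected
List = list()    # the finished runs
for index, element in enumerate(string):
    if index > 0:
        if element == string[index - 1]:
            memory += element       # same character: extend the run
        else:
            List.append(memory)     # character changed: close the run
            memory = element
    else:
        memory += element
if memory:
    List.append(memory)             # append the final run, which the loop never closes
print(List)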


0

Another solution is:

def solveProblem():
    def convertString(string):
        result = list()
        currentIndex = 0
        while currentIndex < len(string):
            # count how many times string[currentIndex] repeats from here
            number = 1
            nextIndex = currentIndex + 1
            while nextIndex < len(string) and string[nextIndex] == string[currentIndex]:
                number += 1
                nextIndex += 1
            # build the run, e.g. 'E' * 4 -> 'EEEE'
            result.append(string[currentIndex] * number)
            # jump to the first character of the next run
            currentIndex = nextIndex
        return result

    string = "AAABBBCDEEEEBBBAA"
    print("string <- ", string, sep="")
    print("result <- ", convertString(string), sep="")


def main():
    solveProblem()


if __name__ == "__main__":
    main()
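Hand-written loops are easy to get wrong at the boundaries: an empty string, a single character, or a lone character at the very end. A quick consistency check, assuming the `groupby` one-liner as the reference, compares three implementations on such inputs; the function names are hypothetical:

```python
import itertools
import re

def by_groupby(s):
    return [''.join(g) for _, g in itertools.groupby(s)]

def by_regex(s):
    # re.DOTALL so that '.' also matches '\n'
    return [m.group() for m in re.finditer(r'(.)\1*', s, re.DOTALL)]

def by_index(s):
    # Index-based loop: find where each run ends, then jump past it.
    result, i = [], 0
    while i < len(s):
        j = i + 1
        while j < len(s) and s[j] == s[i]:
            j += 1
        result.append(s[i:j])
        i = j
    return result

# Inputs that tend to trip up hand-written loops.
cases = ['', 'A', 'AAB', 'ABB', 'AAABBBCDEEEEBBBAA', 'x\n\ny']
for case in cases:
    assert by_groupby(case) == by_regex(case) == by_index(case), repr(case)
print('all implementations agree')
```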
