1

I have a small question about how to check and compare two or more adjacent characters in a list in Python.

For example, I have the string "cdcdccddd". I made a list from this string to make comparing the characters easier. The needed output is: c: 1 d: 1 c: 1 d: 1 c: 2 d: 3. So it counts consecutive characters: if a character differs from the next one, its count is 1; if it is the same as the next one, the counter is incremented and the comparison moves on to the following pair, and so on.

So far I have this algorithm:
text = "cdcdccddd"
l = []
l = list(text)
print list(text)

for n in range(0,len(l)):
    le = len(l[n])
    if l[n] == l[n+1]:
        le += 1
        if l[n+1] == l[n+2]:
            le += 1
        print l[n], ':' , le
    else: 
        print l[n], ':', le

But it does not work correctly, because it compares the first and second elements, but not the second and third. For this input the output is:

c : 1
d : 1
c : 1
d : 1
c : 2
c : 1
d : 3

How can I make this algorithm better?

Thank you!

1
  • As you said, this algorithm is fundamentally incorrect, because you cannot count all the occurrences this way (you do not know in advance how long each run of duplicates is). One way to overcome this problem is to group the characters into runs and then count the number of characters in each group. Commented Apr 10, 2016 at 21:31

3 Answers

3

You can use itertools.groupby:

from itertools import groupby
s = "cdcdccddd"

print([(k, sum(1 for _ in v)) for k,v in groupby(s)])
[('c', 1), ('d', 1), ('c', 1), ('d', 1), ('c', 2), ('d', 3)]

Consecutive equal chars are grouped together, so each k is the char of that group, and sum(1 for _ in v) gives the length of each group, so we end up with (char, len(group)) pairs.

If we run it in ipython and call list on each v, it should be really clear what is happening:

In [3]: from itertools import groupby

In [4]: s = "cdcdccddd"

In [5]: [(k, list(v)) for k,v in groupby(s)]
Out[5]: 
[('c', ['c']),
 ('d', ['d']),
 ('c', ['c']),
 ('d', ['d']),
 ('c', ['c', 'c']),
 ('d', ['d', 'd', 'd'])]
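
The (char, count) pairs can also be turned into the "c: 1" lines the question asks for. This is only a minimal sketch: the helper name format_runs is made up for illustration and is not part of itertools.

```python
from itertools import groupby


def format_runs(s):
    """Return one 'char: count' string per run of consecutive equal characters.

    format_runs is a hypothetical helper name, not a library function.
    """
    # groupby yields (key, group_iterator); counting the iterator's items
    # gives the length of the run without building an intermediate list.
    return ["{}: {}".format(k, sum(1 for _ in g)) for k, g in groupby(s)]


print("\n".join(format_runs("cdcdccddd")))
```

Returning a list of strings rather than printing inside the helper keeps it easy to reuse; an empty string simply gives an empty list.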

We can also roll our own pretty easily:

def my_groupby(s):
    # create an iterator
    it = iter(s)
    # set consec_count to one and pull first char from s
    # (assumes s is non-empty)
    consec_count, prev = 1, next(it)
    # iterate over the rest of the string
    for ele in it:
        # if last and current char are different
        # yield previous char, consec_count and reset
        if prev != ele:
            yield prev, consec_count
            consec_count = 0
        prev = ele
        consec_count += 1
    # yield the final run; prev is used so a one-char string also works
    yield prev, consec_count

Which gives us the same:

In [8]: list(my_groupby(s))
Out[8]: [('c', 1), ('d', 1), ('c', 1), ('d', 1), ('c', 2), ('d', 3)]

1

That looks like a job for a regular expression matching repeated characters, so you can use a regex with a backreference and then find the length of each match:

import re
text = "cdcdccddd"
matches = re.findall(r'(.)(\1*)', text)
result = ['{}: {}'.format(match[0], len(''.join(match))) for match in matches]

Result:

>>> print(*result, sep='\n')
c: 1
d: 1
c: 1
d: 1
c: 2
d: 3
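
A variant of the same idea, as a sketch: re.finditer exposes the whole match, so its length can be read directly instead of joining the two captured groups. The function name regex_runs is hypothetical, and passing re.DOTALL is an added assumption so that newline characters are counted as well.

```python
import re


def regex_runs(text):
    """Return (char, run_length) pairs using a backreference pattern.

    regex_runs is a hypothetical name chosen for this illustration.
    """
    # (.) captures one character; \1* matches any further repeats of it.
    # re.DOTALL lets '.' match newlines too, so no character is skipped.
    pattern = re.compile(r"(.)\1*", re.DOTALL)
    # group(0) is the whole run, group(1) is the repeated character.
    return [(m.group(1), len(m.group(0))) for m in pattern.finditer(text)]


for char, count in regex_runs("cdcdccddd"):
    print("{}: {}".format(char, count))
```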


1

First thing: strings are already lists in Python, so you can just write for character in text: to get each of the characters out.

I would try something like this:

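text = "cdcdccddd"

# start with the first character; currentcount holds the number of
# extra repeats seen so far (assumes text is non-empty)
currentchar = text[0]
currentcount = 0

for c in text[1:]:
    if c == currentchar:
        currentcount += 1
    else:
        # the run ended: report it and start a new one
        print(currentchar + ": " + str(currentcount + 1))
        currentchar = c
        currentcount = 0

# report the final run
print(currentchar + ": " + str(currentcount + 1))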
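
The same loop can be wrapped in a function that collects the runs instead of printing them, which also avoids the text[0] lookup failing on an empty string. This is a sketch only, and count_runs is a made-up name.

```python
def count_runs(text):
    """Return (char, run_length) pairs for consecutive equal characters.

    count_runs is a hypothetical helper written for this illustration.
    """
    runs = []  # each entry is a mutable [char, count] pair
    for c in text:
        if runs and runs[-1][0] == c:
            # same character as the current run: extend it
            runs[-1][1] += 1
        else:
            # different character (or first one): start a new run
            runs.append([c, 1])
    return [(char, count) for char, count in runs]


for char, count in count_runs("cdcdccddd"):
    print(char + ": " + str(count))
```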

2 Comments

Strings are sequences, but not lists. A list is its own thing separate from strings.
Yes you are correct, I should've said they can be used as lists.
