
I'm trying to split a string of letters and numbers into a list of tuples like this:

[(37, 'M'), (1, 'I'), (5, 'M'), (1, 'D'), (25, 'M'), (33, 'S')]

This is what is kind of working, but when I try to print "37" with print(cigar[d:pos]), it does not print the entire number, only "3".

#iterate through cigar sequence
print(cigar)
#count position in cigar sequence
pos=0
#count position of last key
d=0

splitCigar=[]

for char in cigar:
    
    #print(cigar[pos])
    if char.isalpha() == False:
        print("first for-loop")
        print(cigar[d])
        print(cigar[pos])
        print(cigar[d:pos])
        num=(cigar[d:pos])
        pos+=1

    if char.isalpha() == True:
        print("second for-loop")
        splitCigar.append((num,char))
        pos+=1
        d=pos   
    
print(splitCigar)

The output of this code:

37M1I5M1D25M33S
first for-loop
3
3

first for-loop
3
7
3
second for-loop

<and so on...>

second for-loop
[('3', 'M'), ('', 'I'), ('', 'M'), ('', 'D'), ('2', 'M'), ('3', 'S')]
    Can you clarify your input and expected output? Commented Nov 4, 2020 at 13:10

4 Answers


Solution using regexp:

import re
cigar = "37M1I5M1D25M33S"

digits = re.findall('[0-9]+', cigar)
chars = re.findall('[A-Z]+', cigar)

results = list(zip(digits, chars))

Everything printed so you can see what it does:

>>> print(digits)
['37', '1', '5', '1', '25', '33']
>>> print(chars)
['M', 'I', 'M', 'D', 'M', 'S']
>>> print(results)
[('37', 'M'), ('1', 'I'), ('5', 'M'), ('1', 'D'), ('25', 'M'), ('33', 'S')]

I hope this "functional" approach suits you.


2 Comments

Yes, thank you! Is there any way to convert the digits to integers in the final results?
Sure thing! You can do digits = [int(digit) for digit in digits] to convert them before zipping.
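A minimal sketch of that suggestion, assuming the same `cigar` string as in the answer. The single-pattern variant with two capture groups is an alternative that is not part of the answer, and the names `pairs` and `pairs_single` are only illustrative.

```python
import re

cigar = "37M1I5M1D25M33S"

# Convert the digit runs to int before zipping, as suggested in the comment.
digits = [int(d) for d in re.findall(r"[0-9]+", cigar)]
chars = re.findall(r"[A-Z]+", cigar)
pairs = list(zip(digits, chars))

# Alternative (an assumption, not from the answer): one pattern, two groups,
# so each number is captured together with the letter that follows it.
pairs_single = [(int(n), op) for n, op in re.findall(r"([0-9]+)([A-Z])", cigar)]

print(pairs)
```

Both variants assume the string strictly alternates between a number and a single uppercase letter, as in the example input.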

The pyparsing library makes parsers more maintainable and readable. If the format of the data changes, you can modify the parser without too much effort.

import pyparsing as pp


def make_grammar():
    # Number consists of several digits
    num = pp.Word(pp.nums).setName("Num")
    # Convert the num to int
    num = num.setParseAction(
        pp.pyparsing_common.convertToInteger)
    # 1 letter
    letter = pp.Word(pp.alphas, exact=1)\
        .setName("Letter")
    # 1 num followed by letter with possibly
    # some spaces in between
    package = pp.Group(num + letter)
    # 1 or more packages
    grammar = pp.OneOrMore(package)
    return grammar


def main():
    x = "37M1I5M1D25M33S"
    g = make_grammar()
    result = g.parseString(x, parseAll=True)
    print(result)
    # [[37, 'M'], [1, 'I'], [5, 'M'], 
    #  [1, 'D'], [25, 'M'], [33, 'S']]
    # If you really want tuples:
    print([tuple(r) for r in result])


main()

1 Comment

Nice example. Note that pyparsing now has the Char class that you can use in place of Word(exact=1)

Sounds like a job for itertools.groupby

import itertools

inp = '37M1I5M1D25M33S'
# Join each run of consecutive digit / non-digit characters into one string
e = [''.join(g) for k, g in itertools.groupby(inp, key=lambda l: l.isdigit())]
print(e)

This will give you:

['37', 'M', '1', 'I', '5', 'M', '1', 'D', '25', 'M', '33', 'S']

Basically, groupby collects consecutive elements for which the key function (.isdigit) returns the same value into groups, so each run of digits and each run of letters becomes its own group. Each of those groups is then turned into a string using ''.join.
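To make the grouping visible, here is a small sketch that keeps the key next to each joined run. The name `runs` is only illustrative, and `str.isdigit` is used as the key in place of the lambda, which is equivalent for string input.

```python
import itertools

inp = "37M1I5M1D25M33S"

# Each item is (key, run): key is what str.isdigit returned for that run,
# so True marks a run of digits and False marks a run of letters.
runs = [(is_digit, "".join(group))
        for is_digit, group in itertools.groupby(inp, key=str.isdigit)]

print(runs[:4])
# [(True, '37'), (False, 'M'), (True, '1'), (False, 'I')]
```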

Now, all you have to do is zip them together:

res = list(zip(e[::2], e[1::2]))
print(res)

That will give you:

[('37', 'M'), ('1', 'I'), ('5', 'M'), ('1', 'D'), ('25', 'M'), ('33', 'S')]

If you want integers instead of the string representation of the numbers, that's also simple:

res = list(map(lambda l: (int(l[0]), l[1]), res))

Which yields:

[(37, 'M'), (1, 'I'), (5, 'M'), (1, 'D'), (25, 'M'), (33, 'S')]

I'd say this is a pretty pythonic solution for your problem.


You can simply attain the desired output as follows:

cigar = '37M1I5M1D25M33S'

splitCigar = []
num = ''
for char in cigar:
    if not char.isalpha():
        # accumulate the digits of the current number
        num += char
    else:
        # a letter ends the number: store the pair and reset
        splitCigar.append((num, char))
        num = ''
print(splitCigar)

Output: [('37', 'M'), ('1', 'I'), ('5', 'M'), ('1', 'D'), ('25', 'M'), ('33', 'S')]
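As for why the loop in the question only ever sees "3": num = cigar[d:pos] runs before pos += 1, so the slice always stops just short of the current digit. A minimal sketch of one way to repair it, keeping the question's variable names and taking the slice only once a letter is reached. The use of enumerate and the int conversion are additions to match the desired output, and the sketch assumes every letter is preceded by at least one digit.

```python
cigar = "37M1I5M1D25M33S"

splitCigar = []
d = 0  # index where the current run of digits starts

for pos, char in enumerate(cigar):
    if char.isalpha():
        # cigar[d:pos] is the whole digit run preceding this letter
        splitCigar.append((int(cigar[d:pos]), char))
        d = pos + 1  # the next number starts right after the letter

print(splitCigar)
# [(37, 'M'), (1, 'I'), (5, 'M'), (1, 'D'), (25, 'M'), (33, 'S')]
```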
