I have strings in the format "1-3 6:10-11 7-9" and from them I want to create number sets as follows {1,2,3,6,10,11,7,8,9}.

For creating the set from the range of numbers, I have the following code:

def create_set(src):
    # Expand 'a-b' (inclusive) or a single number 'a' into a set of ints.
    lset = set()
    if len(src) > 0:
        pos = src.find('-')
        if pos != -1:
            first = int(src[:pos])
            last = int(src[pos+1:])
        else:
            return {int(src)}  # Only one number
        for j in range(first, last + 1):
            lset.add(j)
    return lset  # empty set for an empty string

But I cannot figure out how to correctly treat the ':' when it appears in the string. Can someone help me?

Thanks in advance!

EDIT: By the way, is there a more compact way of parsing such strings, perhaps using regular expressions?

5 Comments
  • I would be tempted to parse it with a regular expression - I am no expert but that would be the way I would do it - since the 'syntax' seems to be regular. Commented Aug 20, 2016 at 22:02
  • @xnx my thoughts exactly Commented Aug 20, 2016 at 22:03
  • Why does the 6 have a colon? Commented Aug 20, 2016 at 22:04
  • The colon denotes that the number before it should be added as a single element in the set. And I tried the solution suggested by xnx, but it does not work because the code as is does not recognize a string like '1-2-3' as a valid range. Commented Aug 20, 2016 at 22:08
  • I meant more like "why can't you just have ranges or single numbers?", then split on a space, and handle the ranges appropriately, else just add the single number. Commented Aug 20, 2016 at 22:13

2 Answers

Something like this might work for you:

s = '1-3 6:10-11 7-9'
s = s.replace(':', ' ')
lset = set()
fs = s.split()
for f in fs:
    r = f.split('-')
    if len(r)==1:
        # add a single number
        lset.add(int(r[0]))
    else:
        # add a range of numbers (inclusive of the endpoints)
        lset |= set(range(int(r[0]), int(r[1])+1))
print(lset)
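
If this needs to be reused, the same split-based idea could be wrapped in a function. This is only a sketch: the name parse_ranges is made up for illustration, and it assumes that ':' and whitespace are both plain separators and that every token is either a single number or an inclusive 'a-b' range.

```python
def parse_ranges(text):
    """Expand a string such as '1-3 6:10-11 7-9' into a set of ints.

    Hypothetical helper, not part of the original answer. Assumes ':' and
    whitespace both act as separators and that ranges are inclusive.
    """
    numbers = set()
    for token in text.replace(':', ' ').split():
        first, sep, last = token.partition('-')
        if sep:
            # 'a-b' range: include both endpoints
            numbers.update(range(int(first), int(last) + 1))
        else:
            # single number
            numbers.add(int(first))
    return numbers


print(sorted(parse_ranges('1-3 6:10-11 7-9')))
# prints [1, 2, 3, 6, 7, 8, 9, 10, 11]
```

A malformed token such as '1-2-3' raises ValueError here, because int('2-3') fails; whether that is the desired behaviour depends on how strict the input format is meant to be.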

1 Comment

This answer is fine, but see below for an alternative, perhaps simpler, option.

EDIT: By the way, is there a more compact way of parsing such strings, perhaps using regular expressions?

Perhaps a cleaner (and slightly more efficient) way:

import re
import itertools

s = '1-3 6:10-11 7-9'  # example input from the question

allGroups = re.findall(r"(\d+)(?:-(\d+)|:)", s)
expanded = [range(int(x), (int(x) if y == '' else int(y)) + 1) for x, y in allGroups]
print({x for x in itertools.chain.from_iterable(expanded)})

Explanations:

Match all strings like 'a-b' or 'a:' and return a list of (a, b) and (a, '') pairs respectively:

allGroups = re.findall(r"(\d+)(?:-(\d+)|:)", s)

This produces:

[('1', '3'), ('6', ''), ('10', '11'), ('7', '9')]

Using a list comprehension, expand each (x, y) pair into the numbers from x to y inclusive, i.e. range(x, y + 1), treating the (x, '') case as range(x, x + 1):

expanded = [range(int(x), (int(x) if y == '' else int(y)) + 1) for x, y in allGroups]

This produces (in Python 2, where range returns a list; in Python 3 the elements are range objects instead):

[[1, 2, 3], [6], [10, 11], [7, 8, 9]]

Use itertools.chain.from_iterable() to transform the list of lists into a single iterable which is iterated by a set comprehension into the final set:

print({x for x in itertools.chain.from_iterable(expanded)})

This produces (Python 2 output; Python 3 shows the set in braces):

set([1, 2, 3, 6, 7, 8, 9, 10, 11])
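
One thing to watch: the pattern above only matches a number that is followed by '-' or ':', so a bare number such as a trailing '42' would be skipped. A possible variant, written for Python 3, makes the '-b' part optional instead. This is a sketch under the assumption that any number without a '-b' part is a single element, whether or not a ':' follows it; parse_ranges_re is a hypothetical name.

```python
import re
from itertools import chain


def parse_ranges_re(text):
    """Hypothetical helper: regex-based variant of the code above.

    Assumption: a number with no '-b' part is a single element,
    whether or not a ':' follows it.
    """
    # Each match is (a, b) for 'a-b' or (a, '') for a lone 'a'.
    pairs = re.findall(r"(\d+)(?:-(\d+))?", text)
    # 'b or a' falls back to a when b is the empty string.
    ranges = (range(int(a), int(b or a) + 1) for a, b in pairs)
    return set(chain.from_iterable(ranges))


print(sorted(parse_ranges_re('1-3 6:10-11 7-9 42')))
# prints [1, 2, 3, 6, 7, 8, 9, 10, 11, 42]
```

Sorting before printing makes the output order explicit, since a set itself carries no ordering guarantee.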

1 Comment

Thanks, FujiApple, this solution also has the advantage of returning a sorted list of numbers.
