Python Regular Expression Multiple Groups

Question

Todo: Use a regular expression to break down drives

drives = "8:20-24,30,31,32,10:20-24,30,31,32"

Final output will look like this:

formatted_drives = [{8: [20,21,22,23,24,30,31,32]}, {10: [20,21,22,23,24,30,31,32]}]

Here is what the regex currently looks like:

    regex_static_multiple_with_singles = re.match(r"""
    (?P<enc>\d{1,3}):       # Enclosure ID:
    (?P<start>\d+)          # Drive Start
    -                       # Range -
    (?P<end>\d+)            # Drive End
    (?P<singles>,\d+)+      # Drive Singles - todo resolve issue here
    """, drives, (re.IGNORECASE | re.VERBOSE))

and what is returned:

[DEBUG  ] All Drive Sequences: ['8:20-24,30,31,32', '10:20-24,30,31,32']
[DEBUG  ] Enclosure ID  : 8
[DEBUG  ] Drive Start   : 20
[DEBUG  ] Drive End     : 24
[DEBUG  ] Drive List    : [20, 21, 22, 23, 24]
[DEBUG  ] Drive Singles : ,32
[DEBUG  ] Enclosure ID  : 10
[DEBUG  ] Drive Start   : 20
[DEBUG  ] Drive End     : 24
[DEBUG  ] Drive List    : [20, 21, 22, 23, 24]
[DEBUG  ] Drive Singles : ,32

The issue is that the singles group returns only the last single drive. In this case there are three single drives, but the quantity varies. What is the best way to return all of them?

Use (?P<singles>(?:,\d+)+) and, after getting a match, split that value on the comma.
– Wiktor Stribiżew, Commented Nov 7, 2016 at 17:17
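
A repeated capturing group such as (,\d+)+ keeps only its last iteration, which is why the debug output shows ",32". The sketch below illustrates the comment's suggestion on a single sequence taken from the debug output. The variable names are illustrative, and it assumes every sequence has the shape enc:start-end,single,... as in the question.

```python
import re

# One drive sequence, as listed in the question's debug output.
sequence = "8:20-24,30,31,32"

pattern = re.compile(r"""
    (?P<enc>\d{1,3}):          # enclosure ID
    (?P<start>\d+)             # first drive of the range
    -                          # range separator
    (?P<end>\d+)               # last drive of the range
    (?P<singles>(?:,\d+)+)     # all single drives, captured as one string
    """, re.VERBOSE)

m = pattern.match(sequence)

# Expand the inclusive range, then append the singles.
drive_list = list(range(int(m.group("start")), int(m.group("end")) + 1))
# The captured text is ",30,31,32": drop the leading comma, then split.
drive_list += [int(s) for s in m.group("singles").lstrip(",").split(",")]

result = {int(m.group("enc")): drive_list}
print(result)  # {8: [20, 21, 22, 23, 24, 30, 31, 32]}
```

The inner group is non-capturing, so the named group wraps the whole repetition instead of a single iteration.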

Mustofa Rizwan · Accepted Answer · 2016-11-07 20:20:36Z

Try this:

line = "8:20-24,30,31,32,10:21-24,30,31,32,15:11,12,13-14,16-18"
regex = r"(\d+):((?:\d+[-,]|\d+$)+)"

The regex above splits the line into one block per enclosure, using the colon after the enclosure ID as the anchor, and yields three matches:

8:20-24,30,31,32,
10:21-24,30,31,32,
15:11,12,13-14,16-18
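
As a quick check of the three blocks listed above, this sketch runs the first regex over the sample line with re.finditer. The names blocks and pairs are illustrative only.

```python
import re

line = "8:20-24,30,31,32,10:21-24,30,31,32,15:11,12,13-14,16-18"
regex = r"(\d+):((?:\d+[-,]|\d+$)+)"

# group(0) is the whole block; group(1) is the enclosure ID and
# group(2) is the drive specification that follows the colon.
blocks = [m.group(0) for m in re.finditer(regex, line)]
pairs = [(m.group(1), m.group(2)) for m in re.finditer(regex, line)]

print(blocks)
# ['8:20-24,30,31,32,', '10:21-24,30,31,32,', '15:11,12,13-14,16-18']
print(pairs[0])
# ('8', '20-24,30,31,32,')
```

A block ends where the next number is followed by a colon, because that number matches neither \d+[-,] nor \d+$.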

A second regex, regex2, then splits each match into segments:

regex2 = r"\d+-\d+|\d+"

For match 1, the segments are:

 a) 20-24
 b) 30
 c) 31
 d) 32
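
The segmentation of the first match can be reproduced with re.findall, as in this small sketch. The variable block holds group 2 of the first match, copied by hand here.

```python
import re

regex2 = r"\d+-\d+|\d+"

# Group 2 of the first match (the text after "8:").
block = "20-24,30,31,32,"

segments = re.findall(regex2, block)
print(segments)  # ['20-24', '30', '31', '32']
```

The order of the alternatives matters: the range form \d+-\d+ is tried first, so "20-24" stays one segment. With the alternatives swapped, the plain \d+ would match "20" and "24" separately.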

The rest is simple and self-explanatory in the following code:

#!/usr/bin/env python3
import re

# Splits the line into one block per enclosure: (enclosure ID):(drive spec)
regex = r"(\d+):((?:\d+[-,]|\d+$)+)"
line = "8:20-24,30,31,32,10:21-24,30,31,32,15:11,12,13-14,16-18"
# Splits a drive spec into ranges ("20-24") and single drives ("30")
regex2 = r"\d+-\d+|\d+"

d = {}

for match in re.finditer(regex, line, re.MULTILINE):
    key = int(match.group(1))
    # Reuse the list if this enclosure ID was already seen
    drive_list = d.setdefault(key, [])
    for m in re.finditer(regex2, match.group(2)):
        if '-' in m.group():
            # Expand an inclusive range such as "20-24" (Python 3: range, not xrange)
            start, end = m.group().split('-')
            drive_list.extend(range(int(start), int(end) + 1))
        else:
            drive_list.append(int(m.group()))

print(d)

Sample output:

{8: [20, 21, 22, 23, 24, 30, 31, 32], 10: [21, 22, 23, 24, 30, 31, 32], 15: [11, 12, 13, 14, 16, 17, 18]}
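
The question asked for a list of one-key dictionaries rather than a single dictionary. A possible adaptation of the same two regexes is sketched below. The function name parse_drives is hypothetical, and the input is the drives string from the question.

```python
import re

def parse_drives(spec):
    """Hypothetical helper: return [{enclosure: [drives, ...]}, ...]."""
    formatted = []
    for block in re.finditer(r"(\d+):((?:\d+[-,]|\d+$)+)", spec):
        drive_list = []
        for seg in re.findall(r"\d+-\d+|\d+", block.group(2)):
            if "-" in seg:
                # Expand an inclusive range such as "20-24".
                start, end = map(int, seg.split("-"))
                drive_list.extend(range(start, end + 1))
            else:
                drive_list.append(int(seg))
        formatted.append({int(block.group(1)): drive_list})
    return formatted

formatted_drives = parse_drives("8:20-24,30,31,32,10:20-24,30,31,32")
print(formatted_drives)
# [{8: [20, 21, 22, 23, 24, 30, 31, 32]}, {10: [20, 21, 22, 23, 24, 30, 31, 32]}]
```

Unlike the dictionary version, this sketch emits one entry per block, so an enclosure ID that appears twice in the input produces two separate dictionaries instead of one merged list.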

1 Comment

JT1 Over a year ago

Thank you for this, it's very flexible!
