Todo: Use a regular expression to break down drives

drives = "8:20-24,30,31,32,10:20-24,30,31,32"

Final output will look like this:

formatted_drives = [{8: [20,21,22,23,24,30,31,32]}, {10: [20,21,22,23,24,30,31,32]}]

Here is what the regex currently looks like:

    regex_static_multiple_with_singles = re.match(r"""
    (?P<enc>\d{1,3}):       # Enclosure ID:
    (?P<start>\d+)          # Drive Start
    -                       # Range -
    (?P<end>\d+)            # Drive End
    (?P<singles>,\d+)+      # Drive Singles - todo resolve issue here
    """, drives, (re.IGNORECASE | re.VERBOSE))

and what is returned:

[DEBUG  ] All Drive Sequences: ['8:20-24,30,31,32', '10:20-24,30,31,32']
[DEBUG  ] Enclosure ID  : 8
[DEBUG  ] Drive Start   : 20
[DEBUG  ] Drive End     : 24
[DEBUG  ] Drive List    : [20, 21, 22, 23, 24]
[DEBUG  ] Drive Singles : ,32
[DEBUG  ] Enclosure ID  : 10
[DEBUG  ] Drive Start   : 20
[DEBUG  ] Drive End     : 24
[DEBUG  ] Drive List    : [20, 21, 22, 23, 24]
[DEBUG  ] Drive Singles : ,32

The issue is that the drive singles group only returns the last single drive. In this case there are three single drives, but the quantity is variable. What is the best way to return all of the single drives?
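A minimal reproduction of the behaviour described above, using only the singles part of the pattern on an illustrative input string: when a capturing group is repeated, the re module keeps only the text from the group's last repetition, even though the overall match covers all of them.

```python
import re

# The named group is repeated, so it is overwritten on every repetition.
m = re.match(r"(?P<singles>,\d+)+", ",30,31,32")

last_only = m.group("singles")  # text captured by the final repetition only
whole_run = m.group(0)          # the full text consumed by the pattern

print(last_only)
print(whole_run)
```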

  • Use (?P<singles>(?:,\d+)+) and, after getting a match, split that value on the comma. Commented Nov 7, 2016 at 17:17
  • Are you considering the use of PyPi regex module? Commented Nov 7, 2016 at 19:15
  • Thanks, this is what I need. Just using 're' module. Commented Nov 7, 2016 at 20:09
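A sketch of how the suggestion in the first comment might be applied. It assumes the input has already been split into one sequence per enclosure, as the debug output in the question shows; parse_sequence and PATTERN are hypothetical names, not part of the original code.

```python
import re

# Pattern from the question, with the repetition moved inside the named group
# so that the group captures every ",N" item rather than only the last one.
PATTERN = re.compile(r"""
    (?P<enc>\d{1,3}):          # Enclosure ID
    (?P<start>\d+)             # Drive start
    -                          # Range separator
    (?P<end>\d+)               # Drive end
    (?P<singles>(?:,\d+)+)     # All single drives in one capture
    """, re.VERBOSE)


def parse_sequence(sequence):
    """Parse one 'enc:start-end,single,...' sequence into {enc: [drives]}."""
    m = PATTERN.fullmatch(sequence)
    if m is None:
        raise ValueError("unrecognised drive sequence: %r" % sequence)
    # Expand the range, inclusive of the end drive.
    drives = list(range(int(m.group("start")), int(m.group("end")) + 1))
    # Drop the leading comma, then split the remaining singles on commas.
    drives += [int(s) for s in m.group("singles").lstrip(",").split(",")]
    return {int(m.group("enc")): drives}


# Assumed to be pre-split, as in the "All Drive Sequences" debug line.
sequences = ["8:20-24,30,31,32", "10:20-24,30,31,32"]
formatted_drives = [parse_sequence(s) for s in sequences]
print(formatted_drives)
```

Like the pattern in the question, this only handles a single range followed by one or more singles.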

1 Answer

Try this:

line = "8:20-24,30,31,32,10:21-24,30,31,32,15:11,12,13-14,16-18"
regex = r"(\d+):((?:\d+[-,]|\d+$)+)"

The regex above splits the line into one block per enclosure, using the : after each enclosure ID as the marker, which gives 3 matches:

  1. 8:20-24,30,31,32,
  2. 10:21-24,30,31,32,
  3. 15:11,12,13-14,16-18
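To check the three blocks listed above, the first regex can be run on its own. This is a small sketch; blocks is just an illustrative variable name.

```python
import re

line = "8:20-24,30,31,32,10:21-24,30,31,32,15:11,12,13-14,16-18"
regex = r"(\d+):((?:\d+[-,]|\d+$)+)"

# group(0) is the whole block; group(1) is the enclosure ID and
# group(2) is the drive list that regex2 will later split up.
blocks = [m.group(0) for m in re.finditer(regex, line)]
print(blocks)
```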

The second regex, regex2, splits each match into segments:

regex2 = r"\d+-\d+|\d+"

For match 1, the segments are:

 a) 20-24
 b) 30
 c) 31
 d) 32
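The segments for match 1 can be reproduced with re.findall, as a quick sketch. The alternation tries the range form first, so "20-24" is kept as one segment instead of being split into "20" and "24".

```python
import re

regex2 = r"\d+-\d+|\d+"

# Drive list of match 1, i.e. group(2) of the first block.
segments = re.findall(regex2, "20-24,30,31,32,")
print(segments)
```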

The rest is simple and self-explanatory in the following code:

#!/usr/bin/python
import re

regex = r"(\d+):((?:\d+[-,]|\d+$)+)"
line = "8:20-24,30,31,32,10:21-24,30,31,32,15:11,12,13-14,16-18"
regex2 = r"\d+-\d+|\d+"

d = {}

# Each outer match is one "enclosure:drives" block.
for match in re.finditer(regex, line, re.MULTILINE):
    key = int(match.group(1))
    # Each inner match is either a range ("20-24") or a single drive ("30").
    for m in re.finditer(regex2, match.group(2)):
        if '-' in m.group():
            start, end = m.group().split('-')
            # range() replaces the Python 2 xrange(); the end drive is inclusive.
            d.setdefault(key, []).extend(range(int(start), int(end) + 1))
        else:
            d.setdefault(key, []).append(int(m.group()))
print(d)

Sample output:

{8: [20, 21, 22, 23, 24, 30, 31, 32], 10: [21, 22, 23, 24, 30, 31, 32], 15: [11, 12, 13, 14, 16, 17, 18]}
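The question asked for a list of single-key dictionaries rather than one dictionary. A short sketch of converting the result, assuming d holds the sample output above:

```python
# Sample output of the answer's code, copied here so the sketch is self-contained.
d = {
    8: [20, 21, 22, 23, 24, 30, 31, 32],
    10: [21, 22, 23, 24, 30, 31, 32],
    15: [11, 12, 13, 14, 16, 17, 18],
}

# One {enclosure: drives} dictionary per enclosure, as in the question.
formatted_drives = [{enc: drives} for enc, drives in d.items()]
print(formatted_drives)
```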

1 Comment

Thank you for this, it's very flexible!
