
I searched for hours but I can't find the correct regular expression to match a simple pattern. Here is the text (it's the stdout of a listing of logical volumes by volume group):

rootvg:
hd5                 boot       1     1     1    closed/syncd  N/A
hd4                 jfs        38    38    1    open/syncd    /
datavg:
data01lv            jfs        7     7     1    open/syncd    /data1
data02lv            jfs        7     7     1    open/syncd    /data2

I would like my regular expression to produce this kind of result (with regex.findall(text), for example):

    [(u'rootvg', u'hd5 boot 1 1 1 closed/syncd N/A\nhd4 jfs 38 38 1 open/syncd /\n'),(u'datavg', u'data01lv jfs 7 7 1 open/syncd /data1\ndata02lv jfs 7 7 1 open/syncd /data2')]

But the best I can get is with this pattern: ^(?P<vgname>\w+):\s(?P<lv>[\w\s\.\_\/-]+)+, which gives this result with findall:

[(u'rootvg', u'hd5 boot 1 1 1 closed/syncd N/A\nhd4 jfs 38 38 1 open/syncd /\ndatavg')]
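A quick way to see why this pattern overruns: the character class [\w\s\.\_\/-] contains \s, which matches newlines as well as spaces, so the greedy + keeps consuming whole lines until it hits the first character outside the class, the ':' after "datavg". By then "datavg" has already been swallowed, so findall can never start a second match at it. A minimal sketch reproducing this, assuming Python 3 and the listing above stored in a string:

```python
import re

# The listing from the question (column spacing preserved).
text = """rootvg:
hd5                 boot       1     1     1    closed/syncd  N/A
hd4                 jfs        38    38    1    open/syncd    /
datavg:
data01lv            jfs        7     7     1    open/syncd    /data1
data02lv            jfs        7     7     1    open/syncd    /data2"""

# The question's pattern, unchanged.
pattern = r'^(?P<vgname>\w+):\s(?P<lv>[\w\s\.\_\/-]+)+'

matches = re.findall(pattern, text)

# \s inside the character class also matches '\n', so the lv group runs
# across line breaks and stops only at the ':' after the next label.
print(len(matches))                          # 1
print(matches[0][1].endswith('\ndatavg'))    # True
```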
  • Why don't you use split() here instead of tinkering with fragile regular expressions? Commented Dec 19, 2012 at 18:23
  • You are right! That does the job! Commented Dec 19, 2012 at 18:57

2 Answers


Try the following:

re.findall(r'^(\w+):(.*?)(?=^\w+:|\Z)', text, flags=re.DOTALL | re.MULTILINE)

Example:

>>> text = '''rootvg:
... hd5                 boot       1     1     1    closed/syncd  N/A
... hd4                 jfs        38    38    1    open/syncd    /
... datavg:
... data01lv            jfs        7     7     1    open/syncd    /data1
... data02lv            jfs        7     7     1    open/syncd    /data2'''
>>> re.findall(r'^(\w+):(.*?)(?=^\w+:|\Z)', text, flags=re.DOTALL | re.MULTILINE)
[('rootvg', '\nhd5                 boot       1     1     1    closed/syncd  N/A\nhd4                 jfs        38    38    1    open/syncd    /\n'), ('datavg', '\ndata01lv            jfs        7     7     1    open/syncd    /data1\ndata02lv            jfs        7     7     1    open/syncd    /data2')]

The re.DOTALL flag lets . match line-break characters, and the re.MULTILINE flag lets ^ and $ match at the beginning and end of each line, respectively, instead of only at the beginning and end of the string.
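To see what each flag contributes, you can run the same pattern with one flag dropped at a time. A small sketch, assuming the same sample text as the example above (redefined here so the snippet runs on its own):

```python
import re

text = """rootvg:
hd5                 boot       1     1     1    closed/syncd  N/A
hd4                 jfs        38    38    1    open/syncd    /
datavg:
data01lv            jfs        7     7     1    open/syncd    /data1
data02lv            jfs        7     7     1    open/syncd    /data2"""

pattern = r'^(\w+):(.*?)(?=^\w+:|\Z)'

# Both flags: one (name, body) tuple per volume group.
both = re.findall(pattern, text, flags=re.DOTALL | re.MULTILINE)

# Without re.MULTILINE, ^ only matches at the very start of the string, so
# the lookahead can only succeed at \Z: everything after "rootvg:",
# including the "datavg:" line, ends up in a single body.
no_multiline = re.findall(pattern, text, flags=re.DOTALL)

# Without re.DOTALL, . cannot cross the newline right after "rootvg:" (or
# "datavg:"), and the lookahead cannot succeed there, so nothing matches.
no_dotall = re.findall(pattern, text, flags=re.MULTILINE)

print([name for name, _ in both])          # ['rootvg', 'datavg']
print([name for name, _ in no_multiline])  # ['rootvg']
print(no_dotall)                           # []
```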

Explanation:

^            # match at the start of a line
(\w+)        # match one or more word characters (letters, digits or underscores) and capture in group 1
:            # match a literal ':'
(.*?)        # match zero or more characters, as few as possible
(?=          # start lookahead (only match if following regex can match)
   ^\w+:       # start of line followed by word characters then ':'
   |           # OR
   \Z          # end of the string
)            # end lookahead
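The annotated breakdown above is already close to valid Python: with the re.VERBOSE flag, unescaped whitespace and # comments inside the pattern are ignored, so the commented form can live directly in the source. A sketch, assuming the same sample text; it should give the same result as the one-line pattern:

```python
import re

text = """rootvg:
hd5                 boot       1     1     1    closed/syncd  N/A
hd4                 jfs        38    38    1    open/syncd    /
datavg:
data01lv            jfs        7     7     1    open/syncd    /data1
data02lv            jfs        7     7     1    open/syncd    /data2"""

# re.VERBOSE ignores unescaped whitespace and '#' comments in the pattern.
VG_PATTERN = re.compile(r"""
    ^            # match at the start of a line
    (\w+)        # volume group name: one or more word characters (group 1)
    :            # a literal colon
    (.*?)        # the group's body: as few characters as possible (group 2)
    (?=          # lookahead: stop just before either
       ^\w+:     #   the next 'name:' label at the start of a line
       |         #   or
       \Z        #   the end of the string
    )
""", re.DOTALL | re.MULTILINE | re.VERBOSE)

verbose_result = VG_PATTERN.findall(text)
compact_result = re.findall(r'^(\w+):(.*?)(?=^\w+:|\Z)', text,
                            flags=re.DOTALL | re.MULTILINE)
print(verbose_result == compact_result)  # True
```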

Alternatively, you could use re.split() with a much simpler regex to get similar output; it shouldn't be too difficult to transform this into the format you need:

>>> re.split(r'^(\w+):', text, flags=re.MULTILINE)
['', 'rootvg', '\nhd5                 boot       1     1     1    closed/syncd  N/A\nhd4                 jfs        38    38    1    open/syncd    /\n', 'datavg', '\ndata01lv            jfs        7     7     1    open/syncd    /data1\ndata02lv            jfs        7     7     1    open/syncd    /data2']

Here is how you might turn this into your desired format:

>>> matches = re.split(r'^(\w+):', text, flags=re.MULTILINE)
>>> [(v, matches[i+1]) for i, v in enumerate(matches) if i % 2]
[('rootvg', '\nhd5                 boot       1     1     1    closed/syncd  N/A\nhd4                 jfs        38    38    1    open/syncd    /\n'), ('datavg', '\ndata01lv            jfs        7     7     1    open/syncd    /data1\ndata02lv            jfs        7     7     1    open/syncd    /data2')]
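If you need exactly the format shown in the question (no leading newline, runs of spaces collapsed to a single space), a little post-processing on the re.split() output gets there. A sketch only: normalize_body is a hypothetical helper name, and pairing names with bodies via zip over the odd and even slices is simply an alternative to the enumerate comprehension above:

```python
import re

text = """rootvg:
hd5                 boot       1     1     1    closed/syncd  N/A
hd4                 jfs        38    38    1    open/syncd    /
datavg:
data01lv            jfs        7     7     1    open/syncd    /data1
data02lv            jfs        7     7     1    open/syncd    /data2"""


def normalize_body(body):
    """Hypothetical helper: collapse runs of whitespace on each line and
    drop leading newlines (the one right after the 'name:' label)."""
    lines = [' '.join(line.split()) for line in body.split('\n')]
    return '\n'.join(lines).lstrip('\n')


parts = re.split(r'^(\w+):', text, flags=re.MULTILINE)
# parts == ['', name1, body1, name2, body2, ...]: odd slots hold the names.
groups = [(name, normalize_body(body))
          for name, body in zip(parts[1::2], parts[2::2])]

for name, body in groups:
    print(name, repr(body))
```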

#!/usr/bin/env python3

"""
    Demo code for Stackoverflow question:
    http://stackoverflow.com/questions/13958548/unable-to-find-the-correct-regex-in-python#13958634
"""

import io

text = """
rootvg:
hd5                 boot       1     1     1    closed/syncd  N/A
hd4                 jfs        38    38    1    open/syncd    /
datavg:
data01lv            jfs        7     7     1    open/syncd    /data1
data02lv            jfs        7     7     1    open/syncd    /data2
"""


def gen_lines(lines):
    """Yield the non-blank lines of an iterable of lines."""
    for line in lines:
        if line.strip():
            yield line


def gen_groups(lines):
    """Yield (group_name, rows) pairs; each row is a whitespace-split data line."""
    group = None
    data = []
    for line in gen_lines(lines):

        # We found a new group label, e.g. "rootvg:"
        if len(line.split()) == 1 and line.strip().endswith(':'):
            if group:
                yield group, data
            group = line.strip()[:-1]
            data = []

        # We found a data line belonging to the current group
        elif group:
            data.append(line.split())

    # We're done with input; yield the final group
    if group:
        yield group, data


def main():

    # Mimics the behavior of mock_file = open('input.txt')
    mock_file = io.StringIO(text)

    for group, data in gen_groups(mock_file):
        print(group)
        for d in data:
            print(d)


if __name__ == '__main__':
    main()

And the output:

rootvg
['hd5', 'boot', '1', '1', '1', 'closed/syncd', 'N/A']
['hd4', 'jfs', '38', '38', '1', 'open/syncd', '/']
datavg
['data01lv', 'jfs', '7', '7', '1', 'open/syncd', '/data1']
['data02lv', 'jfs', '7', '7', '1', 'open/syncd', '/data2']
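The generator approach yields each data row as a plain list of strings. If you would rather access columns by name, one option is to wrap each row in a namedtuple. A sketch only: the field names below (lv_name, lv_type, lps, pps, pvs, lv_state, mount_point) are my own hypothetical labels, and I'm assuming every data row has exactly seven whitespace-separated columns with three integer counts, as in the sample:

```python
from collections import namedtuple

text = """
rootvg:
hd5                 boot       1     1     1    closed/syncd  N/A
hd4                 jfs        38    38    1    open/syncd    /
datavg:
data01lv            jfs        7     7     1    open/syncd    /data1
data02lv            jfs        7     7     1    open/syncd    /data2
"""

# Hypothetical field names; rename them to match what your columns mean.
LogicalVolume = namedtuple(
    'LogicalVolume',
    ['lv_name', 'lv_type', 'lps', 'pps', 'pvs', 'lv_state', 'mount_point'])


def parse_groups(lines):
    """Return {group_name: [LogicalVolume, ...]} for 'name:'-labelled blocks."""
    groups = {}
    current = None
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) == 1 and fields[0].endswith(':'):
            current = fields[0][:-1]  # new group label, e.g. 'rootvg'
            groups[current] = []
        elif current is not None:
            # Assumes exactly seven columns; raises ValueError otherwise.
            name, lv_type, lps, pps, pvs, state, mount = fields
            groups[current].append(LogicalVolume(
                name, lv_type, int(lps), int(pps), int(pvs), state, mount))
    return groups


groups = parse_groups(text.splitlines())
for vg, volumes in groups.items():
    for lv in volumes:
        print(vg, lv.lv_name, lv.lps, lv.mount_point)
```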
