
I need to parse the following output:

|------------------------------|-----------------|----------------------------------------|--------------------|------------|
| Assembly name                | User name       | Path                                   | Start Time         | State      |
|----------127.0.0.1-----------|-----------------|------Shell version 1.2.1-13-09-27------|--------------------|------------|
|ng40core2                     |ng40             |/home/regress/ng40core2                 |2013-10-07 16:55:52 |Running     |
|ng40core1                     |ng40             |/home/regress/ng40core1                 |2013-10-07 16:53:54 |Running     |
|------------------------------|-----------------|----------------------------------------|--------------------|------------|

There can be multiple entries with different versions of ng40core in this output.

I have written a regex for each kind of line:

regex_list = [r'\s*',
r'\S+\s*',
r'\S+\s+Assembly\s+name\s+\S+\s+User\s+name\s+\S+\s+Path\s+\S+\s+Start\s+Time\s+\S+\s+State\s+\S+\s*',
r'\|\S+\d+\.\d+\.\d+\.\d+\S+Shell\s+version\s+.*\s*',
r'\|(?P<ng40core_instance>\S+)\s+\|(?P<user_name>\S+)\s+\|(?P<path>\S+)\s+\|(?P<start_time>\d+\-\d+\-\d+\s+\d+:\d+:\d+)\s+\|(?P<state>\w+)\s+\|\s*']

I want to get multiple values for a single key.
For "ng40core2" I need the user name, path, start time and state.
Likewise, for "ng40core1" I need the user name, path, start time and state.

It would be really helpful if you could suggest a way to achieve this.

  • That is fine; ultimately I need multiple values for a single key (for every line in the output). Commented Oct 8, 2013 at 10:13

2 Answers


You don't need a regex to parse this; splitting each line on '|' is enough.

Your text:

s = """
|------------------------------|-----------------|----------------------------------------|--------------------|------------|
| Assembly name                | User name       | Path                                   | Start Time         | State      |
|----------127.0.0.1-----------|-----------------|------Shell version 1.2.1-13-09-27------|--------------------|------------|
|ng40core2                     |ng40             |/home/regress/ng40core2                 |2013-10-07 16:55:52 |Running     |
|ng40core1                     |ng40             |/home/regress/ng40core1                 |2013-10-07 16:53:54 |Running     |
|------------------------------|-----------------|----------------------------------------|--------------------|------------|
"""

The code:

for line in s.splitlines():
    line = [x for x in line.split('|') if x]
    if line and line[0].startswith('ng'):
        line = [x.strip() for x in line] # cleanup whitespace
        assembly_name, user_name, path, start_date, state = line
        print(assembly_name, user_name, path, start_date, state)

The result:

ng40core2 ng40 /home/regress/ng40core2 2013-10-07 16:55:52 Running
ng40core1 ng40 /home/regress/ng40core1 2013-10-07 16:53:54 Running

Just for fun, I made a more robust function:

def retrieve(file_path):
    with open(file_path) as f:
        for assembly_name, user_name, path, start_date, state in parse(f.read()):
            # code
            print(assembly_name, user_name, path, start_date, state)  # example

def parse(text):
    for line in text.splitlines():
        line = [x for x in line.split('|') if x]
        if line and line[0].startswith('ng'):
            yield [x.strip() for x in line]
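
Since the question asks for several values per assembly name, the rows produced by a generator like the one above can be collected into a dictionary keyed by that name. What follows is only a minimal Python 3 sketch, not part of the original answer: `parse_rows`, `to_dict`, `FIELDS` and `SAMPLE` are names made up for illustration, and it assumes every data row starts with `ng` and has exactly five columns.

```python
# Sample text copied from the question (assumed to be the full command output).
SAMPLE = """\
|------------------------------|-----------------|----------------------------------------|--------------------|------------|
| Assembly name                | User name       | Path                                   | Start Time         | State      |
|----------127.0.0.1-----------|-----------------|------Shell version 1.2.1-13-09-27------|--------------------|------------|
|ng40core2                     |ng40             |/home/regress/ng40core2                 |2013-10-07 16:55:52 |Running     |
|ng40core1                     |ng40             |/home/regress/ng40core1                 |2013-10-07 16:53:54 |Running     |
|------------------------------|-----------------|----------------------------------------|--------------------|------------|
"""

# Hypothetical key names for the columns that follow the assembly name.
FIELDS = ("user_name", "path", "start_time", "state")


def parse_rows(text):
    """Yield a list of stripped cells for each data row (rows starting with 'ng')."""
    for line in text.splitlines():
        cells = [cell.strip() for cell in line.split("|") if cell]
        if cells and cells[0].startswith("ng"):
            yield cells


def to_dict(text):
    """Map each assembly name to a dict holding its remaining columns."""
    result = {}
    for assembly_name, *values in parse_rows(text):
        result[assembly_name] = dict(zip(FIELDS, values))
    return result


assemblies = to_dict(SAMPLE)
print(assemblies["ng40core2"]["path"])  # /home/regress/ng40core2
```

A lookup such as `assemblies["ng40core1"]["state"]` then returns the single value wanted. If two rows ever shared the same assembly name, the later row would overwrite the earlier one in this sketch.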

2 Comments

Thanks @Inbar Rose, that was a really quick and compact solution to my problem!
Note that there are still some spaces around the edges. See print line or print repr(path). You can just use line = [x.strip() for x in line] instead.

You can use re.findall() with a regex that matches only the data lines:

import re
print(re.findall(r'\|(?P<ng40core_instance>\S+)\s+\|(?P<user_name>\S+)\s+\|(?P<path>\S+)\s+\|(?P<start_time>\d+\-\d+\-\d+\s+\d+:\d+:\d+)\s+\|(?P<state>\w+)\s+\|\s*', text))

Output:

[('ng40core2', 'ng40', '/home/regress/ng40core2', '2013-10-07 16:55:52', 'Running'), ('ng40core1', 'ng40', '/home/regress/ng40core1', '2013-10-07 16:53:54', 'Running')]
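
re.findall() returns plain tuples, so the group names in the pattern go unused. If a mapping from assembly name to its fields is wanted, one option is re.finditer() combined with match.groupdict(). The following is a minimal Python 3 sketch under that assumption; the pattern is the same one split across lines, `text` holds the sample from the question, and `ROW` and `by_instance` are names chosen here for illustration.

```python
import re

# Sample output copied from the question.
text = """\
|------------------------------|-----------------|----------------------------------------|--------------------|------------|
| Assembly name                | User name       | Path                                   | Start Time         | State      |
|----------127.0.0.1-----------|-----------------|------Shell version 1.2.1-13-09-27------|--------------------|------------|
|ng40core2                     |ng40             |/home/regress/ng40core2                 |2013-10-07 16:55:52 |Running     |
|ng40core1                     |ng40             |/home/regress/ng40core1                 |2013-10-07 16:53:54 |Running     |
|------------------------------|-----------------|----------------------------------------|--------------------|------------|
"""

# Same pattern as in the answer, compiled once and split per column.
ROW = re.compile(
    r'\|(?P<ng40core_instance>\S+)\s+'
    r'\|(?P<user_name>\S+)\s+'
    r'\|(?P<path>\S+)\s+'
    r'\|(?P<start_time>\d+\-\d+\-\d+\s+\d+:\d+:\d+)\s+'
    r'\|(?P<state>\w+)\s+\|\s*'
)

by_instance = {}
for match in ROW.finditer(text):
    fields = match.groupdict()  # dict of named group -> matched text
    # Use the instance name as the key and keep the other fields as the value.
    by_instance[fields.pop('ng40core_instance')] = fields

print(by_instance['ng40core1']['state'])  # Running
```

Compared with the tuple output, each value can be looked up by name, for example `by_instance['ng40core2']['start_time']`.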

1 Comment

Credit to @Haidro for pointing out re.findall() in another question
