
I'm a newbie to regular expressions.

I'm trying to get a list of the services that are up or down from the output of the svstat command.

Example output of svstat:

/etc/service/worker-test-1: up (pid 1234) 97381 seconds
/etc/service/worker-test-2: up (pid 4567) 92233 seconds
/etc/service/worker-test-3: up (pid 8910) 97381 seconds
/etc/service/worker-test-4: down 9 seconds, normally up
/etc/service/worker-test-5: down 9 seconds, normally up
/etc/service/worker-test-6: down 9 seconds, normally up

Currently I need two regexes: one to match services that are UP and one for services that are DOWN.

Sample regex-1 for UP:

/etc/service/(?P<service_name>.+):\s(?P<status>up|down)\s\(pid\s(?P<pid>\d+)\)\s(?P<seconds>\d+)

Output for regex-1:

Match 1
status -> up
service_name -> worker-test-1
pid -> 1234
seconds -> 97381

Match 2
status -> up
service_name -> worker-test-2
pid -> 4567
seconds -> 92233

Match 3
status -> up
service_name -> worker-test-3
pid -> 8910
seconds -> 97381

Sample regex-2 for DOWN

/etc/service/(?P<service_name>.+):\s(?P<status>up|down)\s(?P<seconds>\d+)

Output for regex-2

Match 1
status -> down
service_name -> worker-test-4
seconds -> 9

Match 2
status -> down
service_name -> worker-test-5
seconds -> 9

Match 3
status -> down
service_name -> worker-test-6
seconds -> 9

The question is: how can I use a single regex to match both UP and DOWN?

By the way, I'm using http://pythex.org/ to create and test these regexes.

  • Why do you want only one regex? It seems like you'd have to differentiate afterwards, anyway, so why not use different regex? Commented Jun 15, 2016 at 8:48
  • @tobias_k Since I need to do some automation, I need to check whether the service is up, down, or was restarted recently. Restarted means the status is up and seconds is below a certain threshold. Commented Jun 15, 2016 at 8:52
  • Same question as @tobias_k. Also, with such machine-generated input it might be a cleaner fit to parse it based on fixed strings, where the first decision is ": down" vs. ": up" and the rest of the line is then processed in a case-specific way. I also find the two separate regexes neither particularly readable nor particularly maintainable ;-) Commented Jun 15, 2016 at 8:52
  • @Dilettant You mean I can simplify those regexes? Commented Jun 15, 2016 at 8:57
  • Ok, added the non-regex suggestion, just so one can compare and because I said I would do so ;-). Commented Jun 15, 2016 at 9:41

3 Answers

1

You could enclose the pid part in an optional non-capturing group:

/etc/service/(?P<service_name>.+):\s(?P<status>up|down)(?:\s\(pid\s(?P<pid>\d+)\))?\s(?P<seconds>\d+)

This results in pid being None when the service is down. See Regex101 demo.
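For reference, here is a minimal sketch of how the combined pattern might be used from Python's `re` module. The names `PATTERN`, `SAMPLE` and `parse_svstat` are only illustrative and not part of the question; the sample text is shortened to one line per state. Note that all captured values come back as strings, and `pid` is `None` for services that are down.

```python
import re

# Combined pattern from this answer: the "(pid N)" part sits in an optional
# non-capturing group, so it may be absent (as it is for "down" lines).
PATTERN = re.compile(
    r"/etc/service/(?P<service_name>.+):\s(?P<status>up|down)"
    r"(?:\s\(pid\s(?P<pid>\d+)\))?\s(?P<seconds>\d+)"
)

# Shortened sample input (illustrative; taken from the question's svstat output).
SAMPLE = """\
/etc/service/worker-test-1: up (pid 1234) 97381 seconds
/etc/service/worker-test-4: down 9 seconds, normally up
"""


def parse_svstat(text):
    """Return one dict of named groups per matching line (hypothetical helper)."""
    return [match.groupdict() for match in PATTERN.finditer(text)]


for row in parse_svstat(SAMPLE):
    # Unmatched optional groups are reported as None by groupdict().
    print(row["service_name"], row["status"], row["pid"], row["seconds"])
```

If numeric comparisons are needed later (for example against a restart threshold), `seconds` and a non-`None` `pid` would have to be converted with `int()` first.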


1 Comment

Ah, it's stated in the pythex docs --> (?:...) non-capturing version of regular parentheses. Thanks, man.
1

As promised, here is my lunch-break alternative. (I do not want to talk you into fixed-token split parsing, but it might come in handy when considering the rest of the use case, which only the OP knows ;-)

#! /usr/bin/env python
from __future__ import print_function

d = """
/etc/service/worker-test-1: up (pid 1234) 97381 seconds
/etc/service/worker-test-2: up (pid 4567) 92233 seconds
/etc/service/worker-test-3: up (pid 8910) 97381 seconds
/etc/service/worker-test-4: down 9 seconds, normally up
/etc/service/worker-test-5: down 9 seconds, normally up
/etc/service/worker-test-6: down 9 seconds, normally up
"""


def service_state_parser_gen(text_lines):
    """Parse the lines from the service monitor by splitting
    on a well-known binary condition (either up or down)
    and parse the rest of the fields based on a fixed
    position split on sanitized data (in the up case).
    Yield a tuple of key and dictionary as result, or
    (None, None) when neither up nor down is detected."""

    token_up = ': up '
    token_down = ': down '
    path_sep = '/'

    for line in text_lines.split('\n'):
        if token_up in line:
            chunks = line.split(token_up)
            status = token_up.strip(': ')
            service = chunks[0].split(path_sep)[-1]
            _, pid, seconds, _ = chunks[1].replace(
                '(', '').replace(')', '').split()
            yield service, {'name': service,
                            'status': status,
                            'pid': int(pid),
                            'seconds': int(seconds)}
        elif token_down in line:
            chunks = line.split(token_down)
            status = token_down.strip(': ')
            service = chunks[0].split(path_sep)[-1]
            pid = None
            seconds, _, _, _ = chunks[1].split()
            yield service, {'name': service,
                            'status': status,
                            'pid': None,
                            'seconds': int(seconds)}
        else:
            yield None, None


def main():
    """Sample driver for parser generator function."""

    services = {}
    for key, status_map in service_state_parser_gen(d):
        if key is None:
            print("Non-Status line ignored.")
        else:
            services[key] = status_map

    print(services)

if __name__ == '__main__':
    main()

When run on the given sample input, it yields:

Non-Status line ignored.
Non-Status line ignored.
{'worker-test-1': {'status': 'up', 'seconds': 97381, 'pid': 1234, 'name': 'worker-test-1'}, 'worker-test-3': {'status': 'up', 'seconds': 97381, 'pid': 8910, 'name': 'worker-test-3'}, 'worker-test-2': {'status': 'up', 'seconds': 92233, 'pid': 4567, 'name': 'worker-test-2'}, 'worker-test-5': {'status': 'down', 'seconds': 9, 'pid': None, 'name': 'worker-test-5'}, 'worker-test-4': {'status': 'down', 'seconds': 9, 'pid': None, 'name': 'worker-test-4'}, 'worker-test-6': {'status': 'down', 'seconds': 9, 'pid': None, 'name': 'worker-test-6'}}

So the information that would otherwise be stored in named group matches is stored (already type-converted) as values under matching keys in a dict. If a service is down, there is of course no process id, so pid is mapped to None, which makes it easy to code against in a robust manner. (If one stored all down services in a separate structure, it would be implicit that no access to a pid field is advisable ...)
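As a rough sketch of how such a dict could feed the check the OP describes in the comments (up, down, or recently restarted, where restarted means up with seconds below a threshold): the function name `classify`, the threshold value of 60 seconds, and the entry `worker-test-7` are assumptions made up for illustration; none of them come from the question.

```python
# Assumed threshold; the question does not state an actual value.
RESTART_THRESHOLD_SECONDS = 60


def classify(status_map, threshold=RESTART_THRESHOLD_SECONDS):
    """Return 'down', 'restarted' or 'up' for one parsed service entry."""
    if status_map["status"] == "down" or status_map["pid"] is None:
        return "down"
    if status_map["seconds"] < threshold:
        # Up, but only for a short time: treat as recently restarted.
        return "restarted"
    return "up"


# Entries shaped like the parser output above; worker-test-7 is hypothetical.
services = {
    "worker-test-1": {"name": "worker-test-1", "status": "up",
                      "pid": 1234, "seconds": 97381},
    "worker-test-4": {"name": "worker-test-4", "status": "down",
                      "pid": None, "seconds": 9},
    "worker-test-7": {"name": "worker-test-7", "status": "up",
                      "pid": 4321, "seconds": 12},
}

for name in sorted(services):
    print(name, classify(services[name]))
```

Because seconds is already an int and pid is None for down services, the check needs no further parsing or type conversion.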

Hope it helps. PS: Yes, the argument name text_lines of the showcase function is not optimally named, for what it contains, but you should get the parsing idea.

2 Comments

Just to show you the performance: 1000 loops, best of 3: 580 µs per loop vs 1000 loops, best of 3: 1.34 ms per loop
@skycrew: Is this comparing the regex versus the split implementation? In theory an average (or a worst case) might be more practical, though in practice, with these fixed strings coming in, all benchmark instances should be well conditioned. Shall I update the answer with timeit evaluations of both strategies? Can do, no problem ...
-1

I don't know if you are forced to use regex at all but if you don't have to, you can do something like this:

if "down" in linetext:
    print( "is down" )
else:
    print( "is up" )

Easier to read and faster as well.
