3

I have the following script that gets the service_name from a tnsnames file if it is available and falls back to the SID if it is not. It seems to work fine, but it returns tuples that I am unable to parse:

#!/usr/bin/env python

import re

regexes = re.compile(r'SERVICE_NAME\s?=\s?(.+?)\)|SID\s?=\s?(.+?)\)')

with open('tnsnames.ora.test') as tns_file:
    for tnsname in tns_file:
        match = regexes.search(tnsname)

        if match:
            print(match.groups())

The script returns the following:

(None, 'db1')
('db2', None)
('db3', None)

But I only want the name of the db returned, not the None.

How can I strip the None from the output? I cannot use re.findall because some lines in the tnsnames file have both a service_name and a SID, and then I would get duplicates.

How can I parse the output of the match object to ignore the None?

2 Answers

1

You are using the .groups() method, which returns every capturing group in the pattern, with None for any group that did not take part in the match. Since the regex is an alternation with one capturing group in each alternative, one of the two will always be None on a successful match.

A generic solution is to filter the None value out of the two-item tuple, and there are many ways to do that. One way is to concatenate the two values:

m = match.groups()
print(r'{}{}'.format(m[0] or '', m[1] or ''))

The m[x] or '' syntax is OK here as we can only have a string or None in the match.groups().
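
As a variation on the same idea, the None can be skipped explicitly instead of concatenating. The sketch below keeps the pattern from the question; the helper name first_captured and the sample lines are made up for illustration and stand in for the real tnsnames.ora file.

```python
import re

# Pattern from the question: two alternatives, one capturing group each
regexes = re.compile(r'SERVICE_NAME\s?=\s?(.+?)\)|SID\s?=\s?(.+?)\)')


def first_captured(match):
    """Return the first group that took part in the match, skipping None."""
    return next(g for g in match.groups() if g is not None)


# Hypothetical sample lines, not taken from a real tnsnames.ora
sample_lines = [
    "(CONNECT_DATA = (SID = db1))",
    "(CONNECT_DATA = (SERVICE_NAME = db2))",
    "(CONNECT_DATA = (SERVER = DEDICATED)(SERVICE_NAME = db3))",
]

names = []
for line in sample_lines:
    match = regexes.search(line)
    if match:
        names.append(first_captured(match))

print(names)
```

Because exactly one of the two groups is set on a successful match, the generator always yields a value here.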

Another solution is to re-write the pattern so that it contains just 1 capturing group.

It is easy to make the pattern contain a single group because the part after the keyword, which matches the = and the value, is duplicated in both alternatives, so only the keyword needs to be an alternation:

r'(?:SERVICE_NAME|SID)\s*=\s*([^)\r\n]+)'
  ^^^^^^^^^^^^^^^^^^^^

Details

  • (?:SERVICE_NAME|SID) - a non-capturing group that matches either SERVICE_NAME or SID
  • \s*=\s* - a = surrounded by zero or more whitespace characters
  • ([^)\r\n]+) - Group 1: one or more characters other than ), CR and LF (the line breaks are excluded to mirror . in the original attempt, which does not match a newline).
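
A minimal sketch of how the single-group pattern might be used. The helper name db_name and the sample lines are assumptions for illustration; in the real script the lines would come from the tnsnames file.

```python
import re

# Single capturing group: the name always ends up in group 1
pattern = re.compile(r'(?:SERVICE_NAME|SID)\s*=\s*([^)\r\n]+)')


def db_name(line):
    """Return the SERVICE_NAME or SID value found in line, or None."""
    match = pattern.search(line)
    return match.group(1) if match else None


# Hypothetical sample lines; the third one carries both keys
sample_lines = [
    "(CONNECT_DATA = (SID = db1))",
    "(CONNECT_DATA = (SERVICE_NAME = db2))",
    "(CONNECT_DATA = (SERVICE_NAME = db3)(SID = db3))",
    "(ADDRESS = (PROTOCOL = TCP)(HOST = example)(PORT = 1521))",
]

results = [db_name(line) for line in sample_lines]
print(results)
```

search() stops at the first match, so a line that has both a SERVICE_NAME and a SID yields one name, which avoids the duplicates that re.findall would produce.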

0

If you want a single capturing group, so that you do not get two groups of which one is always None because of the alternation, you could move the alternation to the start of the pattern so that it only covers the keywords, and make it a non-capturing group: (?:SERVICE_NAME|SID).

If neither keyword may be part of a larger word, you could prepend a word boundary \b to the pattern.

(?:SERVICE_NAME|SID)\s?=\s?(.+?)\)

Explanation

  • (?:SERVICE_NAME|SID) Match either SERVICE_NAME or SID
  • \s?=\s? Match a = with an optional whitespace char on each side
  • (.+?)\) Capture in group 1 any characters except a newline, non-greedy, then match )
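
To illustrate the effect of the word boundary, here is a small sketch comparing the pattern with and without a leading \b. The ORACLE_SID line is a made-up input chosen only to show a keyword embedded in a longer word.

```python
import re

plain = re.compile(r'(?:SERVICE_NAME|SID)\s?=\s?(.+?)\)')
bounded = re.compile(r'\b(?:SERVICE_NAME|SID)\s?=\s?(.+?)\)')

# Hypothetical line in which SID is only the tail of a longer key
line = "(ORACLE_SID = not_a_db)"

plain_match = plain.search(line)      # matches the SID inside ORACLE_SID
bounded_match = bounded.search(line)  # no match: "_" and "S" are both word chars

print(plain_match.group(1))
print(bounded_match)

# A standalone SID still matches with the boundary in place
real = bounded.search("(CONNECT_DATA = (SID = db1))")
print(real.group(1))
```

Note that \b treats the underscore as a word character, which is why the SID in ORACLE_SID is rejected.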
