access regex capturing groups in python

Question

ptx captures most of what I want. Because I am incompetent at combining many things into one regex, I created a second regex, ptx1, that should ADDITIONALLY capture the following character sequences: One Department, One foreign Department, Two office.

    text_list = [' something\npatternx: text_i_want One Department',' something patternx: text_i_want One foreign Department',' something\n patternx: text_i_want Two office']
    text_list = ' '.join(map(str, text_list))
    ptx = re.compile(r'(\s+something(?:\s+|\\n)*patternx:)(.*)(One\s+foreign)', flags = re.DOTALL)
    ten = ptx.search(text_list)
    try:
        if ten:
            ten = ten.group(2)
        else:
            ten = None
    except:
        pass

My question is: what do I need to do to get the (.*) content, i.e. text_i_want, returned? I have a gut feeling that I need to access eleven as if it were a list, because it has so many capturing groups, using eleven[0].group(1) to get the first element of the list and then its second group. But that didn't work either.

You can think of text_list like this

text_list = ['...something\npatternx: text_i_want One Department',
'...something patternx: text_i_want One foreign Department',
'...something\n patternx: text_i_want Two office']

Update

    import re

    text_list = [' something\npatternx: text_i_want One Department',' something patternx: text_i_want One foreign Department',' something\n patternx: text_i_want Two office']
    text_list = ' '.join(map(str, text_list))
    ptx = re.compile(r'\bsomething\s+patternx:(.*?)\b(?:One\s+(?:Department|foreign(?:\s+Department)?)|Two\s+office)\b', flags = re.DOTALL)
    ten = ptx.search(text_list)
    # This pattern has a single capturing group, so the wanted text is group(1).
    # group(2) would raise IndexError, which the bare except previously hid.
    ten = ten.group(1) if ten else None

Wiktor Stribiżew · Accepted Answer · 2021-10-03 18:25:12Z

It looks as if you got tricked when factoring in the alternatives on the right hand side.

You need to use

\bsomething\s+patternx:(.*?)\b(?:One\s+foreign|One\s+Department|One\s+foreign\s+Department|Two\s+office)\b

which can be shortened as

\bsomething\s+patternx:(.*?)\b(?:One\s+(?:Department|foreign(?:\s+Department)?)|Two\s+office)\b
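As a rough illustration of what changes, the sketch below runs the question's first pattern and the shortened pattern against the same joined string. The variable names `old`, `new`, `old_match` and `new_match` are made up for this comparison; the sample text and both patterns are taken from the question and this answer.

```python
import re

text = ' '.join([
    ' something\npatternx: text_i_want One Department',
    ' something patternx: text_i_want One foreign Department',
    ' something\n patternx: text_i_want Two office',
])

# First pattern from the question: greedy (.*) and only "One foreign" as terminator
old = re.compile(r'(\s+something(?:\s+|\\n)*patternx:)(.*)(One\s+foreign)', re.DOTALL)
# Shortened pattern from this answer: lazy (.*?) and all terminators as alternatives
new = re.compile(
    r'\bsomething\s+patternx:(.*?)\b(?:One\s+(?:Department|foreign(?:\s+Department)?)|Two\s+office)\b',
    re.DOTALL,
)

old_match = old.search(text)
new_match = new.search(text)

# The greedy group runs past "One Department" up to the only "One foreign"
print(repr(old_match.group(2)))
# The lazy group stops at the first terminator it meets
print(repr(new_match.group(1)))
```

With the old pattern the wanted text sits in group 2 because the prefix and terminator are captured too; with the new pattern there is only one capturing group, so it is group 1.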

See the regex demo. Details:

\bsomething\s+patternx: - whole word something, one or more whitespaces, patternx: string
(.*?) - Group 1: any zero or more chars as few as possible
\b(?:One\s+(?:Department|foreign(?:\s+Department)?)|Two\s+office)\b - either One Department, One foreign, One foreign Department, or Two office as whole words.
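The breakdown above can also be written into the pattern itself. The following is a minimal sketch using `re.VERBOSE`, which ignores unescaped whitespace in the pattern and allows `#` comments; the name `RX` and the sample string are assumptions made for this example only.

```python
import re

RX = re.compile(r'''
    \bsomething\s+patternx:   # whole word "something", whitespace, then "patternx:"
    (.*?)                     # Group 1: any chars, as few as possible
    \b(?:
        One\s+(?:Department|foreign(?:\s+Department)?)
      | Two\s+office
    )\b                       # one of the terminators, as whole words
''', re.DOTALL | re.VERBOSE)

sample = 'something patternx: abc One foreign Department'  # hypothetical input
m = RX.search(sample)
print(repr(m.group(0)))  # the whole match
print(repr(m.group(1)))  # only the captured text
```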

See the Python demo:

import re
text_list = [' something\npatternx: text_i_want One Department',' something patternx: text_i_want One foreign Department',' something\n patternx: text_i_want Two office']
text_list = ' '.join(map(str, text_list))
rx = r'\bsomething\s+patternx:(.*?)\b(?:One\s+(?:Department|foreign(?:\s+Department)?)|Two\s+office)\b'
print(re.findall(rx, text_list, re.DOTALL))
# => [' text_i_want ', ' text_i_want ', ' text_i_want ']
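If you prefer to keep the `search` approach from the question, here is a minimal sketch of reading the group from match objects. It assumes the same sample data as above; `first_value`, `all_values` and `group_two_error` are illustrative names. Since the pattern has one capturing group, `group(1)` holds the text and asking for `group(2)` raises `IndexError`.

```python
import re

text = ' '.join([
    ' something\npatternx: text_i_want One Department',
    ' something patternx: text_i_want One foreign Department',
    ' something\n patternx: text_i_want Two office',
])
rx = re.compile(
    r'\bsomething\s+patternx:(.*?)\b(?:One\s+(?:Department|foreign(?:\s+Department)?)|Two\s+office)\b',
    re.DOTALL,
)

# search() returns the first match object, or None if nothing matched
first = rx.search(text)
first_value = first.group(1).strip() if first else None

# finditer() yields one match object per occurrence
all_values = [m.group(1).strip() for m in rx.finditer(text)]

# Asking for a group the pattern does not define raises IndexError
try:
    first.group(2)
    group_two_error = False
except IndexError:
    group_two_error = True

print(rx.groups)      # number of capturing groups in the pattern
print(first_value)
print(all_values)
```

A match object is not a list, so indexing it as `match[0]` gives the whole match as a string rather than another match object, which is why `[0].group(1)` does not work.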
