
ptx captures most of what I want. Because I am not good at combining many alternatives into one regex, I created a second regex, ptx1, that should ADDITIONALLY capture the following character sequences: One Department, One foreign Department, Two office.

    text_list = [' something\npatternx: text_i_want One Department',' something patternx: text_i_want One foreign Department',' something\n patternx: text_i_want Two office']
    text_list = ' '.join(map(str, text_list))
    ptx = re.compile(r'(\s+something(?:\s+|\\n)*patternx:)(.*)(One\s+foreign)', flags = re.DOTALL)
    ten = ptx.search(text_list)
    try:
        if ten:
            ten = ten.group(2)
        else:
            ten = None
    except:
        pass

My question is: what do I need to do in order to get the (.*) content, i.e. text_i_want, returned? I have a gut feeling that I need to access eleven as if it were a list, because it has so many capturing groups, using eleven[0].group(1) to get the first element of the list and then its group. But that didn't work either.

You can think of text_list like this:

    text_list = ['...something\npatternx: text_i_want One Department',
                 '...something patternx: text_i_want One foreign Department',
                 '...something\n patternx: text_i_want Two office']

Update

    import re

    text_list = [' something\npatternx: text_i_want One Department',' something patternx: text_i_want One foreign Department',' something\n patternx: text_i_want Two office']
    text_list = ' '.join(map(str, text_list))
    ptx = re.compile(r'\bsomething\s+patternx:(.*?)\b(?:One\s+(?:Department|foreign(?:\s+Department)?)|Two\s+office)\b', flags = re.DOTALL)
    ten = ptx.search(text_list)
    # This pattern has only one capturing group, so the wanted text is group(1).
    ten = ten.group(1) if ten else None

1 Answer

It looks as if you got tricked when factoring in the alternatives on the right hand side.

You need to use

\bsomething\s+patternx:(.*?)\b(?:One\s+foreign|One\s+Department|One\s+foreign\s+Department|Two\s+office)\b

which can be shortened as

\bsomething\s+patternx:(.*?)\b(?:One\s+(?:Department|foreign(?:\s+Department)?)|Two\s+office)\b

See the regex demo. Details:

  • \bsomething\s+patternx: - whole word something, one or more whitespaces, patternx: string
  • (.*?) - Group 1: any zero or more chars, as few as possible (line breaks included when re.DOTALL is passed)
  • \b(?:One\s+(?:Department|foreign(?:\s+Department)?)|Two\s+office)\b - either One Department, One foreign, One foreign Department, or Two office as whole words.
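As a quick check of the two forms above, here is a minimal sketch that runs both patterns on the sample text from the question. The names long_rx, short_rx, long_full and short_full are only illustrative. Under these assumptions, Group 1 comes out the same for both patterns, while the full match for the second item differs: in the long form One\s+foreign is tried first and already satisfies the trailing \b, whereas the optional (?:\s+Department)? in the short form is greedy and also consumes Department.

```python
import re

# Sample text from the question, joined into one string
text_list = [' something\npatternx: text_i_want One Department',
             ' something patternx: text_i_want One foreign Department',
             ' something\n patternx: text_i_want Two office']
text = ' '.join(text_list)

# Long form: every right-hand alternative spelled out
long_rx = (r'\bsomething\s+patternx:(.*?)\b'
           r'(?:One\s+foreign|One\s+Department|One\s+foreign\s+Department|Two\s+office)\b')
# Short form: common prefixes factored out
short_rx = (r'\bsomething\s+patternx:(.*?)\b'
            r'(?:One\s+(?:Department|foreign(?:\s+Department)?)|Two\s+office)\b')

# findall returns the content of the single capturing group for each match
long_groups = re.findall(long_rx, text, re.DOTALL)
short_groups = re.findall(short_rx, text, re.DOTALL)
print(long_groups == short_groups)

# group(0) is the whole match, which is where the two forms differ
long_full = [m.group(0) for m in re.finditer(long_rx, text, re.DOTALL)]
short_full = [m.group(0) for m in re.finditer(short_rx, text, re.DOTALL)]
print(repr(long_full[1]))
print(repr(short_full[1]))
```

If only Group 1 is needed, either form works for this input; the short form is the safer choice when the full match matters.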

See the Python demo:

import re
text_list = [' something\npatternx: text_i_want One Department',' something patternx: text_i_want One foreign Department',' something\n patternx: text_i_want Two office']
text_list = ' '.join(map(str, text_list))
rx = r'\bsomething\s+patternx:(.*?)\b(?:One\s+(?:Department|foreign(?:\s+Department)?)|Two\s+office)\b'
print(re.findall(rx, text_list, re.DOTALL))
# => [' text_i_want ', ' text_i_want ', ' text_i_want '] 
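
If a match object is preferred over re.findall, the sketch below shows one way to read the captured text. It assumes the same sample input; the names first, first_text and all_texts are illustrative. Because the pattern defines a single capturing group, the wanted text is in group(1), and asking for group(2) raises IndexError.

```python
import re

text_list = [' something\npatternx: text_i_want One Department',
             ' something patternx: text_i_want One foreign Department',
             ' something\n patternx: text_i_want Two office']
text = ' '.join(text_list)

rx = re.compile(
    r'\bsomething\s+patternx:(.*?)\b'
    r'(?:One\s+(?:Department|foreign(?:\s+Department)?)|Two\s+office)\b',
    re.DOTALL)

# search() returns only the first match (or None); group(1) is the captured text
first = rx.search(text)
first_text = first.group(1).strip() if first else None
print(first_text)

# finditer() yields one match object per occurrence
all_texts = [m.group(1).strip() for m in rx.finditer(text)]
print(all_texts)

# The pattern has exactly one capturing group, so group(2) does not exist
print(rx.groups)
try:
    first.group(2)
except IndexError as exc:
    print('group(2) failed:', exc)
```

The strip() calls are optional; they only drop the whitespace that surrounds text_i_want in the sample strings.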