Python Regex to find String between two strings

Question

I am trying to use a regex to look through a specific part of a string and extract what lies between two markers, but I can't get the pattern right.

My biggest issue is with trying to form a Regex pattern for this. I've tried a bunch of variations close to the example listed. It should be close.

import re

toFind = ['[]', '[x]']
text = "| Completed?|\n|------|:---------:|\n|Link Created    |   []   |\n|Research Done   |   [X] "

# Regex to search between parameters and make result lowercase if there are any uppercase Chars
result = (re.search("(?<=Link Created)(.+?)(?=Research Done)", text).lower())

# Gets rid of whitespace in case they move the []/[x] around
result = result.replace(" ", "")

if any(x in result for x in toFind):
    print("Exists")
else:
    print("Doesn't Exist")

Happy Path: I take the string (text) and use a regex to get the substring between Link Created and Research Done.

Then I make the result lowercase and strip the whitespace in case someone moves the []/[x] around. Finally, the code looks in the string (result) for '[]' or '[x]' and prints the outcome.

Actual Output: At the moment all I keep getting is None because the Regex syntax is off...

joanis · Accepted Answer · 2019-07-10 19:33:54Z

1

If you want . to match newlines, you have to use the re.S option.

Also, it is a better idea to check whether the regex matched before making further calls. Your call to lower() gave me an error because the regex didn't match, so re.search returned None. Calling result.group(0).lower() only when result evaluates as true is safer.

import re

toFind = ['[]', '[x]']
text = "| Completed?|\n|------|:---------:|\n|Link Created    |   []   |\n|Research Done   |   [X] "

# Regex to search between parameters and make result lowercase if there are any uppercase Chars
result = (re.search("(?<=Link Created)(.+?)(?=Research Done)", text, re.S))

if result:
    # Gets rid of whitespace in case they move the []/[x] around
    result = result.group(0).lower().replace(" ", "")

    if any(x in result for x in toFind):
        print("Exists")
    else:
        print("Doesn't Exist")
else:
    print("re did not match")

PS: all the re options are documented in the re module documentation. Search for re.DOTALL for the details on re.S (they're synonyms). If you want to combine options, use bitwise OR. E.g., re.S|re.I will have . match newline and do case-insensitive matching.
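To illustrate the point about combining options, here is a minimal sketch. The shortened text and the lowercase markers in the pattern are assumptions made for the demo; they are not taken from the question. It compares a search with no flags against one that uses re.S | re.I.

```python
import re

# Shortened sample text (an assumption for this demo).
text = "|Link Created    |   []   |\n|Research Done   |   [X] "

# The markers are deliberately lowercase so that re.I has something to do.
pattern = r"(?<=link created)(.+?)(?=research done)"

# No flags: '.' cannot cross the newline and the lowercase markers do not match.
no_flags = re.search(pattern, text)

# re.S lets '.' match the newline; re.I makes the markers case-insensitive.
with_flags = re.search(pattern, text, re.S | re.I)

print(no_flags)                    # None
print(repr(with_flags.group(0)))   # '    |   []   |\n|'
```

Note that re.I only affects how the pattern matches. The matched text keeps its original case, so a later comparison against '[x]' still needs lower() or its own case-insensitive check.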

edited Jul 10, 2019 at 19:33

answered Jul 10, 2019 at 19:20

joanis


3 Comments

Brian Hartling Over a year ago

I knew I was pretty close. Regex kills me every time. I'll definitely look at the documentation. Thanks.

Brian Hartling Over a year ago

No problem, I'm still new to the formalities of this site.

joanis Over a year ago

That's normal. I appreciate the vote of confidence. And I'll delete this comment and the one above soon since they're chatty and not really contributing to the long-term value of this question and answer pair. Another good site practice I've learned over time: delete comments that don't have lasting relevance.

TomNash · Accepted Answer · 2019-07-10 19:08:35Z

1

I believe it's the \n newline characters giving issues. You can get around this using [\s\S]+ as such:

import re

toFind = ['[]', '[x]']
text = "| Completed?|\n|------|:---------:|\n|Link Created    |   []   |\n|Research Done   |   [X] "

# Capture everything between the two markers ([\s\S] also matches newlines),
# then strip all whitespace and column separators and lowercase the result
result = re.search(r"Link Created([\s\S]+)Research Done", text).group(1)
result = re.sub(r"[\s|]+", "", result).lower()

if any(x in result for x in toFind):
    print("Exists")
else:
    print("Doesn't Exist")
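One thing to keep in mind with [\s\S]+ is that it is greedy: if the end marker can appear more than once, the capture runs to the last occurrence. The sketch below uses a made-up input with a repeated "Research Done" (an assumption, not the asker's data) to compare it with the lazy form [\s\S]+?. It also guards against re.search returning None.

```python
import re

# Hypothetical input in which the end marker appears twice.
text = "Link Created | [] |\n|Research Done | [X] |\n|Research Done again | [] |"

greedy = re.search(r"Link Created([\s\S]+)Research Done", text)
lazy = re.search(r"Link Created([\s\S]+?)Research Done", text)

for name, match in (("greedy", greedy), ("lazy", lazy)):
    # re.search returns None when nothing matches, so check before .group()
    if match:
        print(name, repr(match.group(1)))
    else:
        print(name, "did not match")
```

With this input the lazy version stops at the first "Research Done", while the greedy version also swallows the first marker and the row that follows it.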

answered Jul 10, 2019 at 19:08

TomNash


benvc · Accepted Answer · 2019-07-10 19:49:06Z

1

Seems like regex is overkill for this particular job unless I am missing something (also not clear to me why you need the step that removes the whitespace from the substring). You could just split on "Link Created" and then split the following string on "Research Done".

text = "| Completed?|\n|------|:---------:|\n|Link Created    |   []   |\n|Research Done   |   [X] "

s = text.split("Link Created")[1].split("Research Done")[0].lower()

if "[]" in s or "[x]" in s:
    print("Exists")
else:
    print("Doesn't Exist")

# Exists
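The chained split above raises IndexError when "Link Created" is missing, and it silently returns the rest of the string when "Research Done" is missing. A minimal sketch of a more defensive variant, using str.partition, could look like this. The helper name between is hypothetical and introduced only for illustration.

```python
def between(text, start, end):
    """Return the text between the first `start` and the next `end`, or None."""
    _, found_start, rest = text.partition(start)
    if not found_start:
        return None  # start marker not present
    middle, found_end, _ = rest.partition(end)
    if not found_end:
        return None  # end marker not present after the start marker
    return middle


text = "| Completed?|\n|------|:---------:|\n|Link Created    |   []   |\n|Research Done   |   [X] "

s = between(text, "Link Created", "Research Done")
if s is None:
    print("Markers not found")
elif "[]" in s.lower() or "[x]" in s.lower():
    print("Exists")
else:
    print("Doesn't Exist")
```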

answered Jul 10, 2019 at 19:49

benvc


1 Comment

Brian Hartling Over a year ago

I could see Regex being overkill, I guess. This is actually part of a bigger program, and I thought I'd remove whitespace since there is a chance that when people edit the code that becomes the "text" string, they could accidentally add whitespace inside the [] or [x].
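If stray whitespace inside the brackets is the main worry, one option is to let the pattern tolerate it instead of stripping spaces first. This is only a sketch: the name CHECKBOX and the sample strings are assumptions made for illustration.

```python
import re

# Matches "[]" or "[x]"/"[X]", allowing optional whitespace inside the brackets.
CHECKBOX = re.compile(r"\[\s*x?\s*\]", re.I)

# Made-up table cells to exercise the pattern.
samples = ["   |   []   |", "| [ x ] |", "| [X ] |", "| [y] |", "| no box |"]

results = [bool(CHECKBOX.search(s)) for s in samples]
print(results)  # [True, True, True, False, False]
```

Unlike replace(" ", ""), the \s* in the pattern also covers tabs and newlines inside the brackets.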
