I am trying to use a regex to extract the text between two markers in a string, but I can't get the pattern right.

My biggest issue is with trying to form a Regex pattern for this. I've tried a bunch of variations close to the example listed. It should be close.

import re

toFind = ['[]', '[x]']
text = "| Completed?|\n|------|:---------:|\n|Link Created    |   []   |\n|Research Done   |   [X] "

# Regex to search between parameters and make result lowercase if there are any uppercase Chars
result = (re.search("(?<=Link Created)(.+?)(?=Research Done)", text).lower())

# Gets rid of whitespace in case they move the []/[x] around
result = result.replace(" ", "")

if any(x in result for x in toFind):
    print("Exists")
else:
    print("Doesn't Exist")

Happy Path: I take the string (text) and use a regex to get the substring between Link Created and Research Done.

Then I make the result lowercase and strip the whitespace, just in case someone moves the []/[x] around. Finally, I check the string (result) for '[]' or '[x]' and print the outcome.

Actual Output: At the moment all I keep getting is None because the regex syntax is off...

3 Answers

If you want . to match newlines, you have to use the re.S option.

Also, it would seem a better idea to check if the regex matched before proceeding with further calls. Your call to lower() gave me an error because the regex didn't match, so calling result.group(0).lower() only when result evaluates as true is safer.

import re

toFind = ['[]', '[x]']
text = "| Completed?|\n|------|:---------:|\n|Link Created    |   []   |\n|Research Done   |   [X] "

# Regex to search between parameters and make result lowercase if there are any uppercase Chars
result = (re.search("(?<=Link Created)(.+?)(?=Research Done)", text, re.S))

if result:
    # Gets rid of whitespace in case they move the []/[x] around
    result = result.group(0).lower().replace(" ", "")

    if any(x in result for x in toFind):
        print("Exists")
    else:
        print("Doesn't Exist")
else:
    print("re did not match")

PS: all the re options are documented in the re module documentation. Search for re.DOTALL for the details on re.S (they're synonyms). If you want to combine options, use bitwise OR. E.g., re.S|re.I will have . match newline and do case-insensitive matching.
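
As a small sketch of combining flags, assuming the same text as in the question: with re.S | re.I the markers can be written in lowercase and . still crosses the line break. The lowercase pattern here is only for illustration and is not part of the original code.

```python
import re

text = "| Completed?|\n|------|:---------:|\n|Link Created    |   []   |\n|Research Done   |   [X] "

# Without re.S, "." stops at the newline between the two rows, so nothing matches.
no_flags = re.search(r"(?<=Link Created)(.+?)(?=Research Done)", text)

# re.S lets "." match newlines; re.I lets the lowercase markers match the mixed-case text.
both_flags = re.search(r"(?<=link created)(.+?)(?=research done)", text, re.S | re.I)

print(no_flags)                    # None
print(repr(both_flags.group(0)))   # the cell text between the two markers
```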

3 Comments

I knew I was pretty close~ Regex kills me every time. I'll definitely look at the documentation. Thanks.
No problem, I'm still new to the formalities of this site.
That's normal. I appreciate the vote of confidence. And I'll delete this comment and the one above soon since they're chatty and not really contributing to the long-term value of this question and answer pair. Another good site practice I've learned over time: delete comments that don't have lasting relevance.

I believe it's the \n newline characters giving issues. You can get around this using [\s\S]+ as such:

import re

toFind = ['[]', '[x]']
text = "| Completed?|\n|------|:---------:|\n|Link Created    |   []   |\n|Research Done   |   [X] "

# Match everything between the two markers, newlines included (lazy, so it stops at the first "Research Done")
result = re.search(r"Link Created([\s\S]+?)Research Done", text).group(1)
# Remove all whitespace and column separators, then lowercase so [X] also matches [x]
result = re.sub(r"[\s|]+", "", result).lower()

if any(x in result for x in toFind):
    print("Exists")
else:
    print("Doesn't Exist")
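
To see why the character class helps, here is a minimal sketch on a short made-up string (not the question's table). By default . excludes newlines, while [\s\S] means "whitespace or non-whitespace", which covers every character.

```python
import re

sample = "start\nmiddle\nend"  # hypothetical string, used only for illustration

dot_match = re.search(r"start(.+)end", sample)         # "." cannot cross "\n"
class_match = re.search(r"start([\s\S]+)end", sample)  # [\s\S] matches any character

print(dot_match)                   # None
print(repr(class_match.group(1)))  # '\nmiddle\n'
```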

Seems like regex is overkill for this particular job unless I am missing something (also not clear to me why you need the step that removes the whitespace from the substring). You could just split on "Link Created" and then split the following string on "Research Done".

text = "| Completed?|\n|------|:---------:|\n|Link Created    |   []   |\n|Research Done   |   [X] "

s = text.split("Link Created")[1].split("Research Done")[0].lower()

if "[]" in s or "[x]" in s:
    print("Exists")
else:
    print("Doesn't Exist")

# Exists
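
One caveat with chained split calls is that indexing [1] raises IndexError when the first marker is missing. A possible variation, sketched here with str.partition instead of split, returns None in that case. The helper name text_between is hypothetical and not from the answer above.

```python
from typing import Optional

def text_between(text: str, start: str, end: str) -> Optional[str]:
    """Return the text between start and end, or None if either marker is missing."""
    _, start_found, rest = text.partition(start)
    if not start_found:
        return None
    middle, end_found, _ = rest.partition(end)
    return middle if end_found else None

text = "| Completed?|\n|------|:---------:|\n|Link Created    |   []   |\n|Research Done   |   [X] "

s = text_between(text, "Link Created", "Research Done")
print(s is not None and "[]" in s.lower())                 # True
print(text_between(text, "No Such Row", "Research Done"))  # None
```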

1 Comment

I could see Regex being overkill, I guess. This is actually part of a bigger program, and I thought I'd remove whitespace since there is a chance that when people edit the code that will become the "text" string, they could accidentally add whitespace inside the [] or [x].
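
A minimal sketch of that concern, assuming the stray whitespace is plain spaces: lowercasing and removing spaces before the check means "[ X ]" is treated the same as "[x]". The function name has_checkbox is made up for this example.

```python
def has_checkbox(cell: str) -> bool:
    """Return True if the cell contains [] or [x], ignoring case and spaces."""
    normalized = cell.lower().replace(" ", "")
    return "[]" in normalized or "[x]" in normalized

print(has_checkbox("   |   [ X ]   |"))  # True
print(has_checkbox("   |   [  ]   |"))   # True
print(has_checkbox("   |   done   |"))   # False
```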
