1

My regex is not returning the correct results. I use the github3.py library to fetch pull request data from GitHub, and the patch key of Markdown files (https://developer.github.com/v3/pulls/#list-pull-requests-files) can contain one of three possible strings. I've read the regex documentation and several threads, but I'm missing something in my syntax.

string1 = '> [HELP.SELECTOR]'
string2 = '-> [HELP.SELECTOR]'
string3 = '+> [HELP.SELECTOR]'

I want to print True for the exact match to string2 or string3 and False if string1 is found. My results are returning False if string2 or string3 is found.

for prs in repo.pull_requests():
    search_string_found = 'False'
    regex_search_string1 = re.compile(r"^\+>\s\[HELP.SELECTOR\]")
    regex_search_string2 = re.compile(r"^->\s\[HELP.SELECTOR\]")
    for data in repo.pull_request(prs.number).files():
        match_text1 = regex_search_string1.search(data.patch)
        match_text2 = regex_search_string2.search(data.patch)                        
        if match_text1 is not None and match_text2 is not None:
            search_string_found = 'True'
            break

    print('HELP.SELECTOR present in file: ', search_string_found)
  • Just test against one regex: regex_search_string = re.compile(r"^[+-]>\s\[HELP\.SELECTOR\]"), then: if regex_search_string.search(data.patch): Commented May 13, 2016 at 21:38
  • Your solution worked. I tweaked the regex by removing the caret and the correct results were returned. regex_search_string = re.compile(r"[\+-]>\s\[HELP\.SELECTOR\]"). Commented May 17, 2016 at 18:28
  • That means the strings you needed were not at the start of the string, right? Commented May 17, 2016 at 19:38
  • [+/-] is at the start of the string, such as +> [HELP.SELECTOR]. However, when I tested with the caret, incorrect results came back; when I removed it, the results were as expected. Commented May 17, 2016 at 20:45

2 Answers

1

Since you confirm that your strings may not be located at the start of the string, you need to drop the ^ anchor:

regex_search_string = re.compile(r"[+-]>\s\[HELP\.SELECTOR\]")
for data in repo.pull_request(prs.number).files():
    match_text = regex_search_string.search(data.patch)
    if match_text:
        search_string_found = 'True'
        break

Note:

  • [+-] matches either a + or a -, because a character class matches a single character from the set or range specified inside it
  • + inside [...] never has to be escaped
  • - at the start or end of [...] does not have to be escaped
  • re.search returns a match object or None, so check the result before accessing the matched or captured text
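
A likely reason the caret failed, assuming data.patch holds a multi-line unified diff whose first line is a hunk header: without re.MULTILINE, ^ matches only at the very start of the whole string, not at the start of each line. The sketch below compares the three variants on a made-up patch string (sample_patch is hypothetical, not real GitHub output).

```python
import re

# Hypothetical patch text; a real patch from the API may look different.
sample_patch = (
    "@@ -1,3 +1,3 @@\n"
    " # Title\n"
    "-> [HELP.SELECTOR]\n"
    "+> [HELP.SELECTOR]\n"
)

# ^ without flags: only the very start of the whole string qualifies.
anchored = re.compile(r"^[+-]>\s\[HELP\.SELECTOR\]")
# No anchor: the marker may appear anywhere in the string.
unanchored = re.compile(r"[+-]>\s\[HELP\.SELECTOR\]")
# ^ with re.MULTILINE: the start of any line qualifies.
per_line = re.compile(r"^[+-]>\s\[HELP\.SELECTOR\]", re.MULTILINE)

print(bool(anchored.search(sample_patch)))    # False
print(bool(unanchored.search(sample_patch)))  # True
print(bool(per_line.search(sample_patch)))    # True
```

If the marker could also appear in the middle of a line, the re.MULTILINE variant may be the safer choice, since it keeps the match tied to the start of a line.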

0

It is easier to maintain one regex string than several. Try this:

import re

strings = [
    '> [HELP.SELECTOR]',
    '-> [HELP.SELECTOR]',
    '+> [HELP.SELECTOR]',
]

# \. matches a literal dot; $ anchors the match at the end of the string.
for string in strings:
    print(bool(re.match(r'[-+]> \[HELP\.SELECTOR\]$', string)), string)

Result:

False > [HELP.SELECTOR]
True -> [HELP.SELECTOR]
True +> [HELP.SELECTOR]
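
Whichever pattern you use, keep in mind that an unescaped . matches any character, so escaping it makes the test stricter. A small sketch of the difference (the typo string is a made-up near miss, and the names loose and strict are only for illustration):

```python
import re

loose = re.compile(r"[-+]> \[HELP.SELECTOR\]$")    # '.' matches any character
strict = re.compile(r"[-+]> \[HELP\.SELECTOR\]$")  # '\.' matches only a literal dot

typo = "+> [HELPxSELECTOR]"  # hypothetical near-miss string

print(bool(loose.match(typo)))                   # True
print(bool(strict.match(typo)))                  # False
print(bool(strict.match("+> [HELP.SELECTOR]")))  # True
```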

Applying that to your problem,

#UNTESTED
# re.search rather than re.match, because the marker is not at the start of the patch.
for prs in repo.pull_requests():
    search_string_found = any(
        re.search(r'[-+]> \[HELP\.SELECTOR\]', data.patch)
        for data in repo.pull_request(prs.number).files())
    print('HELP.SELECTOR present in file: ', search_string_found)
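
To check the logic without calling GitHub, one option is to pull the test into a small helper and feed it stand-in objects. This is only a sketch: SELECTOR_RE and has_selector_change are hypothetical names, the SimpleNamespace objects merely imitate objects with a patch attribute, and the guard for a missing patch assumes that some files may come back without one.

```python
import re
from types import SimpleNamespace

SELECTOR_RE = re.compile(r"[-+]> \[HELP\.SELECTOR\]")

def has_selector_change(files):
    """Return True if any file's patch adds or removes the selector line."""
    # Assumption: patch may be None or empty for some files, so skip falsy values.
    return any(f.patch and SELECTOR_RE.search(f.patch) for f in files)

# Stand-ins for the file objects; only the patch attribute is imitated.
changed = [SimpleNamespace(patch="@@ -1 +1 @@\n+> [HELP.SELECTOR]\n")]
unchanged = [
    SimpleNamespace(patch="@@ -1 +1 @@\n > [HELP.SELECTOR]\n"),
    SimpleNamespace(patch=None),
]

print(has_selector_change(changed))    # True
print(has_selector_change(unchanged))  # False
```

In the real loop, the call would presumably be has_selector_change(repo.pull_request(prs.number).files()).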
