
I have a regular expression that captures text from a text file. The regex can be viewed at the following URL: https://regex101.com/r/wwEjTF/1

In my Python code I would like to extract only the text matched by the regex from all the other text in the file. I have the following Python code for matching the regex and storing the result in a variable.

match = re.findall(r'test\s.+\n\sdescription\s\"(.+)\"', text, re.S)

I expect all the matches to be returned as a list in the match variable, but print(match) gives an empty list. I do not understand why the list is empty. How do I capture the matched part of the regex into the variable? Thanks for your help. In case there is an issue with the above URL, here is the regex followed by the sample text:

test\s.+\n\sdescription\s\"(.+)\"
some random text
test 111.333.555.666
  description "text10"
some random text
some random text
test 22.44.55.66
  description "text12"
some random text
some random text
test 77.77.88.99
  description "text13"
some random text
some random text
test 14.22.55.99
  description "text16"
some random text
some random text
test 13.33.55.66
  description "text17"
some random text
  • At the beginning of the indented line, are those spaces or tabs? Python editors tend to substitute tabs with 4 spaces. Commented Jun 6, 2017 at 11:09
  • Try omitting that last re.S, because re.S makes the '.' special character match any character at all, including a newline; without this flag, '.' matches anything except a newline. I think your .+ is consuming everything. Commented Jun 6, 2017 at 11:09
  • Yes, I have omitted the re.S and still get the same issue. Commented Jun 6, 2017 at 11:11
  • Are you sure there's a single whitespace character in front of your description? Try: re.findall(r'test\s.+\n\s+description\s"(.+)"', text, re.S) Commented Jun 6, 2017 at 11:14
  • What's your expected output? Commented Jun 6, 2017 at 11:20
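One way to settle the spaces-versus-tabs question raised in these comments is to print the raw characters. The snippet below is only a sketch: `sample` is a hypothetical stand-in for one block of the file, assumed here to be indented with two spaces as in the sample text of the question.

```python
import re

# Hypothetical stand-in for one block of the file (two-space indent assumed).
sample = 'test 111.333.555.666\n  description "text10"\n'

# repr() makes whitespace visible: a tab would show up as \t.
for line in sample.splitlines():
    print(repr(line))

# Capture the whitespace between the newline and "description" to inspect it.
indent = re.search(r'\n(\s+)description', sample).group(1)
print(repr(indent), len(indent))
```

If the captured indent is longer than one character, a single \s after \n cannot match it.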

2 Answers

1

As I said in my comment, try omitting re.S, because it makes the '.' special character match any character at all, including a newline.

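Also, \n\s is too strict: it matches the newline plus exactly one whitespace character. Since \s also matches a newline, use \s+ instead, which covers both the line break and the indentation.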

Your regex will be:

match = re.findall(r'test\s.+\s+description\s\"(.+)\"', text)
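As a quick check, the original and the adjusted pattern can be run side by side. This is a minimal sketch: `text` is a shortened copy of the sample from the question, assuming the description lines are indented with two spaces, and the names `original` and `fixed` are only illustrative.

```python
import re

# Shortened copy of the sample text (two-space indent assumed).
text = (
    'some random text\n'
    'test 111.333.555.666\n'
    '  description "text10"\n'
    'some random text\n'
    'test 22.44.55.66\n'
    '  description "text12"\n'
    'some random text\n'
)

# Pattern from the question: \n\s allows only one whitespace character
# between the newline and "description".
original = re.findall(r'test\s.+\n\sdescription\s\"(.+)\"', text, re.S)

# Adjusted pattern: \s+ covers the newline and the indentation, and without
# re.S the '.' stays on a single line.
fixed = re.findall(r'test\s.+\s+description\s\"(.+)\"', text)

print(original)  # []
print(fixed)     # ['text10', 'text12']
```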



0

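The example on regex101 uses a tab to indent the description line, so a single \s works there. If your actual file indents it with more than one whitespace character (the sample in the question shows two spaces), replace \s with a repetition: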

match = re.findall(r'test\s.+\n\s+description\s\"(.+)\"', text, re.S)

2 Comments

Even with this, only the last group will be captured, because re.S lets .+ consume everything up to the end.
The regex is not perfect; I've just pointed out to the topic starter what was wrong. Other improvements, like the one you mentioned or removing .+, could definitely be applied.
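The effect described in these comments can be reproduced with a short experiment. The sketch below assumes a shortened, two-space-indented copy of the sample text; the lazy variant is one possible adjustment, not something proposed in the thread.

```python
import re

# Shortened copy of the sample text (two-space indent assumed).
text = (
    'some random text\n'
    'test 111.333.555.666\n'
    '  description "text10"\n'
    'some random text\n'
    'test 22.44.55.66\n'
    '  description "text12"\n'
    'some random text\n'
)

pattern = r'test\s.+\n\s+description\s\"(.+)\"'

# With re.S the greedy .+ runs to the end of the text and backtracks to the
# last description line, so everything collapses into a single match.
with_dotall = re.findall(pattern, text, re.S)

# Without the flag, '.' stops at each newline and every block matches.
without_flag = re.findall(pattern, text)

# Assumed alternative: keep re.S but make both quantifiers lazy.
lazy = re.findall(r'test\s.+?\n\s+description\s\"(.+?)\"', text, re.S)

print(with_dotall)   # ['text12']
print(without_flag)  # ['text10', 'text12']
print(lazy)          # ['text10', 'text12']
```

Dropping the flag is the simpler choice here, since each part of the pattern is meant to stay on its own line anyway.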
