Python regex capture issue

Question

I have a regular expression that captures text from a text file. The regex can be viewed at the following URL: https://regex101.com/r/wwEjTF/1

In my Python code I would like to extract only the text matched by the regex, separate from all the other text in the file. I have the following Python code for matching the regex and storing the result in a variable.

match = re.findall(r'test\s.+\n\sdescription\s\"(.+)\"', text, re.S)

I am expecting all the matches to be in the match variable, returned as a list. But when I do print(match) I get an empty list. I do not understand why the list is empty. How do I capture the matched part of the regex into the variable? Thanks for your help. In case there is an issue with the above URL, here are the regex and the sample text string:

test\s.+\n\sdescription\s\"(.+)\"

some random text
test 111.333.555.666
  description "text10"
some random text
some random text
test 22.44.55.66
  description "text12"
some random text
some random text
test 77.77.88.99
  description "text13"
some random text
some random text
test 14.22.55.99
  description "text16"
some random text
some random text
test 13.33.55.66
  description "text17"
some random text

At the beginning of the indented line, are those spaces or tabs? Python editors tend to replace tabs with 4 spaces.
– Kendas, Commented Jun 6, 2017 at 11:09
Try omitting that last re.S. re.S makes the '.' special character match any character at all, including a newline; without this flag, '.' matches anything except a newline. I think your .+ is consuming everything.
– Rahul, Commented Jun 6, 2017 at 11:09
Are you sure there's a single whitespace character in front of your description? Try: re.findall(r'test\s.+\n\s+description\s"(.+)"', text, re.S)
– zwer, Commented Jun 6, 2017 at 11:14
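
One way to answer the spaces-or-tabs question raised in these comments is to print the leading whitespace of each line with repr(). This is only a sketch: the helper name leading_whitespace and the two-line sample are made up for illustration and are not from the original post.

```python
def leading_whitespace(line):
    """Return the run of whitespace characters at the start of a line."""
    return line[:len(line) - len(line.lstrip())]

# Hypothetical sample: one line indented with two spaces, one with a tab.
sample = 'test 111.333.555.666\n  description "text10"\n\tdescription "text12"\n'

for line in sample.splitlines():
    # repr() shows a tab as '\t' and makes every space visible inside the quotes.
    print(repr(leading_whitespace(line)), repr(line))
```

If the indent prints as more than one character, a single \s after \n cannot match it.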

Rahul · Accepted Answer · 2017-06-06 11:24:32Z

1

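As I said in my comment, try omitting re.S, because it makes the '.' special character match any character at all, including a newline.

Also, \n\s is not appropriate: it matches a newline followed by exactly one whitespace character, while the description line in your sample is indented by more than one. Use \s+ instead. Since \s also matches a newline, \s+ covers both the line break and the indentation.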

Your regex will be:

match = re.findall(r'test\s.+\s+description\s\"(.+)\"', text)
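
To check the difference, here is a small sketch that runs the original pattern and the pattern above against a shortened copy of the sample text from the question. It assumes the description lines are indented with two spaces, as in the pasted sample; the variable names original and fixed are just for illustration.

```python
import re

# Shortened copy of the sample text from the question (two-space indent assumed).
text = '''some random text
test 111.333.555.666
  description "text10"
some random text
test 22.44.55.66
  description "text12"
some random text
'''

# Original pattern: \n\s allows only ONE whitespace character before "description".
original = re.findall(r'test\s.+\n\sdescription\s"(.+)"', text, re.S)

# Pattern from this answer: no re.S, and \s+ absorbs the newline plus the indent.
fixed = re.findall(r'test\s.+\s+description\s"(.+)"', text)

print(original)  # []
print(fixed)     # ['text10', 'text12']
```

The backslashes before the double quotes in the original pattern are harmless but unnecessary inside a single-quoted raw string, so they are dropped here.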



grundic · Accepted Answer · 2017-06-06 11:21:05Z

0

The example on regex101 uses a tab to indent the description line, so a single \s works there. Replace it with a repetition:

match = re.findall(r'test\s.+\n\s+description\s\"(.+)\"', text, re.S)


2 Comments

Rahul Over a year ago

Even with this, only the last group will be captured, because re.S makes .+ consume everything up to the end.

grundic Over a year ago

The regex is not perfect; I've just pointed out to the topic starter what was wrong. Other improvements, like the one you mentioned or removing .+, could definitely be applied.
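
The behaviour discussed in these comments can be reproduced with a short sketch. The first call uses the pattern from this answer with re.S. The second call swaps in lazy quantifiers (.+? instead of .+), which is one possible improvement and not something proposed in the thread. A two-space indent is assumed, and the variable names are illustrative.

```python
import re

# Shortened copy of the sample text from the question (two-space indent assumed).
text = '''test 111.333.555.666
  description "text10"
some random text
test 22.44.55.66
  description "text12"
some random text
'''

# Greedy .+ with re.S runs to the end of the text and backtracks,
# so the single match ends at the LAST description line.
greedy_dotall = re.findall(r'test\s.+\n\s+description\s"(.+)"', text, re.S)

# Lazy .+? stops as early as possible, so each test/description pair matches separately.
lazy_dotall = re.findall(r'test\s.+?\n\s+description\s"(.+?)"', text, re.S)

print(greedy_dotall)  # ['text12']
print(lazy_dotall)    # ['text10', 'text12']
```

Dropping re.S altogether, as in the other answer, avoids the problem without lazy quantifiers.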
