So, I'm trying to capture this big string in Python but it is failing me. The regex I wrote works fine in regexr: http://regexr.com/3cmdc

But when I try to use it in Python to capture the text, the search returns None. This is the code:

pattern = "var initialData = (.*?);\\n"
match = re.search(pattern, source).group(1)

What am I missing?

4 Comments
  • Are you sure the input has no linebreaks as in the regex demo? If you have linebreaks, you can try match = re.search(pattern, source, re.S).group(1), but if your string is very large, you might have an issue related to the lazy matching stack "overflow". Commented Jan 28, 2016 at 23:32
  • Try r'var initialData = ([^;]*(?:;(?!\\n)[^;]*)*)' pattern if your \n contains a literal \. Else, if \n is a normal linebreak, I'd advise r'var initialData = ([^;]*(?:;(?!\n)[^;]*)*)' Commented Jan 28, 2016 at 23:37
  • Consider paring down your example string. The problem is easily demonstrated with a much smaller example that could be added to the question directly. Commented Jan 29, 2016 at 0:06
  • There is only one line break in that sample string. Why do you care about it, though? Use var initialData = (.*?); or var initialData = (.*?);\r?\n and get on with life. Or even var initialData = ([\S\s]*?); Commented Jan 29, 2016 at 0:16
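
To make the unrolled pattern suggested in the comments concrete, here is a minimal sketch. The sample string is made up for illustration (it is not the OP's data) and assumes the terminator is a real line break. The negated character class [^;] matches line breaks on its own, so no re.S flag is needed, and there is no lazy dot to step through a large string character by character.

```python
import re

# Hypothetical sample: the value contains a ';' and spans two lines.
source = 'var initialData = {"a": "x;y",\n"b": 2};\nvar other = 0;\n'

# Unrolled pattern from the comment: consume non-';' runs, and allow a ';'
# only when it is NOT immediately followed by a line break.
pattern = r'var initialData = ([^;]*(?:;(?!\n)[^;]*)*)'

match = re.search(pattern, source)
if match:
    captured = match.group(1)
    print(repr(captured))
```

The capture stops right before the first ';' that is followed by a line break, so the inner ';' in "x;y" stays part of the captured text.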

2 Answers

1

You need to set the appropriate flags:

re.search(pattern, source, re.MULTILINE | re.DOTALL).group(1)
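
As a rough illustration of why the flag matters, the sketch below uses a made-up source string (an assumption, not the OP's actual input) whose value spans two lines. Without re.DOTALL the dot cannot cross the line break, so re.search returns None, and calling .group(1) on that result would raise AttributeError. As the comments note, re.MULTILINE has no effect here because the pattern contains no ^ or $.

```python
import re

# Hypothetical sample: the captured value spans two lines.
source = 'var initialData = {"a": 1,\n"b": 2};\nvar other = 0;\n'

# In a normal (non-raw) string, "\\n" reaches the regex engine as \n,
# which the engine treats as a real line break.
pattern = "var initialData = (.*?);\\n"

no_flag = re.search(pattern, source)               # '.' stops at the line break
with_flag = re.search(pattern, source, re.DOTALL)  # '.' may match the line break

print(no_flag)                    # None
print(repr(with_flag.group(1)))
```

Checking the result for None before calling .group(1) avoids the AttributeError when nothing matches.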

3 Comments

Why use re.MULTILINE | re.DOTALL? OP does not use them at regexr but gets a match. And there are no ^/$ here, so re.M is totally redundant.
@WiktorStribiżew well, I suspect the newlines on regexr are not handled properly. Adding flags actually worked for me. Let's see if the answer is gonna help the OP. If not, I'll definitely remove the answer. Thanks.
Most probably the \n in the regex demo are real linebreaks, so re.S is OK to use, I believe. Although OP says the strings are huge, and I think lazy dot matching is very fragile here.
1

Use Python's raw string notation:

pattern = r"var initialData = (.*?);\\n"
match = re.search(pattern, source).group(1)

More information

1 Comment

It would be helpful to mention why... to match a literal \n in the text, the regex needs to escape the backslash to \\n but then python either needs a raw string r"\\n" or extra escaping of the backslashes "\\\\n".
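
The escaping described in that comment can be checked with a small sketch. The sample text here is hypothetical and assumes the source contains a literal backslash followed by n rather than a real line break.

```python
import re

# Hypothetical sample: a literal backslash + "n" in the text, no real line break.
source = r'var initialData = {"a": 1};\nvar other = 0;'

plain = "var initialData = (.*?);\\n"      # regex engine sees \n  -> line break
raw = r"var initialData = (.*?);\\n"       # regex engine sees \\n -> backslash + n
doubled = "var initialData = (.*?);\\\\n"  # same regex as raw, without the r prefix

same_pattern = (raw == doubled)
plain_match = re.search(plain, source)
raw_match = re.search(raw, source)

print(same_pattern)         # True
print(plain_match)          # None
print(raw_match.group(1))   # {"a": 1}
```

If the text contains real line breaks instead, the non-raw pattern is the one that matches, so it is worth inspecting repr(source) first to see which case applies.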
