
I've been trying to parse a value inside a block of text.

Let me explain with an example.

I have the following text:

started xx xxxxxxx xxxxx xxxxxx xx xxxxxxxxx xxxxxxx xxxx xx
xx xxx xxxxx xxxx xxxxxxxx xxxx xxxxxx found 9999 xxxxx xxxxx
xxx xx xxxx xxxx xxxxxxxxxxx xxxxxxx xxx stored 9999 finished

I'm trying to capture everything between "started" and "finished".

I tried something like this:

(?<block>started(.|\n)*finished)

but I don't know how to also capture the \d+ value next to "stored".

  • Does this answer your question? How to match "anything up until this sequence of characters" in a regular expression? Commented Nov 21, 2019 at 9:29
  • The regex does not work with Python re: (?<block> must be (?P<block>. Do not use (.|\n)*; use .*? with re.DOTALL. If you need to capture the digits, try re.findall(r'started(.*?(?:stored\s+(\d+)\s+)?)finished', text, re.S) Commented Nov 21, 2019 at 9:30
  • re.match(r"started .+?found (\d+) .+? stored (\d+) finished", text, flags=re.DOTALL) Commented Nov 21, 2019 at 9:35

1 Answer

The regex you provided does not work with Python re, because (?<block>...) is not a supported named group syntax. It must be written as (?P<block>...).
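As a small illustration of the named group point, the sketch below compiles both spellings. The sample string and the variable names are made up for this example and are not from the question.

```python
import re

# Hypothetical two-line sample, just to exercise the pattern.
text = "started aa\nbb stored 9999 finished"

# The (?<block>...) spelling is rejected by Python's re module.
try:
    re.compile(r"(?<block>started.*?finished)")
    angle_syntax_ok = True
except re.error:
    angle_syntax_ok = False

# The (?P<block>...) spelling is the one re accepts.
match = re.search(r"(?P<block>started.*?finished)", text, re.DOTALL)
block = match.group("block")

print(angle_syntax_ok)
print(repr(block))
```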

Also, avoid (.|\n)*, which is a very inefficient construct. Use .*? with re.DOTALL / re.S, or the inline (?s) flag, instead.
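A minimal sketch of how the flag changes what . matches, assuming a short made-up input with a line break between the two delimiters:

```python
import re

# Hypothetical input with a newline between the delimiters.
text = "started xx\nyy finished"

# Without a flag, "." does not match "\n", so the search fails.
without_flag = re.search(r"started.*?finished", text)

# re.DOTALL (alias re.S) lets "." match newlines as well.
with_flag = re.search(r"started.*?finished", text, re.DOTALL)

# (?s) at the start of the pattern switches on the same behaviour inline.
inline_flag = re.search(r"(?s)started.*?finished", text)

print(without_flag)
print(repr(with_flag.group()))
print(repr(inline_flag.group()))
```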

If you need to capture the block together with the digits that come after stored and right before finished (and if that part is optional), use

re.findall(r'started(.*?(?:stored\s+(\d+)\s+)?)finished', text, re.S)

See the regex demo

Details

  • started - left-hand delimiter
  • (.*?(?:stored\s+(\d+)\s+)?) - Group 1:
    • .*? - any 0+ chars, as few as possible
    • (?:stored\s+(\d+)\s+)? - an optional group matching
      • stored\s+ - stored and 1+ whitespaces
      • (\d+) - Group 2: one or more digits
      • \s+ - 1+ whitespaces
  • finished - right-hand delimiter.
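To see how the two groups come back from re.findall, here is a minimal sketch run against the sample text from the question. The second input, which has no stored part, is an assumption added to show the optional group.

```python
import re

# Sample text from the question, joined with newlines.
text = (
    "started xx xxxxxxx xxxxx xxxxxx xx xxxxxxxxx xxxxxxx xxxx xx\n"
    "xx xxx xxxxx xxxx xxxxxxxx xxxx xxxxxx found 9999 xxxxx xxxxx\n"
    "xxx xx xxxx xxxx xxxxxxxxxxx xxxxxxx xxx stored 9999 finished"
)

pattern = re.compile(r"started(.*?(?:stored\s+(\d+)\s+)?)finished", re.S)

# With two capturing groups, findall returns a list of (group 1, group 2) tuples.
results = pattern.findall(text)
for block, stored in results:
    print(repr(stored))

# Hypothetical block without "stored": group 2 comes back as an empty string.
no_stored = pattern.findall("started foo finished")
print(no_stored)
```

Because the stored part is optional, it is worth checking group 2 for an empty string before converting it to a number.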

6 Comments

@IgorShilov As usual, with open(file + ".out.txt", 'w') as fw: fw.write(updated_contents)
@IgorShilov I have no idea what you mean. Please add the non-working code to the question and explain the expected behavior.
In my case the regex is needed for a current log file of more than 15 MB. If I use this code it doesn't work: with open('log.log') as reading: line = reading.read(), then regex = re.findall(r'\d+.*07.*?started\s+\w+(.*?(?:stored\s+(\d+)\s+)?)finished.\w+', line, re.DOTALL), then if regex: for res1, res2 in regex: print res2
It works only in the IDE. To make it work with a file you need the construct "with open(file, 'r') as read", and then you have to read the file either line by line with "readlines()" or in full with "read()". Which method should I use, and how do I write correct code in this case? See: ideone.com/zK2k0g
@IgorShilov You seem to be using Python2. Apart from this, everything looks good, reading.read() is the right method since it reads all file contents into a single variable.
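A minimal Python 3 sketch of the read() approach discussed in these comments. It assumes a small stand-in file written to a temporary directory, since the real log is not shown, and it uses the pattern from the answer rather than the longer one from the comment.

```python
import os
import re
import tempfile

pattern = re.compile(r"started(.*?(?:stored\s+(\d+)\s+)?)finished", re.DOTALL)

# Write a small stand-in log (hypothetical contents) so the sketch is self-contained.
with tempfile.TemporaryDirectory() as tmp:
    log_path = os.path.join(tmp, "log.log")
    with open(log_path, "w") as fw:
        fw.write("started aa\nbb stored 123 finished\nstarted cc\nstored 456 finished\n")

    # read() returns the whole file as one string, so blocks may span lines.
    with open(log_path) as reading:
        contents = reading.read()

stored_values = [stored for _block, stored in pattern.findall(contents)]
for value in stored_values:
    print(value)  # print is a function in Python 3
```

Reading line by line with readlines() would split each block across several strings, so a pattern that spans lines could not match.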