regex in regex block

Question

I've been trying to parse a value inside a block.

Let me explain with an example.

I have the following text:

started xx xxxxxxx xxxxx xxxxxx xx xxxxxxxxx xxxxxxx xxxx xx
xx xxx xxxxx xxxx xxxxxxxx xxxx xxxxxx found 9999 xxxxx xxxxx
xxx xx xxxx xxxx xxxxxxxxxxx xxxxxxx xxx stored 9999 finished

I'm trying to capture everything between "started" and "finished".

I tried something like this:

(?<block>started(.|\n)*finished)

but I don't know how to also capture the number (\d+) that follows "stored".

Does this answer your question? How to match "anything up until this sequence of characters" in a regular expression?
– n00dl3, Commented Nov 21, 2019 at 9:29
The regex does not work with Python re: (?<block> must be (?P<block>. Do not use (.|\n)*, use .*? with re.DOTALL. If you need to capture the digits, try re.findall(r'started(.*?(?:stored\s+(\d+)\s+)?)finished', text, re.S)
– Wiktor Stribiżew, Commented Nov 21, 2019 at 9:30
re.match(r"started .+?found (\d+) .+? stored (\d+) finished", text, flags=re.DOTALL)
– n00dl3, Commented Nov 21, 2019 at 9:35

Wiktor Stribiżew · Accepted Answer · 2019-11-21 09:54:10Z

2

The regex you provided does not work with Python re, because (?<block>...) is not a supported named group syntax. It must be written as (?P<block>...).
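
As a minimal sketch of the named-group point (the sample string is made up for illustration, not taken from the question), Python's re rejects the (?<block>...) form and accepts (?P<block>...), exposing the match under that name:

```python
import re

# Hypothetical sample text, not the asker's real data.
text = "started alpha beta finished"

# The (?<name>...) spelling is rejected by Python's re module.
try:
    re.compile(r"(?<block>started.*?finished)")
    other_syntax_ok = True
except re.error:
    other_syntax_ok = False

# (?P<name>...) is the named-group syntax that Python's re accepts.
match = re.search(r"(?P<block>started.*?finished)", text)
block = match.group("block") if match else None

print(other_syntax_ok, block)
```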

Also, it is recommended to avoid (.|\n)*, which is a very inefficient construct. Use .*? with re.DOTALL/re.S or the inline (?s) flag instead.
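
A short sketch of what the flag changes, using a made-up two-line string: without re.DOTALL the dot stops at the newline, while re.DOTALL (alias re.S) or an inline (?s) lets it cross lines.

```python
import re

# Hypothetical two-line sample; the newline sits inside the block.
text = "started one\ntwo finished"

# Without DOTALL, "." does not match "\n", so nothing is found.
no_flag = re.search(r"started.*?finished", text)

# With re.DOTALL (alias re.S), "." also matches "\n".
with_flag = re.search(r"started.*?finished", text, re.DOTALL)

# The inline form (?s) at the start of the pattern has the same effect.
inline = re.search(r"(?s)started.*?finished", text)

print(no_flag, repr(with_flag.group()), repr(inline.group()))
```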

If you need to capture the block together with the digits that come after stored and before finished (and if that part is optional), use

re.findall(r'started(.*?(?:stored\s+(\d+)\s+)?)finished', text, re.S)

See the regex demo

Details

started - left-hand delimiter
(.*?(?:stored\s+(\d+)\s+)?) - Group 1:
- .*? - any 0+ chars, as few as possible
- (?:stored\s+(\d+)\s+)? - an optional group matching
  - stored\s+ - stored and 1+ whitespaces
  - (\d+) - Group 2: one or more digits
  - \s+ - 1+ whitespaces
finished - right-hand delimiter.
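
To see what the two groups hold, here is a small sketch that runs the pattern on made-up text shaped like the question's sample (the strings and variable names are illustrative assumptions). With two capturing groups, re.findall returns a list of (Group 1, Group 2) tuples, and Group 2 comes back as an empty string when the optional stored part is absent.

```python
import re

pattern = r"started(.*?(?:stored\s+(\d+)\s+)?)finished"

# Made-up stand-in for the question's sample: the block spans several lines.
text = (
    "started aa bbb\n"
    "cc found 1234 dd\n"
    "ee stored 5678 finished"
)

# Each result is a (group 1, group 2) tuple.
results = re.findall(pattern, text, re.S)
for block, stored in results:
    print(repr(block), repr(stored))

# A block without "stored <digits>": the optional group does not participate,
# so findall reports an empty string for group 2.
no_stored = re.findall(pattern, "started aa finished", re.S)
print(no_stored)
```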

6 Comments

Wiktor Stribiżew Over a year ago

@IgorShilov As usual, with open(file + ".out.txt", 'w') as fw: fw.write(updated_contents)

Wiktor Stribiżew Over a year ago

@IgorShilov I have no idea what you mean. Please add the non-working code to the question and explain the expected behavior.

Igor Shilov Over a year ago

In my case the regex has to run against a log file larger than 15 MB. When I use this code, it doesn't work:

with open('log.log') as reading:
    line = reading.read()
    regex = re.findall(r'\d+.*07.*?started\s+\w+(.*?(?:stored\s+(\d+)\s+)?)finished.\w+', line, re.DOTALL)
    if regex:
        for res1, res2 in regex:
            print res2

Igor Shilov Over a year ago

It only works in the IDE. To make it work with a file you need the construct "with open(file, 'r') as read", and then you read the file either line by line with "readlines()" or in full with "read()". Which method should I use, and how do I write correct code in this case? See: ideone.com/zK2k0g

Wiktor Stribiżew Over a year ago

@IgorShilov You seem to be using Python 2. Apart from this, everything looks good: reading.read() is the right method since it reads the entire file contents into a single variable.
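
For reference, a Python 3 sketch of the read() approach discussed in these comments. It uses the answer's pattern rather than the longer one from the comment, and the temporary file, its contents, and the variable names are assumptions made so the example is self-contained.

```python
import os
import re
import tempfile

pattern = re.compile(r"started(.*?(?:stored\s+(\d+)\s+)?)finished", re.S)

# Write a small stand-in log so the sketch runs on its own;
# the file contents are hypothetical.
with tempfile.NamedTemporaryFile("w", suffix=".log", delete=False) as tmp:
    tmp.write("started a\nb stored 42 finished\nstarted c\nstored 7 finished\n")
    log_path = tmp.name

# read() returns the whole file as one string, so blocks that span
# several lines can still be matched in a single pass.
with open(log_path) as reading:
    contents = reading.read()

# Keep only group 2 (the digits after "stored") from each match.
stored_values = [stored for _block, stored in pattern.findall(contents)]
print(stored_values)

os.remove(log_path)
```

Reading line by line with readlines() would split each block across separate strings, which is presumably why read() is the better fit when a block can span multiple lines.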