Question about a multiline regex in Python

Question

I want to select groups of lines in a text file to get all the jobs related to a given IP reference (ipref). The test file looks like this, with job numbers (1, 2, 3) and IP refs (10, 12, 10):

Text file:

1
... (several lines of text)
xxx 10
2
... (several lines of text)
xxx 12
3
... (several lines of text)
xxx 10

I want to select the job numbers for ipref = 10.

Code :

#!/usr/bin/python

import re

# Read the whole file into one string
with open('test2.xml', 'r') as fic:
    texte = fic.read()

#pattern = r'\n?\d(?!(?:\n?xxx \d{2}\n)*)xxx 10'
pattern = r'\n?\d.*?xxx 10'

result = re.findall(pattern, texte, re.DOTALL)

for i, match in enumerate(result, start=1):
    print("\nmatch:", i)
    print(match)

Result :

match: 1
1
a
b
xxx 10

match: 2

1
a
b
xxx 12
1
a
b
xxx 10

I tried to replace .* with a negative lookahead assertion, so that a group is selected only if no line like "\n?xxx \d{2}\n" comes before "xxx 10":

pattern = r'\n?\d(?!(?:\n?xxx \d{2}\n)*)xxx 10'

but it does not work.
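
One likely reason the lookahead attempt cannot match: the group inside the negative lookahead is followed by `*`, so it can match the empty string, which means the lookahead body always succeeds and the negative assertion always fails. That pattern also expects `xxx 10` directly after the digit, with no lines in between. The sketch below uses made-up strings (not the question's file) to show the first effect in isolation.

```python
import re

# A starred group can match the empty string, so the lookahead body always
# succeeds and the *negative* lookahead therefore always fails.
always_fails = re.search(r'a(?!(?:b)*)', 'ac')
print(always_fails)  # None

# Without the star, the lookahead only rejects an "a" that is followed by "b".
without_star = re.search(r'a(?!b)', 'ac')
print(without_star.group())  # a
```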

Namely, what is the question?

– mkrieger1, Commented Oct 29, 2022 at 12:27

The fourth bird · Accepted Answer · 2022-10-29 12:36:08Z

1

You can write the pattern this way, repeating a newline followed by a line that is not xxx and 1 or more digits:

^\d(?:\n(?!xxx \d+$).*)*\nxxx 10$

The pattern matches (with re.MULTILINE, so that ^ and $ anchor at line boundaries):

^ Start of a line
\d Match a single digit (or \d+ for 1 or more)
(?: Non-capturing group
- \n Match a newline
- (?!xxx \d+$) Negative lookahead to assert that the line is not xxx followed by 1+ digits
- .* If the assertion is true, match the whole line
)* Close the group and optionally repeat it
\nxxx 10$ Match a newline, then xxx 10 at the end of a line
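
As a quick check, the sketch below runs the pattern against the sample data from the thread using re.findall with re.MULTILINE. The names `sample` and `groups` are only illustrative, and the inline string is assumed to stand in for the contents of test2.xml.

```python
import re

# Inline stand-in for the test file: three groups ending in xxx 10, xxx 12, xxx 10
sample = "1\na\nb\nxxx 10\n1\na\nb\nxxx 12\n1\na\nb\nxxx 10\n"

pattern = r'^\d(?:\n(?!xxx \d+$).*)*\nxxx 10$'

# re.MULTILINE makes ^ and $ match at the start and end of every line
groups = re.findall(pattern, sample, re.MULTILINE)

for i, group in enumerate(groups, start=1):
    print("\nmatch:", i)
    print(group)

print(len(groups))  # 2
```

The group ending in `xxx 12` is skipped because the repeated part can never step over an `xxx <digits>` line, so a match that starts at that group's first line has no way to reach a later `xxx 10`.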


edited Oct 29, 2022 at 12:36

answered Oct 29, 2022 at 12:28


Frederic Faure · Accepted Answer · 2022-10-29 12:56:26Z

1

Good day to you :) and thank you very much for your quick response! The result is below. Note: I replaced re.DOTALL with re.DOTALL|re.MULTILINE, because the result is empty without it. Sorry for the previous presentation, it was not very clear.

Text file :

1
a
b
xxx 10
1
a
b
xxx 12
1
a
b
xxx 10

Code with your pattern:

#!/usr/bin/python

import re

with open('test2.xml', 'r') as fic:
    texte = fic.read()
print(texte)

#pattern = r'<\/?(?!(?:span|br|b)(?: [^>]*)?>)[^>\/]*>'
#pattern = r'\n?\d(?!(?:\n?xxx \d{2}\n?)*?)xxx 10'
#pattern = r'\n?\d.*?xxx 10'
pattern = r'^\d(?:\n(?!xxx \d+$).*)*\nxxx 10$'

# Flags as used for the result shown below
result = re.findall(pattern, texte, re.DOTALL | re.MULTILINE)

for i, match in enumerate(result, start=1):
    print("\nmatch:", i)
    print(match)

Result :

match: 1
1
a
b
xxx 10
1
a
b
xxx 12
1
a
b
xxx 10

but I am trying to obtain:

match: 1
1
a
b
xxx 10

match: 2
1
a
b
xxx 10

answered Oct 29, 2022 at 12:56


4 Comments

The fourth bird Over a year ago

You can omit re.DOTALL

The fourth bird Over a year ago

The pattern works for the given data, see regex101.com/r/UpDuHP/1
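
To illustrate the two comments above, here is a small sketch comparing the flags on the sample data. The inline `sample` string is assumed to match the posted text file. With re.DOTALL, `.*` also matches newlines, so the first "whole line" can run across group boundaries and everything ends up in one match.

```python
import re

sample = "1\na\nb\nxxx 10\n1\na\nb\nxxx 12\n1\na\nb\nxxx 10\n"
pattern = r'^\d(?:\n(?!xxx \d+$).*)*\nxxx 10$'

# With re.DOTALL, .* is no longer limited to a single line
with_dotall = re.findall(pattern, sample, re.DOTALL | re.MULTILINE)

# Without it, .* stops at each newline and the lookahead is checked per line
without_dotall = re.findall(pattern, sample, re.MULTILINE)

print(len(with_dotall))     # 1
print(len(without_dotall))  # 2
```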

Frederic Faure Over a year ago

Wow, amazing! Now I know where to test my regex patterns! Thank you for this link.

Frederic Faure Over a year ago

I have embedded the Python code in a bash script like this:

Frederic Faure · Accepted Answer · 2022-10-29 13:14:55Z

0

Thank you very much (you saved my day!). As you say:

pattern = r'^\d(?:\n(?!xxx \d+$).*)*\nxxx 10$'
result = re.findall(pattern, texte, re.MULTILINE)

Result: OK, the line group (1 .. xxx 12) is ignored. Note: I can adapt it to a case where line 1 gives job information and the "xxx 12" line gives printer IP information.

match: 1
1
a
b
xxx 10

match: 2
1
a
b
xxx 10

answered Oct 29, 2022 at 13:14


2 Comments

The fourth bird Over a year ago

If the posted answer worked out for you, on SO you can accept the answer by checking the grey check mark to the left of it instead of posting new answers.

Community Over a year ago

As it’s currently written, your answer is unclear. Please edit to add additional details that will help others understand how this addresses the question asked. You can find more information on how to write good answers in the help center.

Frederic Faure · Accepted Answer · 2022-10-29 14:36:38Z

File:

job_number job_id
1 10202
bla bla
bla bla bla
xxx 100.10.10.100
2 10203
bla bla
bla bla bla
bla bla bla
xxx 100.10.10.102
3 10204
bla bla bla
bla bla bla
xxx 100.10.10.100

Bash script with an embedded Python script:

#!/bin/bash

# Function; $1: IP of a printer, with the dots already escaped for the regex
get_jobs_ip ()
{
cat <<EOF | python
import re

with open('test3.xml', 'r') as fic:
    texte = fic.read()

# Pattern (thanks to The fourth bird for it), shown for ip = 100\.10\.10\.100:
#   ^\d\s+\d+(?:\n(?!xxx \d+\.\d+\.\d+\.\d+$).*)*\nxxx 100\.10\.10\.100$
#
# ^                             Start of a line (re.MULTILINE)
# \d\s+\d+                      Job number (a single digit, or \d+ for 1 or more), whitespace, job id
# (?:                           Non-capturing group
#   \n                          Match a newline
#   (?!xxx \d+\.\d+\.\d+\.\d+$) Negative lookahead: the line is not xxx followed by an IP address
#   .*                          If the assertion is true, match the whole line
# )*                            Close the group and optionally repeat it
# \nxxx 100\.10\.10\.100$       Match a newline, then xxx and the requested IP

# The shell substitutes the function argument here before Python runs
ip = r"$1"
pattern_template = r'^\d\s+\d+(?:\n(?!xxx \d+\.\d+\.\d+\.\d+$).*)*\nxxx @ip@$'
pattern = pattern_template.replace('@ip@', ip)

result = re.findall(pattern, texte, re.MULTILINE)

for i, match in enumerate(result, start=1):
    print("\nmatch:", i)
    print(match)
EOF
}

get_jobs_ip "100\.10\.10\.100"
get_jobs_ip "100\.10\.10\.102"

Result:

match: 1
1 10202
bla bla
bla bla bla
xxx 100.10.10.100

match: 2
3 10204
bla bla bla
bla bla bla
xxx 100.10.10.100

match: 1
2 10203
bla bla
bla bla bla
bla bla bla
xxx 100.10.10.102
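
For comparison, the same lookup can be sketched in plain Python without the bash wrapper. This is only one possible variant, not the script above: the function name `get_jobs_for_ip` and the inline `SAMPLE` string (standing in for test3.xml) are hypothetical, it uses `\d+` for the job number so multi-digit jobs are accepted, and it uses re.escape so the caller can pass an unescaped IP.

```python
import re

# Inline stand-in for test3.xml
SAMPLE = """job_number job_id
1 10202
bla bla
bla bla bla
xxx 100.10.10.100
2 10203
bla bla
bla bla bla
bla bla bla
xxx 100.10.10.102
3 10204
bla bla bla
bla bla bla
xxx 100.10.10.100
"""


def get_jobs_for_ip(text, ip):
    """Return the line groups whose closing 'xxx <ip>' line carries the given IP."""
    pattern = (
        r'^\d+\s+\d+'                             # header line: job_number job_id
        r'(?:\n(?!xxx \d+\.\d+\.\d+\.\d+$).*)*'   # body lines that are not an 'xxx <ip>' line
        r'\nxxx ' + re.escape(ip) + r'$'          # closing line with the requested IP
    )
    return re.findall(pattern, text, re.MULTILINE)


jobs_100 = get_jobs_for_ip(SAMPLE, "100.10.10.100")
jobs_102 = get_jobs_for_ip(SAMPLE, "100.10.10.102")

print(len(jobs_100))  # 2
print(len(jobs_102))  # 1
```

Because re.escape handles the dots, the IP no longer has to be written as `100\.10\.10\.100` at the call site.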
