
I want to select a group of lines in a text file to get all the jobs related to a given IP reference. The test file looks like this: job numbers (1, 2, 3), IP refs (10, 12, 10).

Text file:

1
... (several lines of text)
xxx 10
2
... (several lines of text)
xxx 12
3
... (several lines of text)
xxx 10

I want to select the job numbers for IPref=10.

Code:

#!/usr/bin/python

import re

# Read the whole file into one string
with open('test2.xml', 'r') as fic:
    texte = fic.read()

# pattern = r'\n?\d(?!(?:\n?xxx \d{2}\n)*)xxx 10'
pattern = r'\n?\d.*?xxx 10'

# re.DOTALL lets "." match newlines, so ".*?" can span several lines
result = re.findall(pattern, texte, re.DOTALL)

for i, match in enumerate(result, start=1):
    print("\nmatch:", i)
    print(match)

Result:

match: 1
1
a
b
xxx 10

match: 2

1
a
b
xxx 12
1
a
b
xxx 10

I tried to replace .* with a negative lookahead assertion, so that a match is only selected if no expression like "\n?xxx \d{2}\n" comes before "xxx 10":

pattern='\n?\d(?!(?:\n?xxx \d{2}\n)*)xxx 10'

But it does not work.
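A small sketch of why both attempts misbehave, assuming an illustrative sample shaped like the file above (the job numbers 1, 2, 3 are made up for the demo). With re.DOTALL, the lazy `.*?` still crosses line boundaries, so once the first match is taken, the next one starts at job 2 and runs past the `xxx 12` line to the next `xxx 10`. In the lookahead version, `(?:...)*` can match zero repetitions, so the group always matches the empty string and the negative lookahead `(?!...)` around it always fails; a lookahead also consumes no characters, so `xxx 10` would still have to follow the digit immediately.

```python
import re

# Illustrative sample: job number, text lines, then "xxx <ipref>"
SAMPLE = "1\na\nb\nxxx 10\n2\na\nb\nxxx 12\n3\na\nb\nxxx 10\n"

# Lazy pattern from the question: with re.DOTALL, ".*?" can cross the "xxx 12" line
lazy = re.findall(r'\n?\d.*?xxx 10', SAMPLE, re.DOTALL)

# Lookahead attempt from the question: (?:...)* always matches the empty string,
# so the negative lookahead (?!...) always fails and nothing is ever matched
lookahead = re.findall(r'\n?\d(?!(?:\n?xxx \d{2}\n)*)xxx 10', SAMPLE, re.DOTALL)

for i, match in enumerate(lazy, start=1):
    print("lazy match", i, repr(match))
print("lookahead matches:", lookahead)
```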

  • Namely, what is the question? Commented Oct 29, 2022 at 12:27

4 Answers


You can write the pattern this way: repeat a newline followed by a line, asserting each time that the line is not xxx followed by 1 or more digits:

^\d(?:\n(?!xxx \d+$).*)*\nxxx 10$

The pattern matches:

  • ^ Start of string (start of a line when re.MULTILINE is set)
  • \d Match a single digit (or \d+ for 1 or more)
  • (?: Non-capturing group
    • \n Match a newline
    • (?!xxx \d+$) Negative lookahead to assert that the line is not xxx followed by 1+ digits
    • .* If the assertion is true, match the whole line
  • )* Close the group and optionally repeat it
  • \nxxx 10$ Match a newline, then xxx 10 at the end of the line

Regex demo
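A runnable sketch of this pattern in Python, assuming the same kind of illustrative sample (job numbers 1, 2, 3 are made up). The pattern relies on re.MULTILINE so that ^ and $ match at line boundaries; taking the first line of each match gives the job numbers the question asks for.

```python
import re

# Illustrative sample: job number, free text lines, then "xxx <ipref>"
SAMPLE = "1\na\nb\nxxx 10\n2\na\nb\nxxx 12\n3\na\nb\nxxx 10\n"

# Pattern from the answer: a digit at the start of a line, then any lines that are
# not "xxx <digits>", then the closing "xxx 10" line
PATTERN = r'^\d(?:\n(?!xxx \d+$).*)*\nxxx 10$'

blocks = re.findall(PATTERN, SAMPLE, re.MULTILINE)
# The first line of each block is its job number
job_numbers = [block.split("\n", 1)[0] for block in blocks]

print(blocks)
print(job_numbers)
```

The block for job 2 is rejected because the lookahead stops the repetition at its `xxx 12` line, and `\nxxx 10$` then cannot match there.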


Comments


Good day to you :) and thank you very much for your quick response! I give the result below. Note: I changed re.DOTALL to re.DOTALL|re.MULTILINE, because without that there are no matches. Sorry for the previous presentation, it was not very clear.

Text file:

1
a
b
xxx 10
1
a
b
xxx 12
1
a
b
xxx 10

Code with your pattern:

#!/usr/bin/python

import re

# Read the whole file into one string and show it
with open('test2.xml', 'r') as fic:
    texte = fic.read()
print(texte)

#pattern='<\/?(?!(?:span|br|b)(?: [^>]*)?>)[^>\/]*>'
#pattern='\n?\d(?!(?:\n?xxx \d{2}\n?)*?)xxx 10'
#pattern='\n?\d.*?xxx 10'
pattern = r'^\d(?:\n(?!xxx \d+$).*)*\nxxx 10$'

# re.MULTILINE: ^ and $ match at each line; re.DOTALL: "." also matches newlines
result = re.findall(pattern, texte, re.DOTALL | re.MULTILINE)

for i, match in enumerate(result, start=1):
    print("\nmatch:", i)
    print(match)

Result:

match: 1
1
a
b
xxx 10
1
a
b
xxx 12
1
a
b
xxx 10 

But I am trying to obtain:

match: 1
1
a
b
xxx 10

match: 2
1
a
b
xxx 10

4 Comments

You can omit re.DOTALL
The pattern works for the given data, see regex101.com/r/UpDuHP/1
Wow, amazing! Now I know where to test my regex patterns. Thank you for this link.
I have embedded the Python in a bash script like this:
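To illustrate the first comment, a small comparison on an illustrative sample (job numbers 1, 2, 3 are made up). With re.DOTALL, `.` also matches newlines, so the `.*` inside the group can run across several lines, including the `xxx 12` line, without the lookahead ever checking them; the greedy match then extends to the last `xxx 10`. Without re.DOTALL, `.*` stops at the end of each line, so every new line is checked by the lookahead.

```python
import re

# Illustrative sample: job number, text lines, then "xxx <ipref>"
SAMPLE = "1\na\nb\nxxx 10\n2\na\nb\nxxx 12\n3\na\nb\nxxx 10\n"
PATTERN = r'^\d(?:\n(?!xxx \d+$).*)*\nxxx 10$'

# With re.DOTALL, ".*" swallows newlines: one match from the first job to the last "xxx 10"
with_dotall = re.findall(PATTERN, SAMPLE, re.DOTALL | re.MULTILINE)

# Without it, ".*" stays on one line, so the lookahead vets every line
without_dotall = re.findall(PATTERN, SAMPLE, re.MULTILINE)

print(len(with_dotall), repr(with_dotall[0]))
print(len(without_dotall), without_dotall)
```

This reproduces the single over-long match shown in the previous comment's result.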

Thank you very much (you saved my day!). As you said:

pattern = r'^\d(?:\n(?!xxx \d+$).*)*\nxxx 10$'
result = re.findall(pattern, texte, re.MULTILINE)

Result: OK, the group of lines (1 ... xxx 12) is ignored. Note: I can adapt this to a case where line 1 gives job information and "xxx 12" gives printer IP information.

match: 1
1
a
b
xxx 10

match: 2
1
a
b
xxx 10

2 Comments

If the posted answer worked out for you, on SO you can accept the answer by checking the grey check mark to the left of it instead of posting new answers.
As it’s currently written, your answer is unclear. Please edit to add additional details that will help others understand how this addresses the question asked. You can find more information on how to write good answers in the help center.

File:

job_number job_id
1 10202
bla bla
bla bla bla
xxx 100.10.10.100
2 10203
bla bla
bla bla bla
bla bla bla
xxx 100.10.10.102
3 10204
bla bla bla
bla bla bla
xxx 100.10.10.100

Bash script with an embedded Python script:

#!/bin/bash

# get_jobs_ip: print every job block whose closing "xxx" line holds the given IP
# $1 : IP of a printer, written normally (e.g. 100.10.10.100)
get_jobs_ip ()
{
# The quoted 'EOF' stops bash from expanding anything in the Python code;
# the IP reaches Python as a command-line argument (sys.argv[1]) instead.
python3 - "$1" <<'EOF'
import re
import sys

with open('test3.xml') as fic:
    texte = fic.read()

# Thank you to Fourth bird for the pattern!
# Example for ip 100.10.10.100:
#   ^\d\s+\d+(?:\n(?!xxx \d+\.\d+\.\d+\.\d+$).*)*\nxxx 100\.10\.10\.100$
#
# ^                    Start of a line (re.MULTILINE)
# \d\s+\d+             Job number, whitespace, job id (e.g. "1 10202")
# (?:                  Non-capturing group
#   \n                 Match a newline
#   (?!xxx \d+\.\d+\.\d+\.\d+$)  Negative lookahead: the line is not "xxx" followed by an IP
#   .*                 If the assertion holds, match the whole line
# )*                   Close the group and optionally repeat it
# \nxxx 100\.10\.10\.100$  Match a newline, "xxx" and the wanted IP

# re.escape() turns the dots of the IP into literal dots
ip = re.escape(sys.argv[1])
pattern = r'^\d\s+\d+(?:\n(?!xxx \d+\.\d+\.\d+\.\d+$).*)*\nxxx ' + ip + '$'

result = re.findall(pattern, texte, re.MULTILINE)

for i, match in enumerate(result, start=1):
    print("\nmatch:", i)
    print(match)
EOF
}

get_jobs_ip "100.10.10.100"
get_jobs_ip "100.10.10.102"

Result:

match: 1
1 10202
bla bla
bla bla bla
xxx 100.10.10.100

match: 2
3 10204
bla bla bla
bla bla bla
xxx 100.10.10.100

match: 1
2 10203
bla bla
bla bla bla
bla bla bla
xxx 100.10.10.102
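The same filtering can also be sketched as a single Python function without the bash wrapper. This is an illustrative variant, not the poster's code: the name `get_jobs_ip` is borrowed from the bash function, the job number and job id are captured as groups, `[ \t]+` replaces `\s+` so the job line cannot spill onto the next line, and `re.escape()` is assumed for the IP so callers pass it unescaped.

```python
import re

# Sample data copied from the file above
SAMPLE = (
    "job_number job_id\n"
    "1 10202\n"
    "bla bla\n"
    "bla bla bla\n"
    "xxx 100.10.10.100\n"
    "2 10203\n"
    "bla bla\n"
    "bla bla bla\n"
    "bla bla bla\n"
    "xxx 100.10.10.102\n"
    "3 10204\n"
    "bla bla bla\n"
    "bla bla bla\n"
    "xxx 100.10.10.100\n"
)


def get_jobs_ip(text, ip):
    """Return (job_number, job_id) pairs for every job block ending with "xxx <ip>".

    Hypothetical helper: the IP is passed unescaped and re.escape() makes its dots literal.
    """
    pattern = (
        r'^(\d+)[ \t]+(\d+)'                      # job line, e.g. "1 10202"
        r'(?:\n(?!xxx \d+\.\d+\.\d+\.\d+$).*)*'  # lines that are not an "xxx <ip>" line
        r'\nxxx ' + re.escape(ip) + r'$'          # the closing line with the wanted IP
    )
    return [(m.group(1), m.group(2)) for m in re.finditer(pattern, text, re.MULTILINE)]


if __name__ == "__main__":
    for ip in ("100.10.10.100", "100.10.10.102"):
        print(ip, get_jobs_ip(SAMPLE, ip))
```

To use it on the real file, read `test3.xml` into a string and pass that instead of `SAMPLE`.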
