python3: extract IP address from compiled pattern

Question

I want to process every line in my log file and extract the IP address if the line matches my pattern. There are several different types of messages; in the example below I am using p1 and p2.

I could read the file line by line and match each line against each pattern. But since there can be many more patterns, I would like to do it as efficiently as possible. I was hoping to compile those patterns into one object and do the match only once for each line:

import re
import sys

IP = r'(?P<ip>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})'

p1 = 'Registration from ' + IP + ' - Wrong password'
p2 = 'Call from ' + IP + ' rejected because extension not found'

c = re.compile(r'(?:' + p1 + '|' + p2 + ')')

for line in sys.stdin:
    match = c.search(line)
    if match:
        print(match['ip'])

But the above code does not work: it complains that the group name ip is used twice.

What is the most elegant way to achieve my goal?

EDIT:

I have modified my code based on answer from @Dev Khadka.

But I am still struggling with how to properly handle the multiple ip matches. The code below prints all IPs that matched p1:

for line in sys.stdin:
    match = c.search(line)
    if match:
        print(match['ip1'])

But some lines don't match p1; they match p2. That is, I get:

1.2.3.4
None
2.3.4.5
...

How do I print the matching IP when I don't know whether it was p1, p2, ...? All I want is the IP. I don't care which pattern it matched.

You should provide your test data.

– eyllanesc, Commented Oct 21, 2019 at 3:21

blhsing · Accepted Answer · 2019-10-21 05:16:58Z

2

+100

You can consider installing the excellent regex module, which supports many advanced regex features, including branch reset groups, which are designed to solve exactly the problem you outlined in this question. Branch reset groups are denoted by (?|...). Capture groups at the same position or with the same name in different alternatives within a branch reset group share the same capture group for output.

Notice that in the example below the matching capture group becomes the named capture group, so that you don't need to iterate over multiple groups searching for a non-empty group:

import sys

import regex

ip_pattern = r'(?P<ip>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})'
patterns = [
    'Registration from {ip} - Wrong password',
    'Call from {ip} rejected because extension not found'
]
# (?|...) is a branch reset group: every alternative shares the same 'ip' group.
pattern = regex.compile('(?|%s)' % '|'.join(patterns).format(ip=ip_pattern))
for line in sys.stdin:
    match = pattern.search(line)
    if match:
        print(match['ip'])

Demo: https://repl.it/@blhsing/RegularEmbellishedBugs

edited Oct 21, 2019 at 5:16

answered Oct 21, 2019 at 4:51

blhsing


1 Comment

Subham Over a year ago

This also matches 999.999.999.999, which is not actually a valid IP. Should we use import ipaddress?

lenik · Accepted Answer · 2019-10-21 03:59:37Z

2

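Why don't you check which group matched?

groups = match.groupdict()  # maps each group name to its text, or None
if groups.get('ip1') is not None:
    print(groups['ip1'])
if groups.get('ip2') is not None:
    print(groups['ip2'])

or something like:

names = ['ip1', 'ip2', 'ip3']
groups = match.groupdict()
for n in names:
    if groups.get(n) is not None:
        print(groups[n])

or even

num = 1000   # can easily handle millions of patterns =)
groups = match.groupdict()
for i in range(num):
    name = 'ip%d' % i
    if groups.get(name) is not None:
        print(groups[name])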
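
With the standard re module, another way to avoid probing every group name is the match object's lastgroup attribute, which holds the name of the last capture group that took part in the match. The following is only a sketch, assuming each alternative contains exactly one named group with nothing nested inside it; the helper name extract_ip and the sample lines are made up for illustration:

```python
import re

IP = r'(?P<ip%d>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})'

# One uniquely named group per alternative (ip1, ip2).
p1 = 'Registration from ' + IP % 1 + ' - Wrong password'
p2 = 'Call from ' + IP % 2 + ' rejected because extension not found'
c = re.compile('(?:' + p1 + '|' + p2 + ')')


def extract_ip(line):
    """Return the IP captured by whichever alternative matched, or None."""
    match = c.search(line)
    if match is None:
        return None
    # lastgroup names the last capture group that matched ('ip1' or 'ip2').
    return match[match.lastgroup]


# Hypothetical log lines, not taken from the question.
sample_lines = [
    'Registration from 1.2.3.4 - Wrong password',
    'Call from 2.3.4.5 rejected because extension not found',
    'Some unrelated message',
]
for line in sample_lines:
    ip = extract_ip(line)
    if ip is not None:
        print(ip)
```

This keeps a single search per line regardless of how many alternatives the pattern has.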

edited Oct 21, 2019 at 3:59

answered Oct 21, 2019 at 3:33

lenik


5 Comments

Martin Vegter Over a year ago

But what if I have 100 patterns? Can I do this in a loop? Can I iterate over match[i] in a for loop?

lenik Over a year ago

@MartinVegter see above

lenik Over a year ago

@MartinVegter can handle millions of patterns easily =)

Martin Vegter Over a year ago

I get an error: if match[name] is not None: IndexError: no such group

lenik Over a year ago

@MartinVegter try to use name in match instead

Dev Khadka · Accepted Answer · 2019-10-16 15:34:07Z

1

That's because you are using the same group name for two groups.

Try this; it gives the groups the names ip1 and ip2:

import re

IP = r'(?P<ip%d>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})'

p1 = 'Registration from ' + IP % 1 + ' - Wrong password'
p2 = 'Call from ' + IP % 2 + ' rejected because extension not found'

c = re.compile(r'(?:' + p1 + '|' + p2 + ')')
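
When there are many patterns, the numbered names can be generated rather than written by hand. This is a rough sketch rather than part of the original answer: the templates list, the {ip} placeholder, and the helper first_ip are assumptions chosen for illustration.

```python
import re

IP = r'(?P<ip%d>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})'

# Hypothetical message templates; {ip} marks where the address appears.
templates = [
    'Registration from {ip} - Wrong password',
    'Call from {ip} rejected because extension not found',
]

# Give each alternative its own group name: ip1, ip2, ...
alternatives = [t.format(ip=IP % n) for n, t in enumerate(templates, start=1)]
c = re.compile('|'.join(alternatives))


def first_ip(match):
    """Return the text of the one named group that took part in the match."""
    return next(v for v in match.groupdict().values() if v is not None)


match = c.search('Call from 2.3.4.5 rejected because extension not found')
if match:
    print(first_ip(match))
```

Groups that did not take part in the match come back as None from groupdict(), so exactly one value is left once those are skipped.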

answered Oct 16, 2019 at 15:34

Dev Khadka


blhsing · Accepted Answer · 2019-10-21 04:30:32Z

1

Named capture groups must have distinct names. Since all of your capture groups are meant to capture the same pattern, it's better not to use named capture groups in this case. Instead, simply use regular capture groups and iterate through the groups of the match object to print the first group that is not empty:

import re
import sys

ip_pattern = r'(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})'
patterns = [
    'Registration from {ip} - Wrong password',
    'Call from {ip} rejected because extension not found'
]
pattern = re.compile('|'.join(patterns).format(ip=ip_pattern))
for line in sys.stdin:
    match = pattern.search(line)
    if match:
        # Groups from alternatives that did not match are None; take the first real one.
        print(next(filter(None, match.groups())))

Demo: https://repl.it/@blhsing/UnevenCheerfulLight

edited Oct 21, 2019 at 4:30

answered Oct 21, 2019 at 4:05

blhsing


Subham · Accepted Answer · 2021-03-21 07:42:01Z

This adds an IP address validity check to the accepted answer. Although import ipaddress or import socket would be the ideal ways, this code parses the host by hand:

import regex as re
from io import StringIO


def valid_ip(address):
    """Return True if address is four dot-separated integers in 0..255."""
    try:
        octets = [int(b) for b in address.split('.')]
    except ValueError:
        return False
    return len(octets) == 4 and all(0 <= b <= 255 for b in octets)


ip_pattern = r'(?P<ip>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})'

patterns = [
    'Registration from {ip} - Wrong password',
    'Call from {ip} rejected because extension not found'
]

file = StringIO('''
Registration from 259.1.1.1 - Wrong password,
    Call from 1.1.2.2 rejected because extension not found
''')

# (?|...) is a branch reset group, supported by the regex module but not by re.
pattern = re.compile('(?|%s)' % '|'.join(patterns).format(ip=ip_pattern))

for line in file:
    match = pattern.search(line)
    if match:
        ip = match['ip']
        if valid_ip(ip):
            print(ip)
        else:
            print(f'{ip} is invalid IP')

259.1.1.1 is invalid IP
1.1.2.2
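
For comparison, the same check can be delegated to the standard library's ipaddress module mentioned above. This is a minimal sketch, assuming only IPv4 addresses need to be accepted; the function name is_valid_ipv4 is made up for illustration:

```python
import ipaddress


def is_valid_ipv4(address):
    """Return True if address parses as an IPv4 address, else False."""
    try:
        ipaddress.IPv4Address(address)
    except ipaddress.AddressValueError:
        return False
    return True


# Candidates as a regex such as \d{1,3}(\.\d{1,3}){3} might capture them.
for candidate in ['259.1.1.1', '1.1.2.2']:
    if is_valid_ipv4(candidate):
        print(candidate)
    else:
        print(f'{candidate} is invalid IP')
```

The regex still does the extraction; ipaddress only decides whether the captured text is a real address.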
