I'm having trouble writing the correct regular expression for a multiline match. Can someone point out what I'm doing wrong? I'm looping through a basic dhcpd.conf file with hundreds of entries such as:

host node20007                                                                                                                  
{                                                                                                                              
    hardware ethernet 00:22:38:8f:1f:43;                                                                                       
    fixed-address node20007.domain.com;     
}

I've gotten various regexes to work for the MAC and the fixed-address separately, but I cannot combine them so that they match properly.

import re

f = open('/etc/dhcp3/dhcpd.conf', 'r')
re_hostinfo = re.compile(r'(hardware ethernet (.*))\;(?:\n|\r|\r\n?)(.*)', re.MULTILINE)

for host in f:
    match = re_hostinfo.search(host)
    if match:
        print match.groups()

Currently my match groups will look like:
('hardware ethernet 00:22:38:8f:1f:43', '00:22:38:8f:1f:43', '')

But I'm looking for something like:
('hardware ethernet 00:22:38:8f:1f:43', '00:22:38:8f:1f:43', 'node20007.domain.com')

  • If the file is exactly in this format, it might be easier to just split lines on spaces and take the last element as the value. Commented Jan 19, 2011 at 21:10

2 Answers

Update: I've just noticed the real reason you are getting these results. In your code:

for host in f:
    match = re_hostinfo.search(host)
    if match:
        print match.groups()

host refers to a single line, but your pattern needs to work over two lines.

Try this:

data = f.read()
for x in regex.finditer(data):
    process(x.groups())

where regex is a compiled pattern that matches over two lines.
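As a rough sketch of that approach (written for Python 3, with an inline sample string standing in for the real file), the following reads everything at once and lets finditer walk over every match. The names SAMPLE, HOST_RE and find_hosts are illustrative, and the pattern assumes that a "fixed-address" line directly follows each MAC line:

```python
import re

# Sample text standing in for the contents of dhcpd.conf (illustrative data).
SAMPLE = """\
host node20007
{
    hardware ethernet 00:22:38:8f:1f:43;
    fixed-address node20007.domain.com;
}

host node20008
{
    hardware ethernet 00:22:38:8f:1f:44;
    fixed-address node20008.domain.com;
}
"""

# \s+ also matches newlines, so this pattern spans two lines without any flags.
# Assumption: "fixed-address" is the statement right after the MAC line.
HOST_RE = re.compile(r'hardware ethernet\s+(\S+);\s+fixed-address\s+(\S+);')


def find_hosts(data):
    """Return a list of (mac, address) tuples found anywhere in data."""
    return [m.groups() for m in HOST_RE.finditer(data)]


for mac, address in find_hosts(SAMPLE):
    print(mac, address)
```

With a real file, the string would come from something like `with open(path) as f: data = f.read()` instead of SAMPLE.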

If your file is large, and you are sure that the pieces of interest are always spread over exactly two lines, you could read the file one line at a time: check each line for the first part of the pattern and set a flag telling you whether the next line should be checked for the second part. If you are not sure, things get complicated, perhaps enough to start looking at the pyparsing module.
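The line-at-a-time idea could look roughly like this in Python 3. It is only a sketch: MAC_RE, VALUE_RE and pair_lines are made-up names, and it assumes the value of interest is always on the line immediately after the MAC line. Here the "flag" is simply the remembered MAC address:

```python
import re

# First part of the pattern: the MAC line.
MAC_RE = re.compile(r'hardware ethernet\s+(\S+);')
# Second part: any "keyword value;" line, e.g. "fixed-address node20007.domain.com;".
VALUE_RE = re.compile(r'\s*\S+\s+(\S+);')


def pair_lines(lines):
    """Yield (mac, value) for each MAC line that is directly followed by a 'keyword value;' line."""
    pending_mac = None  # acts as the flag: set when the previous line held a MAC
    for line in lines:
        if pending_mac is not None:
            m = VALUE_RE.match(line)
            if m:
                yield pending_mac, m.group(1)
            pending_mac = None  # only the very next line is checked
            continue
        m = MAC_RE.search(line)
        if m:
            pending_mac = m.group(1)


sample_lines = [
    "host node20007\n",
    "{\n",
    "    hardware ethernet 00:22:38:8f:1f:43;\n",
    "    fixed-address node20007.domain.com;\n",
    "}\n",
]

for mac, value in pair_lines(sample_lines):
    print(mac, value)
```

Because pair_lines accepts any iterable of lines, an open file object can be passed in directly, so the whole file never has to be held in memory.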

Now back to the original answer, discussing the pattern that you should use:

You don't need MULTILINE; just match whitespace. Build up your pattern using these building blocks:

(1) fixed text
(2) one or more whitespace characters
(3) one or more non-whitespace characters

and then put in parentheses to get your groups.
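To illustrate why MULTILINE is not what makes a pattern cross lines, here is a small Python 3 sketch on a two-line string (the variable names are arbitrary). By default '.' stops at a newline, \s happily consumes one, and re.MULTILINE only changes where ^ and $ are allowed to match:

```python
import re

text = "hardware ethernet 00:22:38:8f:1f:43;\n    fixed-address node20007.domain.com;"

# '.' does not match a newline by default, so (.*) stops at the end of the first line.
dot_match = re.search(r'ethernet (.*)', text)

# \s+ matches spaces, tabs and newlines, so it bridges the two lines with no flags.
ws_match = re.search(r';\s+(\S+)', text)

# re.MULTILINE lets ^ match at the start of every line; it does not affect '.' or \s.
line_starts = re.findall(r'^\s*(\S+)', text, re.MULTILINE)

print(dot_match.group(1))
print(ws_match.group(1))
print(line_starts)
```

This is why the (.*) after the newline alternation in the question's pattern had nothing useful to capture when each line was searched on its own.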

Try this:

>>> m = re.search(r'(hardware ethernet\s+(\S+));\s+\S+\s+(\S+);', data)
>>> print m.groups()
('hardware ethernet   00:22:38:8f:1f:43', '00:22:38:8f:1f:43', 'node20007.domain.com')
>>>

Please consider using "verbose mode" (re.VERBOSE). You can use it to document exactly which pieces of the pattern match which pieces of data, and it often helps in getting the pattern right in the first place. Example:

>>> regex = re.compile(r"""
... (hardware[ ]ethernet \s+
...     (\S+) # MAC
... ) ;
... \s+ # includes newline
... \S+ # variable(??) text e.g. "fixed-address"
... \s+
... (\S+) # e.g. "node20007.domain.com"
... ;
... """, re.VERBOSE)
>>> print regex.search(data).groups()
('hardware ethernet   00:22:38:8f:1f:43', '00:22:38:8f:1f:43', 'node20007.domain.com')
>>>

Sometimes the easier method is not to use a regex at all. For example:

for line in open("dhcpd.conf"):
    line = line.rstrip()
    sline = line.split()
    # each substring needs its own "in" test; a bare string is always true
    if "hardware ethernet" in line or "fixed-address" in line:
        print sline[-1]

Another way:

data = open("file").read().split("}")
for item in data:
    item = [i.strip() for i in item.split("\n") if i != '']
    for elem in item:
        if "hardware ethernet" in elem:
            print elem.split()[-1]
    if item:
        print item[-1]

Output:

$ more file
host node20007
{
    hardware ethernet 00:22:38:8f:1f:43;
        fixed-address node20007.domain.com;
}

host node20008
{
    hardware ethernet 00:22:38:8f:1f:44;
        some-address node20008.domain.com;
}

$ python test.py
00:22:38:8f:1f:43;
fixed-address node20007.domain.com;
00:22:38:8f:1f:44;
some-address node20008.domain.com;

1 Comment

Note, however, that the OP does not seem to care whether the line after "hardware ethernet" contains "fixed-address" or not.
