4

I need to search something like this:

lines = """package p_dio_bfm is
   procedure setBFMCmd (  
      variable  pin : in tBFMCmd
      );
end p_dio_bfm; -- end package;

package body p_dio_bfm is
   procedure setBFMCmd (  
      variable  pin : in tBFMCmd
      ) is
   begin
      bfm_cmd := pin;
   end setBFMCmd;
end p_dio_bfm;"""

I need to extract the package name, i.e. p_dio_bfm, and the package declaration, i.e. the part between "package p_dio_bfm is" and the FIRST "end p_dio_bfm;".

The problem is that the package declaration may end with either "end p_dio_bfm;" or "end package;". So I tried the following "OR" regex, which works for packages ending with "end package" but does not work for packages ending with "end pck_name;":

pattern = re.compile("package\s+(\w+)\s+is(.*)end\s+(package|\1)\s*;")
match = pattern.search(lines)

The problem is the (package|\1) part of the regex, where I want to catch either the word "package" or the matched package name.

UPDATE: I have provided the full code, which I hope will clarify it:

import re
lines1 = """package p_dio_bfm is
   procedure setBFMCmd (
      variable  pin : in tBFMCmd
      );
end p_dio_bfm;

package body p_dio_bfm is
   procedure setBFMCmd (
      variable  pin : in tBFMCmd
      ) is
   begin
      bfm_cmd := pin;
   end setBFMCmd;
end p_dio_bfm;"""

lines2 = """package p_dio_bfm is
   procedure setBFMCmd (
      variable  pin : in tBFMCmd
      );
end package;

package body p_dio_bfm is
   procedure setBFMCmd (
      variable  pin : in tBFMCmd
      ) is
   begin
      bfm_cmd := pin;
   end setBFMCmd;
end package;"""

lines1 = lines1.replace('\n', ' ')
print lines1

pattern = re.compile("package\s+(\w+)\s+is(.*)end\s+(package|\1)\s*;")
match = pattern.search(lines1)

print match

lines2 = lines2.replace('\n', ' ')
print lines2

match = pattern.search(lines2)

print match

I expect in both cases, using a unique regex, to get back this part:

"""procedure setBFMCmd (
          variable  pin : in tBFMCmd
          );"""  

without the \n chars which I have removed.

1
  • Can you post expected output? Commented Jul 13, 2015 at 14:25

2 Answers

3

Your regex doesn't match anything because it's incorrect. Without the re.DOTALL flag, .* won't match a newline character, so you can use [\s\S]* instead:

r'package ([^\s]+)\s+is([\s\S]*)end\s+(package|\1)\s*;'

See demo https://regex101.com/r/tZ3uH0/1
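
One detail worth checking in the original pattern, offered as a side note: it is written as an ordinary string literal, and in an ordinary Python string literal \1 is read as the control character U+0001 before the regex engine ever sees it, so it is not a backreference. The sketch below (Python 3) uses a simplified one-line sample to show the difference the r prefix makes. The sample text and the names SAMPLE, NOT_RAW and RAW are made up for illustration.

```python
import re

# Hypothetical one-line sample, simplified from the question's input.
SAMPLE = "package p_dio_bfm is procedure foo; end p_dio_bfm;"

# Ordinary literal: "\1" is an octal escape, i.e. the single character U+0001.
assert "\1" == "\x01"
# Raw literal: backslash and digit are kept, so the regex engine sees \1.
assert r"\1" == "\\1"

# Same pattern text, once without and once with the r prefix.
NOT_RAW = re.compile("package ([a-z_]+) is(.*)end (package|\1);")
RAW = re.compile(r"package ([a-z_]+) is(.*)end (package|\1);")

# Without the prefix, the second alternative can only match U+0001,
# so a block closed with "end <name>;" is never found.
print(NOT_RAW.search(SAMPLE))

# With the prefix, \1 refers back to the captured package name.
print(RAW.search(SAMPLE).group(1))
```

Writing every pattern as a raw string avoids this class of problem, which is why the pattern above carries the r prefix.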

There are some other problems here, though. First, your string contains two package blocks. Second, a more elegant and efficient approach is to use the re.DOTALL flag, which makes the '.' special character match any character at all, including a newline. So you can write your regex as follows:

pattern = re.compile("package\s+(\w+)\s+is(.*)end\s+(package|\1)\s*;",re.DOTALL)

But this will still match the first block:

>>> match = pattern.search(lines)
>>> print match.group(0)
package p_dio_bfm is
   procedure setBFMCmd (  
      variable  pin : in tBFMCmd
      );
end p_dio_bfm; -- end package;
>>> print match.group(1)
p_dio_bfm
>>> print match.group(2)

   procedure setBFMCmd (  
      variable  pin : in tBFMCmd
      );
end p_dio_bfm; -- 
>>> print match.group(3)
package

To match all blocks, you need to allow for an optional word such as body before the package name, and make the .* lazy:

package\s+(?:\w+\s+?)?([^\s]+)\s+is(.*?)end\s+(package|\1)\s*;

See demo https://regex101.com/r/tZ3uH0/3
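
Putting these pieces together, here is a minimal sketch (Python 3, and certainly not the only way to do it) that extracts the package name and the declaration from both variants in the question. It combines a raw string, re.DOTALL and a lazy .*? so that the match stops at the first closing "end ...;". The helper name extract_declaration and the constant names are assumptions for illustration.

```python
import re

# Raw string so that \1 reaches the regex engine as a backreference.
# The lazy (.*?) stops at the first "end package;" or "end <name>;".
DECLARATION_RE = re.compile(
    r"package\s+(\w+)\s+is(.*?)end\s+(?:package|\1)\s*;",
    re.DOTALL,
)

def extract_declaration(source):
    """Return (name, declaration) for the first package declaration, or None."""
    match = DECLARATION_RE.search(source)
    if match is None:
        return None
    return match.group(1), match.group(2).strip()

ENDS_WITH_NAME = """package p_dio_bfm is
   procedure setBFMCmd (
      variable  pin : in tBFMCmd
      );
end p_dio_bfm;

package body p_dio_bfm is
   procedure setBFMCmd (
      variable  pin : in tBFMCmd
      ) is
   begin
      bfm_cmd := pin;
   end setBFMCmd;
end p_dio_bfm;"""

# Same input, but every block is closed with "end package;".
ENDS_WITH_KEYWORD = ENDS_WITH_NAME.replace("end p_dio_bfm;", "end package;")

for text in (ENDS_WITH_NAME, ENDS_WITH_KEYWORD):
    name, declaration = extract_declaration(text)
    print(name)
    print(declaration)
```

Because \w+ cannot span the two words in "package body p_dio_bfm", this particular pattern skips the package body and only reports the declaration.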


5 Comments

I do not see the point in using [\s\S] instead of . with re.S. You do not have to deal with specific line matching here, do you? Unless the regex needs porting to say JavaScript, I think it is more efficient to use "built-in" means to match newlines.
@Kasra, didn't the OP say: between "package p_dio_bfm is" and FIRST "end p_dio_bfm;"?
You can take advantage of named references: (?P<needle>^\s+)\s...(package|(?P=needle)).
@stribizhev Indeed, and I was editing the answer with more information, but as a starting point I suggest that way!
@KhalilAmmour-خليلعمور Yeah that's the point that I'll add in update!
2

How about:

>>> for row in re.findall(
...   r'package(?:\s.*?)(?P<needle>[^\s]+)\s+is\s+(.*?)end\s+(?:package|(?P=needle));',
...   lines,
...   re.S
... ):
...   print '{{{', row[1], '}}}'
...
{{{ procedure setBFMCmd (
      variable  pin : in tBFMCmd
      );
}}}
{{{ procedure setBFMCmd (
      variable  pin : in tBFMCmd
      ) is
   begin
      bfm_cmd := pin;
   end setBFMCmd;
}}}

I took the liberty of not filtering exactly as @mihai-hangiu asked, and included the second block as well.
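
If the two kinds of block need to be told apart afterwards, one possible variation is to capture the optional body keyword in its own named group and iterate with re.finditer. This is only a sketch for Python 3; the group names kind, name and content and the helper list_packages are assumptions for illustration.

```python
import re

# Named groups: "kind" is set only for "package body ...",
# "name" is reused as a backreference via (?P=name).
PACKAGE_RE = re.compile(
    r"package\s+(?P<kind>body\s+)?(?P<name>\w+)\s+is\s+(?P<content>.*?)"
    r"end\s+(?:package|(?P=name))\s*;",
    re.DOTALL,
)

def list_packages(source):
    """Return a list of (name, is_body, content) tuples, one per block."""
    blocks = []
    for match in PACKAGE_RE.finditer(source):
        is_body = match.group("kind") is not None
        blocks.append(
            (match.group("name"), is_body, match.group("content").strip())
        )
    return blocks

SOURCE = """package p_dio_bfm is
   procedure setBFMCmd (
      variable  pin : in tBFMCmd
      );
end p_dio_bfm;

package body p_dio_bfm is
   procedure setBFMCmd (
      variable  pin : in tBFMCmd
      ) is
   begin
      bfm_cmd := pin;
   end setBFMCmd;
end p_dio_bfm;"""

for name, is_body, content in list_packages(SOURCE):
    label = "body" if is_body else "declaration"
    print(name, label)
    print(content)
```

Keeping only the entries where is_body is False gives the filtering the question asked for.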
