
I am trying to split a piece of text in a file formatted like this:

module 
some text
endmodule

module 
some other text
endmodule

between the words module and endmodule and still include module and endmodule in the output string.

This is not a duplicate of other regex questions because I am trying to use re.split() to return a list, not find.

This is the regex I've tried:

s=file.read()
l=re.split("module(.*)endmodule",s)

but it won't split anything...

Ideally final output would be a list that includes both modules as strings,

['module\n sometext\n endmodule', 'module\n someothertext\n endmodule']

2 Answers

1

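My guess is that you want a non-greedy expression similar to:

module(.*?)endmodule

The lazy .*? stops at the first endmodule instead of the last one, and the pattern has to be used with re.DOTALL so that . can also match the newlines between the two keywords.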
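To make the difference concrete, here is a minimal sketch comparing the greedy pattern from the question (with and without re.DOTALL) against the lazy one. The sample string mirrors the layout shown in the question, and the variable names are only for illustration.

```python
import re

# Sample input mirroring the layout in the question (an assumption about the real file).
text = (
    "module \n"
    "some text\n"
    "endmodule\n\n"
    "module \n"
    "some other text\n"
    "endmodule"
)

# Without re.DOTALL, "." cannot cross a newline, so nothing matches at all.
no_dotall = re.findall(r"module.*endmodule", text)

# Greedy with re.DOTALL: a single match running from the first "module"
# to the last "endmodule", i.e. the whole sample.
greedy = re.findall(r"module.*endmodule", text, re.DOTALL)

# Lazy with re.DOTALL: one match per module ... endmodule block.
lazy = re.findall(r"module.*?endmodule", text, re.DOTALL)

print(len(no_dotall), len(greedy), len(lazy))  # 0 1 2
```

The first case is why the attempt in the question appears to do nothing: re.split returns the input unchanged when the pattern never matches.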

Test with re.finditer

import re

regex = r"module(.*?)endmodule"

test_str = ("module \n"
    "some text\n"
    "endmodule\n\n"
    "module \n"
    "some other text\n"
    "endmodule")

matches = re.finditer(regex, test_str, re.DOTALL)

for match_num, match in enumerate(matches, start=1):
    # group() with no argument is the whole match, keywords included
    print("Match {} was found at {}-{}: {}".format(
        match_num, match.start(), match.end(), match.group()))

    # groups are numbered from 1; group 1 is the text between the keywords
    for group_num in range(1, len(match.groups()) + 1):
        print("Group {} found at {}-{}: {}".format(
            group_num, match.start(group_num), match.end(group_num),
            match.group(group_num)))

Test with re.findall

import re

regex = r"module(.*?)endmodule"

test_str = ("module \n"
    "some text\n"
    "endmodule\n\n"
    "module \n"
    "some other text\n"
    "endmodule")

print(re.findall(regex, test_str, re.DOTALL))
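One thing to watch for: when the pattern contains a capturing group, re.findall returns only the group, so the call above yields the text between the keywords without module and endmodule. If the goal is the list from the question, dropping the group should be enough. The sketch below assumes that; split_modules is a hypothetical helper name, and the \b word boundaries are an extra guard I added so that the "module" inside "endmodule" cannot start a match.

```python
import re

test_str = ("module \n"
            "some text\n"
            "endmodule\n\n"
            "module \n"
            "some other text\n"
            "endmodule")


def split_modules(text):
    """Return every module ... endmodule block, keywords included."""
    # No capturing group, so findall returns the whole match for each block.
    pattern = re.compile(r"\bmodule\b.*?\bendmodule\b", re.DOTALL)
    return pattern.findall(text)


# With the capturing group, only the inner text comes back.
inner = re.findall(r"module(.*?)endmodule", test_str, re.DOTALL)
print(inner)   # [' \nsome text\n', ' \nsome other text\n']

# Without it, each list item is a complete block.
blocks = split_modules(test_str)
print(blocks)  # ['module \nsome text\nendmodule', 'module \nsome other text\nendmodule']
```

With re.finditer the same result is available as match.group() for each match, regardless of whether the pattern has groups.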

The expression is explained on the top right panel of this demo, if you wish to explore further or simplify/modify it, and in this link, you can watch how it would match against some sample inputs step by step, if you like.


2 Comments

This is matching the entire file, but I have multiple occurrences of this pattern in the file.
Sorry, your answer was perfect, I messed up. I needed to include re.DOTALL.
1

We could use a positive lookbehind and a positive lookahead as in

print(re.split('(?<=endmodule)[.\n]*?(?=module)', s))

giving

['module\nsome text\nendmodule', 'module\nsome other text\nendmodule']

where

s = ("module\n"
     "some text\n"
     "endmodule\n\n"
     "module\n"
     "some other text\n"
     "endmodule")

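A caveat on the separator part of that pattern: inside a character class the dot is a literal ".", so [.\n] matches only dots and newlines. That is enough for the sample above, but if the real file has spaces or tabs between blocks the split would not happen. A possible variant, sketched here under the assumption that only whitespace separates the blocks, uses \s* instead. The spaced string is a made-up input to show the difference.

```python
import re

s = ("module\n"
     "some text\n"
     "endmodule\n\n"
     "module\n"
     "some other text\n"
     "endmodule")

# Split on any run of whitespace that sits between "endmodule" and "module".
# The lookarounds keep both keywords in the resulting pieces.
pattern = r"(?<=endmodule)\s*(?=module\b)"

parts = re.split(pattern, s)
print(parts)  # ['module\nsome text\nendmodule', 'module\nsome other text\nendmodule']

# Hypothetical input with a trailing space after the first endmodule.
spaced = "module\na\nendmodule \n\nmodule\nb\nendmodule"

# The [.\n] version cannot consume the space, so nothing is split ...
unsplit = re.split(r"(?<=endmodule)[.\n]*?(?=module)", spaced)
print(len(unsplit))  # 1

# ... while the \s* version still separates the two blocks.
split_ok = re.split(pattern, spaced)
print(split_ok)  # ['module\na\nendmodule', 'module\nb\nendmodule']
```

If other text can appear between blocks, collecting the blocks with re.findall is probably the safer route than splitting.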