split on regex python between two strings but inclusive using re.split and return a list

Question

I am trying to split a piece of text in a file formatted like this:

module 
some text
endmodule

module 
some other text
endmodule

between the words module and endmodule and still include module and endmodule in the output string.

This is not a duplicate of other regex questions because I am trying to use re.split() to return a list, not find.

This is the regex I've tried:

s=file.read()
l=re.split("module(.*)endmodule",s)

but it won't split anything...

Ideally final output would be a list that includes both modules as strings,

['module\n sometext\n endmodule', 'module\n someothertext\n endmodule']

Emma Marcier · Accepted Answer · 2019-07-09 23:45:27Z

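My guess is that you want a lazy expression similar to:

module(.*?)endmodule

used with the re.DOTALL flag so that . also matches newlines. The lazy .*? then stops at the first endmodule after each module instead of running on to the last one in the file.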
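
To see why the lazy quantifier and re.DOTALL both matter, here is a small sketch comparing three variants on the sample text from the question. The variable names (text, no_dotall, greedy, lazy) are just illustrative, and the pattern is written without the capturing group so that findall returns whole matches.

```python
import re

text = (
    "module \n"
    "some text\n"
    "endmodule\n\n"
    "module \n"
    "some other text\n"
    "endmodule"
)

# Without re.DOTALL, "." stops at newlines, so a multi-line block never matches.
no_dotall = re.findall(r"module.*endmodule", text)

# Greedy ".*" with re.DOTALL runs on to the last "endmodule": one big match.
greedy = re.findall(r"module.*endmodule", text, re.DOTALL)

# Lazy ".*?" with re.DOTALL stops at the first "endmodule" after each "module".
lazy = re.findall(r"module.*?endmodule", text, re.DOTALL)

print(len(no_dotall), len(greedy), len(lazy))  # 0 1 2
```

The greedy variant is what produces the "matching the entire file" symptom when several blocks are present.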

Test with re.finditer

import re

regex = r"module(.*?)endmodule"

test_str = ("module \n"
    "some text\n"
    "endmodule\n\n"
    "module \n"
    "some other text\n"
    "endmodule")

matches = re.finditer(regex, test_str, re.DOTALL)

for match_num, match in enumerate(matches, start=1):
    # group() with no argument is the whole match, delimiters included.
    print(f"Match {match_num} was found at {match.start()}-{match.end()}: {match.group()}")

    # Capture groups are numbered from 1.
    for group_num in range(1, len(match.groups()) + 1):
        print(f"Group {group_num} found at {match.start(group_num)}-{match.end(group_num)}: {match.group(group_num)}")

Test with re.findall

import re

regex = r"module(.*?)endmodule"

test_str = ("module \n"
    "some text\n"
    "endmodule\n\n"
    "module \n"
    "some other text\n"
    "endmodule")

print(re.findall(regex, test_str, re.DOTALL))
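
One thing to watch for: when the pattern contains a capturing group, re.findall returns only the captured text, so module and endmodule are left out of the result. If the goal is the inclusive list from the question, a sketch along these lines might be closer. The \b word boundaries are my own addition (assuming module and endmodule always appear as whole words), and the variable names are illustrative.

```python
import re

test_str = (
    "module \n"
    "some text\n"
    "endmodule\n\n"
    "module \n"
    "some other text\n"
    "endmodule"
)

# With a capturing group, findall returns only what the group captured.
inner_only = re.findall(r"module(.*?)endmodule", test_str, re.DOTALL)

# With no group, findall returns each whole match, keywords included.
blocks = re.findall(r"\bmodule\b.*?\bendmodule\b", test_str, re.DOTALL)

# finditer plus group(0) also yields the whole match, even if a group is present.
blocks_via_iter = [
    m.group(0) for m in re.finditer(r"module(.*?)endmodule", test_str, re.DOTALL)
]

print(inner_only)
print(blocks)
```

Either blocks or blocks_via_iter gives a list with one string per module ... endmodule block.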

The expression is explained in the top-right panel of this demo, if you wish to explore, simplify, or modify it further. In this link, you can watch step by step how it matches against some sample inputs.

2 Comments

Hisham Hijjawi Over a year ago

This is matching the entire file, but I have multiple occurrences of this pattern in the file.

Hisham Hijjawi Over a year ago

Sorry, your answer was perfect; I messed up. I needed to include re.DOTALL.

Julius Vainora · Accepted Answer · 2019-07-10 00:03:00Z

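We could split on the whitespace between an endmodule and the module that follows it, using a positive lookbehind and a positive lookahead so that neither keyword is consumed, as in

print(re.split(r'(?<=endmodule)\s*(?=module)', s))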

giving

['module\nsome text\nendmodule', 'module\nsome other text\nendmodule']

where

s = ("module\n"
     "some text\n"
     "endmodule\n\n"
     "module\n"
     "some other text\n"
     "endmodule")

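If you would rather stay with re.split and match the blocks themselves, another option (a sketch, not necessarily the tidiest approach) is to wrap the whole block in a capturing group, since re.split keeps captured text in its result. The names parts and modules are just illustrative.

```python
import re

s = (
    "module\n"
    "some text\n"
    "endmodule\n\n"
    "module\n"
    "some other text\n"
    "endmodule"
)

# Pass flags by keyword: the third positional argument of re.split is maxsplit.
parts = re.split(r"(module.*?endmodule)", s, flags=re.DOTALL)

# The captured blocks sit at the odd indices; the even indices hold the text
# between blocks (here an empty string, the blank lines, and an empty string).
modules = parts[1::2]

print(modules)
```

This relies on the pattern having exactly one capturing group; with more groups, the slicing step would need to change.
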