I have a regular expression that should remove everything in a file before <div id="content" and everything from <div id="footer" onward (including the footer tag itself):
([\s\S]*)(?=<div id="content")|(?=<div id="footer)([\s\S]*)
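The pattern is an alternation of two independent branches. Because it describes what to delete, it fits re.sub, which replaces every match, rather than re.search, which returns only one. Below is a minimal sketch that splits the same pattern into its two branches with re.VERBOSE. The sample page is hypothetical and used only for illustration.

```python
import re

# Hypothetical sample page, used only for illustration.
html = ('<html><body><div id="header">H</div>'
        '<div id="content"><p>Hi</p></div>'
        '<div id="footer">F</div></body></html>')

# The same pattern as above, split into its two alternatives with re.VERBOSE.
# Verbose mode ignores unescaped whitespace, so the space in each tag is "\ ".
STRIP_PATTERN = re.compile(r'''
    ([\s\S]*)(?=<div\ id="content")   # branch 1: everything before <div id="content"
    |                                 # or
    (?=<div\ id="footer)([\s\S]*)     # branch 2: <div id="footer and everything after it
''', re.VERBOSE)

# re.sub deletes every non-overlapping match, so both branches are removed.
print(STRIP_PATTERN.sub('', html))  # <div id="content"><p>Hi</p></div>
```

Branch 1 is retried at every position after the content div, and each attempt scans to the end of the string. On large files this can therefore be slow.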
I am using Python's re module to apply the regex. This is my code:
import re

with open(file_dir) as f:  # file_dir: path to the HTML file
    content = f.read()
result = re.search(r'([\s\S]*)(?=<div id="content")|(?=<div id="footer)([\s\S]*)', content)
I have also tried re.match, but I still can't get the content I want. Right now it only returns everything BEFORE the div#content.
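The behaviour described above follows from how re.search and re.match work. Both return a single match object, and the first branch of the alternation already matches at position 0. The sketch below demonstrates this on a hypothetical sample page.

```python
import re

pattern = r'([\s\S]*)(?=<div id="content")|(?=<div id="footer)([\s\S]*)'

# Hypothetical sample page, used only for illustration.
html = ('<html><body><div id="header">H</div>'
        '<div id="content"><p>Hi</p></div>'
        '<div id="footer">F</div></body></html>')

# re.search returns only the FIRST match. Branch 1 already matches at
# position 0, so the footer branch never takes part and group 2 is None.
m = re.search(pattern, html)
print(m.group(1))  # <html><body><div id="header">H</div>
print(m.group(2))  # None

# re.match only tries position 0, so here it finds the very same match.
print(re.match(pattern, html).span() == m.span())  # True
```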
(Asked whether I want to keep the <div> tags or have them removed:) I want to keep <div id="content" and everything inside that tag. I do NOT want to include <div id="footer" or anything after it. So basically I just want the HTML content inside <div id="content".
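Since the goal is to keep the part inside <div id="content", an alternative is to capture that span directly instead of deleting the parts around it. This is a swapped-in technique and not the question's pattern. The sketch below assumes the footer div comes after the content div. Anything between the content div's closing tag and the footer (here, a newline) is kept as well. The function name extract_content and the sample page are hypothetical.

```python
import re

def extract_content(html):
    """Return the markup from <div id="content" up to, but not including,
    <div id="footer", or None if that span is not found."""
    # .*? is non-greedy, so it stops at the first footer tag;
    # re.DOTALL lets . match newlines as well.
    match = re.search(r'<div id="content".*?(?=<div id="footer")', html, re.DOTALL)
    return match.group(0) if match else None

# Hypothetical sample page, used only for illustration.
html = '''<html><body>
<div id="header">H</div>
<div id="content">
  <p>Hi</p>
</div>
<div id="footer">F</div>
</body></html>'''

print(extract_content(html))
```

If the page layout varies, an HTML parser is a sturdier choice than a regular expression.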