I'm using a regex to build a list of strings extracted from an HTML document in Python.
The strings are either found inside a td tag (<td>SOME OF THE STRINGS COULD BE HERE</td>) or inside a div tag (<div style="line-height: 100%;margin:0;padding:0;">SOME STRINGS COULD ALSO BE HERE</div>).
Since the order of the strings in the final list should match the order in which they appear in the HTML document, I am looking for a single regex that collects all of these strings and covers both cases.
I know how to do it individually with something that looks like:
import re
# Raw string, so "<" needs no escaping; the lookarounds keep only the text between the tags
FindStrings = re.compile(r'(?<=<td>)(.*?)(?=</td>)')
MyList = FindStrings.findall(str(mydocument))
for the first case, but I would like to know the most efficient way to combine both cases in a single regex.
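
For illustration, here is a minimal sketch of what a combined pattern might look like. It assumes the td and div elements are not nested inside each other and that the tags may carry arbitrary attributes. The names sample_html, find_strings and my_list are hypothetical, and sample_html stands in for str(mydocument). The alternation (td|div) matches either tag, and the backreference \1 requires the matching closing tag.

```python
import re

# Hypothetical sample input; in practice this would be str(mydocument).
sample_html = (
    '<td>first</td>'
    '<div style="line-height: 100%;margin:0;padding:0;">second</div>'
    '<td>third</td>'
)

# Group 1: tag name (td or div), followed by optional attributes.
# Group 2: the text between the opening tag and its matching closing tag.
# re.DOTALL lets the text span several lines (an assumption about the input).
find_strings = re.compile(r'<(td|div)\b[^>]*>(.*?)</\1>', re.DOTALL)

# finditer scans left to right, so the results keep document order.
my_list = [m.group(2) for m in find_strings.finditer(sample_html)]
print(my_list)  # ['first', 'second', 'third']
```

Because the scan is a single left-to-right pass, the list order follows the document order without any merging step. If the elements can be nested, a pattern like this will likely miss or truncate matches, and an HTML parser would probably be more robust.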
