I have this string (html):
html = 'x<sub>i</sub> - y<sub>i)<sub>2</sub>'
I would like to convert this HTML string to LaTeX in a robust way. Let me explain:
<sub>SOMETHING</sub> -> converted to _{SOMETHING}
I already know how to do that:
latex = re.sub(r'<sub>(.*?)</sub>',r'_{\1} ', html)
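For reference, here is that step as a small self-contained sketch on a well-formed string. The function name sub_to_latex is made up for illustration, and the trailing space in the replacement is dropped here so the output is easier to compare.

```python
import re

def sub_to_latex(html: str) -> str:
    # Hypothetical helper: replace each well-formed <sub>...</sub> pair with _{...}.
    # The lazy group stops at the first closing tag it can reach.
    return re.sub(r'<sub>(.*?)</sub>', r'_{\1}', html)

print(sub_to_latex('x<sub>i</sub> - y<sub>i</sub>'))  # x_{i} - y_{i}
```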
- Sometimes the opening tag <sub> or its closing tag is missing, like in the example string. In that case, the output should still be correct.
So my idea was: after running step 1, replace everything after a leftover <sub>, and everything before a leftover </sub>, with _{SOMETHING}:
import re

text = re.sub(r'<sub>(.*?)</sub>', r'_{\1} ', html)
print(text)
# if a tag is missing:
text = re.sub(r'<sub>(.*?)', r'_{\1} ', text)
print(text)
latex = re.sub(r'(.*?)</sub>', r'_{\1} ', text)
print(latex)
… but I get:
x_{i} - y_{i)<sub>2}
x_{i} - y_{i)_{} 2}
x_{i} - y_{i)_{} 2}
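The empty _{} in the second line seems to come from how lazy groups behave: when (.*?) is the last thing in a pattern, the empty string already satisfies it, so only the tag itself is consumed. A small check of that behaviour, using made-up test strings rather than the original one:

```python
import re

# A lazy group at the very end of a pattern is satisfied by the empty string,
# so nothing after the tag is captured.
m = re.search(r'<sub>(.*?)', 'y<sub>i)')
captured = m.group(1)
print(repr(captured))  # ''

# A lazy group at the start still has to reach the tag, so it captures
# everything from the current position up to </sub>.
leading = re.sub(r'(.*?)</sub>', r'_{\1}', 'a b</sub>')
print(leading)  # _{a b}
```

So the second pattern can never capture the subscript text, and the third one would capture far more than the subscript.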
What I would like to get:
x_{i} - y_{i})_{2}
text = text.replace('<sub>', '_{').replace('</sub>', '}') should do.
That gives x_{i} - y_{i)_{2}. It's almost good, but there is a missing } bracket after the second i.
How do you know where the } is missing? It is not possible without more detailed requirements.
The matching </sub> may reside in the next segment, so it should suffice to just replace them one by one separately (this is a very common scenario in localization). That means you do not need to do any guesswork. If that is not your case, you should explain the tagged text format or the context the text appears in; otherwise the "regular" language is of no help.
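To make the two options concrete, here is a rough sketch. The first function is the one-by-one replacement suggested above. The second is only a heuristic and rests on an assumption that is not stated in the question: a dangling <sub> is taken to cover just the run of word characters that follows it (and a dangling </sub> the run that precedes it). Both function names are made up for illustration.

```python
import re

def replace_one_by_one(html: str) -> str:
    # Replace each tag on its own, with no pairing and no guessing.
    return html.replace('<sub>', '_{').replace('</sub>', '}')

def html_sub_to_latex(html: str) -> str:
    # 1. Well-formed pairs: the content may not contain another sub tag.
    text = re.sub(r'<sub>((?:(?!</?sub>).)*)</sub>', r'_{\1}', html)
    # 2. Dangling <sub>: ASSUME the subscript is the following run of word characters.
    text = re.sub(r'<sub>(\w*)', r'_{\1}', text)
    # 3. Dangling </sub>: ASSUME the subscript is the preceding run of word characters.
    text = re.sub(r'(\w*)</sub>', r'_{\1}', text)
    return text

html = 'x<sub>i</sub> - y<sub>i)<sub>2</sub>'
print(replace_one_by_one(html))  # x_{i} - y_{i)_{2}
print(html_sub_to_latex(html))   # x_{i} - y_{i})_{2}
```

If subscripts can contain spaces, operators or brackets, the word-character assumption breaks down and the missing tag cannot be placed without knowing more about the input.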