I have a file that contains several Math tags like this:
<Math
<Unique 262963>
<BRect 1.02176" 0.09096" 1.86024" 0.40658">
<MathFullForm `equal[therefore[char[tau]],plus[indexes[0,1,char[tau],char[c]],minus[times[indexes[
0,1,char[tau],char[s]],string[" and "],over[times[char[d],char[omega]],times[char[
d],char[t]]]]]],over[char[tau],char[I]]]'
> # end of MathFullForm
<MathLineBreak 138.88883">
<MathOrigin 1.95188" 0.32125">
<MathAlignment Center>
<MathSize MathMedium>
> # end of Math
And like so:
<Math
<Unique 87795>
<Separation 0>
<ObColor `Black'>
<RunaroundGap 0.0 pt>
<BRect 0.01389" 0.01389" 0.17519" 0.22013">
<MathFullForm `indexes[0,1,char[m,0,0,1,0,0],char[i]]'
> # end of MathFullForm
I want to extract the contents of the Unique tag and the MathFullForm tag from each Math tag, but I am at a loss as to how to do so. Note that Unique tags also exist elsewhere in the file, outside of Math tags, and that the MathFullForm string can wrap across several lines.
I've tried using a regex, but it misses many of the tags. I then thought about using an XML parser, but that won't work because the file isn't valid XML.
Can anyone steer me in the right direction to do this in Python? A regex solution is acceptable.
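To make the goal concrete, here is a rough sketch of a line-by-line approach, offered as a starting point rather than a tested solution. It rests on assumptions drawn only from the two samples above: a Math tag opens with `<Math` alone on a line and closes with `> # end of Math`, the Unique tag appears before the MathFullForm tag, and the MathFullForm string runs from a backtick to the next `'`, with wrapped lines joined without any separator. The names `extract_math`, `UNIQUE_RE` and `FORM_START` are made up for this example.

```python
import re

UNIQUE_RE = re.compile(r"<Unique\s+(\d+)>")
FORM_START = "<MathFullForm `"


def extract_math(text):
    """Return (unique_id, full_form) pairs, one per MathFullForm inside a Math tag."""
    results = []
    in_math = False    # True while inside a <Math ... > tag
    unique = None      # first Unique id seen in the current Math tag
    parts = None       # pieces of a MathFullForm string that is still open

    for raw in text.splitlines():
        line = raw.strip()
        if parts is not None:
            # Still inside the `...' string: collect until the closing quote.
            head, quote, _ = raw.partition("'")
            parts.append(head)
            if quote:
                results.append((unique, "".join(parts)))
                parts = None
        elif line == "<Math":
            in_math, unique = True, None
        elif not in_math:
            continue  # skips Unique tags that sit outside Math tags
        elif line == "> # end of Math":
            in_math = False
        elif line.startswith(FORM_START):
            head, quote, _ = line[len(FORM_START):].partition("'")
            if quote:
                results.append((unique, head))   # string fits on one line
            else:
                parts = [head]                   # string wraps onto later lines
        elif unique is None:
            match = UNIQUE_RE.fullmatch(line)
            if match:
                unique = match.group(1)
    return results
```

Passing the whole file contents as one string to `extract_math` should give a list of `(unique, full_form)` tuples. Tracking the "inside Math" state explicitly is what keeps the stray Unique tags out, which is hard to guarantee with a single regex over the whole file.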