I have this partial XML
string = '''
<x:root>
<x:tag1 x:anyAttrib="anyValue" x:anyAttrib="anyValue" x:anyAttrib="anyValue" />
<x:tag2 x:anyAttrib="anyValue" x:anyAttrib="anyValue" x:anyAttrib="anyValue">
someValue
</x:tag2>
<x:tag3> someValue
'''
Now I would like to "stupidly" repair it. I have thought of a way- regexing all of the start elements and ending element --> checking which element is missing and just add it. with out getting into too much of details of course. what I've come with so far is (and this does not work):
import re
starts = re.compile('(?<=<)x:\w+(?=>)|(?<=<)x:\w+(?! .+ />)')
print(start.findall(string))
what I expect is a list of x:root , x:tag2 , x:tag3
I've been googling and trying alot but could not find an answer. They only thing I get from this expression is x:root , x:tag1 , x:tag3.
Please help
Thanks
http://tidy.sourceforge.net/