I have some invalid XML files (<, >, & and " characters inside the attribute values). I need to turn them into well-formed XML in C#.
The only way I can think of is escaping the invalid characters inside the attributes. This works fine for <, > and & (&lt;, &gt;, &amp;). However, I have problems detecting and changing the " inside the attributes.
Right now I am using this regex for matching attribute values:
/="(.*?)"
My test case is this:
<add sqlQuery="select blaat from test where count == "1"" test="dfsdf"/>
<add sqlQuery="select blaat from test where count == "1"" test="dfsdf" />
<add sqlQuery="select blaat from test where count == "1" and blaat > 3" test="dfsdf"/>
<add xmlDiff_action="MoveNodeFrom('1')" alias="jkhkjh" />
<add xmlDiff_action="MoveNodeFrom('1')" />
RegEx test link (non-greedy version)
As you can see in the test, the matching stops at the quote in front of "1".
If I change the regex to the greedy /="(.*)" I match the whole line (so including the other attributes on the same line).
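To make the difference concrete, here is a minimal C# sketch that runs both patterns against the first test line. The class and method names (QuantifierDemo, FirstValue) are made up for this illustration and are not part of any existing code.

```csharp
using System;
using System.Text.RegularExpressions;

public static class QuantifierDemo
{
    // First test line from the question.
    public const string Line =
        "<add sqlQuery=\"select blaat from test where count == \"1\"\" test=\"dfsdf\"/>";

    // Returns capture group 1 of the first match, or an empty string if nothing matches.
    public static string FirstValue(string pattern, string input)
    {
        Match m = Regex.Match(input, pattern);
        return m.Success ? m.Groups[1].Value : string.Empty;
    }
}
```

Calling QuantifierDemo.FirstValue with the lazy pattern ="(.*?)" captures only the text up to the quote in front of 1, while the greedy pattern ="(.*)" captures everything up to the last quote on the line, so it swallows the test attribute as well.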
It is hard to define the "end quote" of an XML attribute. In my test cases the attribute can end in:
- " (space)
- "/>
- "
- " otherAttribute="value"
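One way to encode the endings listed above is a lookahead: a quote only counts as the end quote when it is followed by another attribute (name="), by the end of the tag (> or />), or by the end of the line. The following is a rough sketch under that assumption, not a general XML repair tool. The names AttributeFixer, Fix and Escape are invented for this example, and it assumes the values contain no entities that are already escaped (they would be escaped twice) and no line breaks.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AttributeFixer
{
    // name = "value", where the closing quote must be followed by
    //   - whitespace and the next attribute (name="), or
    //   - the end of the tag (> or />), or
    //   - the end of the line.
    private static readonly Regex AttributePattern = new Regex(
        @"(?<name>[\w:.-]+)\s*=\s*""(?<value>.*?)""(?=\s+[\w:.-]+\s*=\s*""|\s*/?>|\s*$)",
        RegexOptions.Multiline);

    // Rewrites every attribute as name="escaped value".
    public static string Fix(string input)
    {
        return AttributePattern.Replace(input, m =>
            m.Groups["name"].Value + "=\"" + Escape(m.Groups["value"].Value) + "\"");
    }

    // Escapes the characters that are not allowed in a double-quoted attribute value.
    private static string Escape(string value)
    {
        return value
            .Replace("&", "&amp;")   // must run first so the entities added below stay intact
            .Replace("<", "&lt;")
            .Replace(">", "&gt;")
            .Replace("\"", "&quot;");
    }
}
```

Passing the third test line to AttributeFixer.Fix should turn the sqlQuery value into select blaat from test where count == &quot;1&quot; and blaat &gt; 3 and leave the test attribute alone. The heuristic is still ambiguous by nature: a value that itself contains a quote directly followed by > or by something that looks like name=" will be cut off too early.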
I know that it looks unnecessary that I want to parse this invalid XML (even the SQL query is invalid, because it uses double spaces and quotes for == "1"). That is because it comes from another application which saves all the data in a CDATA section. But for what I am doing I need to parse that CDATA section into correct XML (escaping the invalid characters).
Huge thanks in advance if somebody could solve this in RegEx or combination of RegEx and C#!
"1" are possible"? If it's not ="1" but == "1" then it's invalid and should be fixed... mhmmm