I'm reading an XML document (with XmlDocument) that has some HTML inside it. But sometimes I get badly formatted XML, something like this:

   <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" 
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> 
    <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en"> 
    <head> 
    <meta http-equiv="refresh" content="0; url=<mocktag/>?<mock_tag=<moc
    ktag/>&mocktag_2=<mockta
    g/>
    </head> 
    </html> 

As you can see, some tags are broken across lines, and that causes my program to crash. So my question is: is there any way to read the XML string correctly? Maybe by transforming it into a one-line string?

  • You could run a regex that looks for a newline + spaces and removes them. Hopefully that wouldn't adversely affect other legitimate content. Commented Oct 12, 2017 at 14:21
  • "sometimes i got a bad formatted XML" - you should improve the method of obtaining the XML rather than attempting to fix a bad result. Commented Oct 12, 2017 at 14:22
  • This XML doesn't have tags broken by newlines. It has hopeless garble from content="0; to just before </head>. OP, how many different forms does the mocktag garble take? Is it regular enough to reliably search for? Commented Oct 12, 2017 at 14:29
  • Agree with @EdPlunkett. If you "fix" those tags, you still have invalid XML. You have a meta tag like this. In your XML there is a "> missing before </head>. You should contact whoever generates this XML. Commented Oct 12, 2017 at 14:38
  • The HTML is bad, not the XML. The HTML wasn't encoded properly. There are invalid HTML characters like the ampersand. See wiki : en.wikipedia.org/wiki/… Commented Oct 12, 2017 at 14:45

1 Answer


To format it into one line you can use Regex:

output = Regex.Replace(output, @"\s+", " ", RegexOptions.Multiline);

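This replaces every run of whitespace (spaces, tabs, and line breaks) with a single space, which puts the whole string on one line. RegexOptions.Multiline only changes how ^ and $ match, so it has no effect on this pattern and can be left out.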
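
As a rough illustration of how this could be wired up, here is a minimal sketch that collapses the whitespace and then attempts the parse without letting a failure crash the program. The class name XmlCleanup and the helper TryLoadXml are hypothetical names made up for this example; only Regex.Replace, XmlDocument.LoadXml, and XmlException come from the .NET libraries. Collapsing whitespace does not make malformed markup well-formed, so the parse can still fail and the caller needs to handle that case.

```csharp
using System;
using System.Text.RegularExpressions;
using System.Xml;

public static class XmlCleanup
{
    // Replace every run of whitespace (spaces, tabs, line breaks) with one space.
    public static string CollapseWhitespace(string input)
    {
        return Regex.Replace(input, @"\s+", " ");
    }

    // Hypothetical helper: tries to parse the string and reports failure
    // through the return value instead of letting XmlException propagate.
    public static bool TryLoadXml(string xml, out XmlDocument document)
    {
        document = new XmlDocument();
        try
        {
            document.LoadXml(xml);
            return true;
        }
        catch (XmlException)
        {
            // The input is not well-formed XML; the caller decides what to do.
            document = null;
            return false;
        }
    }
}
```

A caller might run `XmlCleanup.TryLoadXml(XmlCleanup.CollapseWhitespace(output), out var doc)` and log or skip the input when it returns false.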


2 Comments

Have you tested this solution on the example XML that OP provided in the question?
Thanks @Tom. I tested your solution and I guess everything works fine. I just need to double-check that I didn't break something else, but so far it's working :)
