I have a text file (.txt) which has text data, binary data, and XML data all mixed together within it. I've googled around for a few minutes and cannot figure out how to only extract the XML from this text file. Can the good users of SO offer some suggestion?

I'm using C# 4.0.

Since I cannot simply load the text file into an XDocument, I've been experimenting with regex, but this approach is getting me nowhere.

  • Sounds like a horrible file. Is there any kind of delimiter in the file between the sections? Commented Aug 19, 2011 at 17:32
  • It's actually MIME saved as text. I'm having some luck, actually, with regex in single-line mode. Apparently there are only line feeds between the XML elements. Commented Aug 19, 2011 at 17:39
  • Got it. I used this regex: <TAG\b[^>]*>(.*?)</TAG> (from regular-expressions.info/examples.html) and made sure I was using single-line mode because of the line feeds between the XML elements. Commented Aug 19, 2011 at 17:43
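For readers who want to try the regex route from the comment above, here is a minimal sketch in C#. The class name XmlRegexExtractor, the method ExtractElements, and the tagName parameter are illustrative assumptions, not part of the original post. RegexOptions.Singleline makes "." match line feeds, which is what lets the pattern span elements separated by newlines.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, named for illustration only.
public static class XmlRegexExtractor
{
    // Returns every <tagName ...>...</tagName> block found in the input.
    // Assumes elements with this name are not nested inside each other,
    // because the lazy (.*?) stops at the first closing tag it meets.
    public static List<string> ExtractElements(string input, string tagName)
    {
        string name = Regex.Escape(tagName);
        string pattern = "<" + name + @"\b[^>]*>(.*?)</" + name + ">";

        var results = new List<string>();
        // Singleline: '.' also matches '\n', so a match can cross line breaks.
        foreach (Match m in Regex.Matches(input, pattern, RegexOptions.Singleline))
        {
            results.Add(m.Value);
        }
        return results;
    }
}
```

Each returned string can then be handed to XDocument.Parse, provided the matched block is well-formed XML on its own.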

1 Answer

First of all, a file can't be text and binary simultaneously: if it contains binary data, it's a binary file. From your description, though, it seems to be a text file with some binary data in text-encoded form.

If you know the root tag name, use a substring search to locate the start and end of the XML document, cut it out, and then process it any way you want.
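A minimal sketch of that substring approach might look like the following. The names XmlSubstringExtractor, CutXml, and rootName are assumptions made for illustration. It takes the text from the first opening root tag to the first matching closing tag after it, so it assumes the root element name does not recur inside the document and is not a prefix of another tag name.

```csharp
using System;

// Hypothetical helper, named for illustration only.
public static class XmlSubstringExtractor
{
    // Returns the text from the first "<rootName" through the first
    // "</rootName>" that follows it, or null if either is missing.
    public static string CutXml(string content, string rootName)
    {
        string openTag = "<" + rootName;
        string closeTag = "</" + rootName + ">";

        int start = content.IndexOf(openTag, StringComparison.Ordinal);
        if (start < 0) return null;

        int end = content.IndexOf(closeTag, start, StringComparison.Ordinal);
        if (end < 0) return null;

        // Include the closing tag itself in the returned slice.
        return content.Substring(start, end + closeTag.Length - start);
    }
}
```

The result, when not null, can be passed to XDocument.Parse (from System.Xml.Linq) and queried as usual. If the file holds several XML documents, calling the method again on the remainder of the string is one way to walk through them.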
