I have a few huge XML files, each over 1 GB, and I need to run some filtering operations on them. The simplest idea I've come up with is to save them as .txt, read them with File.ReadAllText, and then perform operations such as:

  var a = File.ReadAllText("file path");
  a = a.Replace("<", "\r\n<");

As soon as I try this, however, the program crashes with an out-of-memory error. Watching Task Manager while it runs, I see RAM usage climb to about 50%, and the moment it gets there the program dies.

Does anyone have ideas on how to work with these files without hitting the OutOfMemoryException, or how to let the program use more of the available memory?

  • Use streams, not strings. Commented Nov 1, 2017 at 7:34
  • Is the replacement the “filtering”, or is it something else? Take a look at XmlReader, anyway. (I think that’s the right one.) Commented Nov 1, 2017 at 7:35
  • In general, try to avoid treating XML as "just strings". Use tools designed for working with XML as much as possible, unless what you're trying to produce isn't XML but "something that looks like XML, but I'm doing odd things to it such that it isn't technically XML any more". Commented Nov 1, 2017 at 7:37
  • If you compare XML elements between two files, there is even less reason to treat XML as text, because two XML elements might have different text representations (like a self-closing tag vs. an open-close tag pair) while having identical content. Commented Nov 1, 2017 at 7:53
  • And to add to Evk's examples, semantically <a:thing xmlns:a="urn:123"/> and <b:thing xmlns:b="urn:123"/> are the same also. Commented Nov 1, 2017 at 8:01
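
To make the XmlReader suggestion from the comments concrete, here is a minimal sketch of a forward-only filter. The element name "item", the attribute "keep", and the wrapper element "filtered" are hypothetical, since the question does not say what the real filtering criteria are; only the XmlReader/XmlWriter calls themselves are standard.

```csharp
using System.IO;
using System.Xml;

static class XmlStreamingSketch
{
    // Copies every <item keep="true"> element (hypothetical names) from input
    // to output without ever loading the whole document into memory.
    public static void Filter(TextReader input, TextWriter output)
    {
        var settings = new XmlWriterSettings { OmitXmlDeclaration = true };
        using (var reader = XmlReader.Create(input))
        using (var writer = XmlWriter.Create(output, settings))
        {
            writer.WriteStartElement("filtered"); // hypothetical wrapper element
            reader.MoveToContent();
            while (!reader.EOF)
            {
                bool wanted = reader.NodeType == XmlNodeType.Element
                              && reader.LocalName == "item"
                              && reader.GetAttribute("keep") == "true";
                if (wanted)
                {
                    // WriteNode copies the whole element and leaves the reader
                    // positioned on the node after it, so no extra Read() here.
                    writer.WriteNode(reader, false);
                }
                else
                {
                    reader.Read();
                }
            }
            writer.WriteEndElement();
        }
    }
}
```

For real files you would pass a StreamReader over the source file and a StreamWriter over the destination file; memory use then depends on the size of the largest copied element rather than on the size of the file.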

1 Answer

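If you can process the file line by line, then instead of saying "read everything into memory" with File.ReadAllText, you can say "yield me one line at a time" with File.ReadLines. File.ReadAllText has to hold the entire file as a single string, and Replace then allocates a second, even larger one.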

This returns an IEnumerable<string> that uses deferred execution, so each line is read only when the loop asks for it. You can do it like this:

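// Stream the input one line at a time and write each transformed line straight to the output file.
using (StreamWriter sw = new StreamWriter(newFilePath))
{
    foreach (var line in File.ReadLines(path))
    {
        // Only the current line is held in memory.
        sw.WriteLine(line.Replace("<", "\r\n<"));
    }
}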
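
One caveat: File.ReadLines only helps if the file actually contains line breaks. If the XML is one enormous line, the first "line" is the whole file again. A possible alternative, sketched here as an assumption and not as part of the original answer, is to read fixed-size chunks of characters. The class name, method name, and default buffer size are made up for illustration. Because the pattern being replaced is a single character, a match can never straddle two chunks.

```csharp
using System.IO;

static class ChunkedReplaceSketch
{
    // Inserts "\r\n" before every '<' while holding at most bufferSize
    // characters of the input in memory at a time.
    public static void InsertLineBreaks(TextReader reader, TextWriter writer, int bufferSize = 64 * 1024)
    {
        var buffer = new char[bufferSize];
        int read;
        while ((read = reader.Read(buffer, 0, buffer.Length)) > 0)
        {
            for (int i = 0; i < read; i++)
            {
                if (buffer[i] == '<')
                {
                    writer.Write("\r\n");
                }
                writer.Write(buffer[i]);
            }
        }
    }
}
```

With files, this would be called with a StreamReader over the source path and a StreamWriter over the destination path, each wrapped in a using statement. A multi-character search pattern would need extra handling for matches that span a chunk boundary.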

If you want to learn more about deferred execution, you can check this GitHub page.
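
As a small illustration of what deferred execution means, the sketch below uses a hand-written iterator (hypothetical names, not taken from the linked page). Nothing inside the method runs until the sequence is enumerated, and enumeration stops as soon as the consumer stops asking, which is the same behavior File.ReadLines relies on.

```csharp
using System.Collections.Generic;

static class DeferredExecutionSketch
{
    // Records which values were actually produced by the iterator.
    public static readonly List<string> Log = new List<string>();

    public static IEnumerable<int> Numbers()
    {
        for (int i = 1; i <= 3; i++)
        {
            Log.Add("yield " + i); // runs only when the consumer requests item i
            yield return i;
        }
    }

    public static int FirstOnly()
    {
        var sequence = Numbers();   // nothing has executed yet; Log is unchanged
        foreach (var n in sequence)
        {
            return n;               // stop after the first item; the rest is never produced
        }
        return -1;
    }
}
```

Calling FirstOnly() adds a single entry to Log, because the loop body for the second and third values is never reached.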
