2

In a Windows Forms C# app, I have a textbox where users paste log data, and it sorts it. I need to check each line individually, so I split the input on newlines, but if there are a lot of lines (more than 100,000 or so), it throws an OutOfMemoryException.

My code looks like this:

StringSplitOptions splitOptions = new StringSplitOptions();
if(removeEmptyLines_CB.Checked)
    splitOptions = StringSplitOptions.RemoveEmptyEntries;
else
    splitOptions = StringSplitOptions.None;

List<string> outputLines = new List<string>();

foreach(string line in input_TB.Text.Split(new string[] { "\r\n", "\n" }, splitOptions))
{
    if(line.Contains(inputCompare_TB.Text))
        outputLines.Add(line);
}
output_TB.Text = string.Join(Environment.NewLine, outputLines);

The problem comes from splitting the textbox text by line, here: input_TB.Text.Split(new string[] { "\r\n", "\n" }, splitOptions)

Is there a better way to do this? I've thought about taking the first X characters of text, truncating at a newline, and repeating until everything has been read, but this seems tedious. Or is there a way to allocate more memory for it?

Thanks, Garrett

Update

Thanks to Attila, I came up with this and it seems to work. Thanks

StringReader reader = new StringReader(input_TB.Text);
string line;
while((line = reader.ReadLine()) != null)
{
    if(line.Contains(inputCompare_TB.Text))
        outputLines.Add(line);
}
output_TB.Text = string.Join(Environment.NewLine, outputLines);

5 Answers

3

The better way to do this would be to extract and process one line at a time, and use a StringBuilder to create the result:

StringBuilder outputTxt = new StringBuilder();
string txt = input_TB.Text;
int txtIndex = 0;
while (txtIndex < txt.Length) {
  int startLineIndex = txtIndex;
GetMore:
  while (txtIndex < txt.Length && txt[txtIndex] != '\r' && txt[txtIndex] != '\n') {
    txtIndex++;
  }
  if (txtIndex < txt.Length && txt[txtIndex] == '\r' && (txtIndex == txt.Length-1 || txt[txtIndex+1] != '\n')) {
    txtIndex++;
    goto GetMore; 
  }
  string line = txt.Substring(startLineIndex, txtIndex-startLineIndex);
  if (line.Contains(inputCompare_TB.Text)) {
    if (outputTxt.Length > 0)
      outputTxt.Append(Environment.NewLine);
    outputTxt.Append(line); 
  }
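  // Step past the line terminator: two characters for "\r\n", one for "\n".
  if (txtIndex < txt.Length && txt[txtIndex] == '\r')
    txtIndex++;
  txtIndex++;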
} 
output_TB.Text = outputTxt.ToString(); 

Pre-emptive comment: someone will object to the goto, but it is what's needed here. The alternatives are either much more complex (a regular expression, for example) or fake the goto with another loop and continue or break.

Using a StringReader to split the lines is a much cleaner solution. Its ReadLine method accepts \r, \n, and \r\n as line terminators, so unlike the loop above it also ends a line at a lone \r:

StringReader reader = new StringReader(input_TB.Text); 
StringBuilder outputTxt = new StringBuilder();
string compareTxt = inputCompare_TB.Text;
string line; 
while((line = reader.ReadLine()) != null) { 
  if (line.Contains(compareTxt)) {
    if (outputTxt.Length > 0)
      outputTxt.Append(Environment.NewLine);
    outputTxt.Append(line); 
  }
} 
output_TB.Text = outputTxt.ToString(); 

6 Comments

I didn't even know you could use a goto statement in C#. I don't think I've used one since I was a kid playing around with Pascal and BASIC. Interesting. This seems overly complicated though; take a look at the update to my question.
I added a note at the end of my answer: your update is cleaner but does not handle both \r\n and \n as line endings. If you can do without that, it is fine. I still suggest using a StringBuilder to avoid creating a (big?) intermediate list of strings.
Yes, goto statements are possible in C#, and I use them, sparingly, as in this case.
StringReader handles \r, \n or \r\n as a newline character.
In my update code, does 'while((line = reader.ReadLine()) != null)' create a bunch of strings or does it just keep re-using the same one?
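The two points raised in these comments can be checked with a small sketch. As documented for .NET, StringReader.ReadLine ends a line at \n, at \r, or at \r\n. Each call returns a newly allocated string; the line variable is only a reference that gets reassigned, so a line that is not stored anywhere becomes eligible for garbage collection. The class and method names below (ReadLineCheck, SplitWithStringReader) are made up for illustration and are not part of the original code.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class ReadLineCheck
{
    // Hypothetical helper: collects every line that StringReader.ReadLine returns.
    public static List<string> SplitWithStringReader(string text)
    {
        var lines = new List<string>();
        using (var reader = new StringReader(text))
        {
            string line;
            // ReadLine returns null once the end of the text is reached.
            while ((line = reader.ReadLine()) != null)
                lines.Add(line);
        }
        return lines;
    }

    public static void Main()
    {
        // Mixed terminators: "\r\n", "\n", and a lone "\r".
        List<string> lines = SplitWithStringReader("one\r\ntwo\nthree\rfour");
        foreach (string l in lines)
            Console.WriteLine("[" + l + "]");
    }
}
```

The list is only there to make the result inspectable; in the filtering loop from the question, lines that do not match are never stored and can be collected.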
3

Split has to copy the original text, roughly doubling the memory needed, and adds the overhead of a string object for each line. If this causes memory issues, a reliable way of processing the input is to parse one line at a time.
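As a rough sketch of parsing one line at a time, the filter could be written with an iterator so that only the current line and the accumulated output are alive at once. The names LineFilter, ReadLines, and Filter are hypothetical, and the removeEmptyLines flag is an assumed stand-in for the checkbox in the question.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class LineFilter
{
    // Lazily yields one line at a time instead of building an array of all lines.
    public static IEnumerable<string> ReadLines(string text)
    {
        using (var reader = new StringReader(text))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
                yield return line;
        }
    }

    // Keeps the lines that contain needle, joined with Environment.NewLine.
    public static string Filter(string text, string needle, bool removeEmptyLines)
    {
        var output = new StringBuilder();
        bool first = true;
        foreach (string line in ReadLines(text))
        {
            if (removeEmptyLines && line.Length == 0)
                continue;
            if (!line.Contains(needle))
                continue;
            if (!first)
                output.Append(Environment.NewLine);
            output.Append(line);
            first = false;
        }
        return output.ToString();
    }

    public static void Main()
    {
        string log = "error: disk full\r\ninfo: ok\n\nerror: timeout";
        Console.WriteLine(Filter(log, "error", true));
    }
}
```

In the form from the question, this might be called as output_TB.Text = LineFilter.Filter(input_TB.Text, inputCompare_TB.Text, removeEmptyLines_CB.Checked).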

1 Comment

Thanks, take a look at my update and let me know if that is what you meant. I will mark this as answered soon, I just want to see a couple other ideas. Thanks again!
0

I guess the only way to do this on large text files is to open the file manually and use a StreamReader. Here is an example of how to do this.
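A minimal sketch of that approach, assuming the log is available as a file on disk rather than pasted into the textbox. The class name LogFileFilter, the method FilterFile, and its parameters are invented for this illustration.

```csharp
using System;
using System.IO;

public static class LogFileFilter
{
    // Streams matching lines from inputPath to outputPath and returns how many matched.
    // Only one line is held in memory at a time.
    public static int FilterFile(string inputPath, string outputPath, string needle)
    {
        int matches = 0;
        using (var reader = new StreamReader(inputPath))
        using (var writer = new StreamWriter(outputPath))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                if (line.Contains(needle))
                {
                    writer.WriteLine(line);
                    matches++;
                }
            }
        }
        return matches;
    }

    public static void Main()
    {
        // Temporary files stand in for a real log and its filtered copy.
        string input = Path.GetTempFileName();
        string output = Path.GetTempFileName();
        File.WriteAllText(input, "error: a\r\ninfo: b\nerror: c\n");
        Console.WriteLine(FilterFile(input, output, "error"));
        File.Delete(input);
        File.Delete(output);
    }
}
```

Writing the result to a file rather than back into a textbox also avoids building one very large output string.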

0

You can avoid creating strings for all lines and the array by creating the string for each line one at a time:

var eol = new[] { '\r', '\n' };

var pos = 0;
while (pos < input.Length)
{
    var i = input.IndexOfAny(eol, pos);
    if (i < 0)
    {
        i = input.Length;
    }
    // Empty spans are skipped, which drops blank lines and the gap inside "\r\n".
    if (i != pos)
    {
        var line = input.Substring(pos, i - pos);

        // process line
    }
    pos = i + 1;
}

0

On the other hand, this article says that the real issue is that the Split method is implemented poorly. Read it and draw your own conclusions.

Like Attila said, you have to parse line by line.
