
I have written a C# WinForms application that allows the user to open a log (text) file and view the log lines in a data grid. The application formats the log data so that the user can filter, search, and so on.

The problem I have is that when the user opens a log file larger than 300 MB, the application throws an out of memory exception.

The app first loads all of the log lines into a string array, then it loops through the log lines, adding log entry objects to a list.

var allLogLines = File.ReadAllLines(logPath).ToList();
var nonNullLogLines = allLogLines.Where(l => !string.IsNullOrEmpty(l));

this.ParseLogEntries(nonNullLogLines.ToArray());

This initial step (loading the log data into a string array) uses up about 1 GB of memory according to Task Manager.

internal override void ParseLogEntries(string[] logLines)
{
    this.LogEntries = new List<LogEntry>();
    this.LogLinesCount = logLines.Count();

    for (int i = 0; i < this.LogLinesCount; i++)
    {
        // Locate the boundaries of the next (possibly multi-line) log entry.
        int entryStart = this.FindMessageCompartment(logLines, i);
        int entryEnd = this.FindMessageCompartment(logLines, entryStart + 1);
        int entryLength = (entryEnd - entryStart) + 1;

        if (entryStart + entryLength > this.LogLinesCount)
        {
            entryLength = this.LogLinesCount - entryStart;
        }

        // Copy the entry's lines out, then null the parsed slots in the
        // source array in the hope of freeing them for collection.
        var logSection = new string[entryLength];

        Array.Copy(logLines, entryStart, logSection, 0, entryLength);
        Array.Clear(logLines, i, entryLength - 1);

        this.AddLogEntry(logSection);

        i = (entryEnd - 1);
    }
}

The AddLogEntry method adds a log entry to the list (LogEntries). The for loop manages to parse about 50% of the log file before the out of memory exception occurs. At that point Task Manager reports that the application is using about 1.3 GB of memory.

As you can see above, I have added Array.Clear to null out the portion of the log data that has been successfully parsed, so I would expect that, as objects are added to the collection, the amount of memory used by the large log data array (1 GB to begin with) would steadily fall. But it does not; in fact this line makes no difference to the memory usage, even if I add a periodic GC.Collect().

Having read about the Large Object Heap (LOH), I am assuming this is because the heap is not being compacted as portions of the large array are nulled out, so the array always occupies the same 1 GB of memory regardless of its contents.

Is there any way I can reduce the amount of memory being held while the data is being parsed, or a possible rework that would make better use of memory? It also seems strange to me that a 300 MB text file, when put into a string array, consumes 1 GB of memory.

Thanks.

Comments

  • What is FindMessageCompartment? Also, do not use arrays; use a generic List<string>. Commented Jan 5, 2012 at 13:02
  • Any problem with doing ReadLine, i.e. reading line by line and processing the file, rather than loading it all at once? Commented Jan 5, 2012 at 13:05
  • Is this happening before you show your data, i.e. just during parsing? Commented Jan 5, 2012 at 13:08
  • How do you read your file? Are you using StreamReader.ReadLine()? Commented Jan 5, 2012 at 13:09
  • Which version of .NET? .NET 4 provides efficient methods for enumerating a file's lines without fetching them all into memory. Commented Jan 5, 2012 at 13:10

5 Answers


Instead of a ParseLogEntries(string[] logLines) method that parses all the log lines in one go, you could have a ParseLogEntry(string logLine) method that parses a single line.

If you combine this with iterating over the lines in your log file one at a time (for instance with your own enumerator), you avoid creating the big string[] logLines array in the first place.

One way could be like this:

static IEnumerable<string> ReadLines(string filename)
{
    using (TextReader reader = File.OpenText(filename))
    {
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            yield return line;
        }
    }
}

// And use the function somewhere to parse the log

var logEntries = new List<LogEntry>();
foreach (string line in ReadLines("log.txt"))
{
    logEntries.Add(ParseLogEntry(line));
}

If you're using .NET 4.0 or greater, you could of course just use the File.ReadLines method, as pointed out by sll in another answer, instead of creating your own.
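Note that if one log entry can span several lines (which the FindMessageCompartment logic in the question suggests), the same streaming idea still works: buffer lines until the next entry boundary and flush the buffer as one entry. A minimal sketch, where IsEntryStart is a hypothetical predicate standing in for whatever marks the start of an entry in your format:

static IEnumerable<string[]> ReadEntries(string filename)
{
    // Sketch only: IsEntryStart is an assumption, not part of the question's code.
    var buffer = new List<string>();
    foreach (string line in ReadLines(filename))
    {
        if (IsEntryStart(line) && buffer.Count > 0)
        {
            yield return buffer.ToArray();  // flush the completed entry
            buffer.Clear();
        }
        buffer.Add(line);
    }
    if (buffer.Count > 0)
    {
        yield return buffer.ToArray();      // last entry in the file
    }
}

This keeps at most one entry's worth of lines in memory at a time, rather than the whole file.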


2 Comments

And just to have mentioned it: the ReadLines method is something I picked up from the great book C# In Depth by the great Jon Skeet ;-)
I'm using .NET 3.5; the enumerator looks like an ideal solution, many thanks.

I know this won't answer your question directly, but you might want to consider not loading your file into memory in its entirety.

In your case the log file needs 300 MB of memory, but what if it required 2.5 GB? Especially since the result is displayed in a data grid, you might want to use paging instead and load a small chunk of data from the file every time you need it.
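For illustration, a minimal sketch of such a paging reader (ReadPage, pageIndex and pageSize are hypothetical names; a real implementation would cache line offsets per page rather than re-scanning the file from the start on every call):

// Hypothetical paging helper: returns only the lines belonging to one page.
static List<string> ReadPage(string path, int pageIndex, int pageSize)
{
    var page = new List<string>(pageSize);
    int skip = pageIndex * pageSize;        // lines that precede this page
    using (var reader = new StreamReader(path))
    {
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            if (skip-- > 0) continue;       // still before the requested page
            page.Add(line);
            if (page.Count == pageSize) break;
        }
    }
    return page;
}

The grid then only ever holds pageSize lines' worth of entries, no matter how large the file is.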



Strings require contiguous memory segments on the heap; the application can throw an out-of-memory exception when there are lots of long strings on the heap and you try to allocate another string but no free segment of the required length is available.

Your Array.Clear line may not help because the logSection strings will not be garbage collected; in fact, as the loop iterates, the runtime has an increasingly difficult time, since it is harder to find, say, one free 10 KB segment on the heap than ten free 1 KB segments.

That is what your problem is. As for the solution, in general I'd advise a lazier approach. Do you really need all those strings in main memory at once? If yes, why don't you at least read from a StreamReader instead of loading everything into string[] logLines?
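As a rough back-of-the-envelope check of the 300 MB-to-1 GB jump (all figures below are assumptions, not measurements): a mostly-ASCII file stored as UTF-8 on disk roughly doubles in size once loaded as UTF-16 strings, and each line then pays a per-object overhead on top of that:

// Rough estimate only; header sizes vary by runtime and bitness.
const long fileBytes = 300L * 1024 * 1024;  // 300 MB file on disk (UTF-8)
const int avgLine = 100;                    // assumed average line length
long lines = fileBytes / (avgLine + 2);     // +2 bytes per line for CRLF
long chars = lines * avgLine * 2;           // UTF-16 stores 2 bytes per char
long headers = lines * (24 + 8);            // ~24 B string object + 8 B array slot
Console.WriteLine((chars + headers) / (1024 * 1024) + " MB for the string[] alone");

With those assumed figures the array alone comes to roughly 700 MB before a single LogEntry is created, so the 1 GB reported by Task Manager is not surprising.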



First things first: what I can see is that you are doubling your memory usage with statements like:

File.ReadAllLines(logPath).ToList();

The system first reads in all the lines and then converts them to a List, which doubles the usage.

I would suggest that you read the file in via a StreamReader:

using (var sr = new StreamReader(fileName))
{
    // get data out here
}

This way the reader is disposed of as soon as execution leaves the using block.

Also, Array.Copy is going to use more memory, so try to create your desired objects inside the using statement, or make your objects IDisposable so the garbage collector can save the day.

1 Comment

It doubles the memory usage for the references, but that's probably not much compared to the actual string data (which is not allocated again).

I would suggest not loading the whole file into memory and using lazy reading instead. On .NET 4 or later you can leverage the File.ReadLines() method for reading the file.

When you use ReadLines, you can start enumerating the collection of strings before the whole collection is returned; therefore, when you are working with very large files, ReadLines can be more efficient.

foreach (string line in File.ReadLines(@"path-to-a-file"))
{
    // single line processing logic
}
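Since the question already filters out empty lines, it is worth noting that this lazy pipeline composes with LINQ (System.Linq) without materializing anything; ProcessLine below is a hypothetical per-line handler:

foreach (string line in File.ReadLines(@"path-to-a-file")
                            .Where(l => !string.IsNullOrEmpty(l)))
{
    ProcessLine(line);  // hypothetical: parse/handle one line at a time
}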

