
I would like to create functionality that works similarly to SqlDataReader.Read().

I'm reading a flat file (.txt/.csv) and returning it as a DataTable to my class that handles the business logic. That class iterates through the rows of the DataTable, transforms the data, and writes it into a structured database. I use this structure for multiple import sources.

Large files, though, are processed very slowly: it takes me 2 hours to get through 30 MB of data, and I would like to get this down to 30 minutes. One step in this direction is to stop reading the entire file into a DataTable and instead handle it line by line, to keep memory from getting clogged.

Something like this would be ideal (pseudocode):

FlatFileReader ffr = new FlatFileReader(); //Set FlatFileParameters
while(ffr.ReadRow(out DataTable parsedFlatFileRow))
{
     //...Business Logic for handling the parsedFlatFileRow
}

How can I implement a method that works like .ReadRow(out DataTable parsedFlatFileRow)?


Is this the right direction?

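foreach (string line in ff.lazyreading())
{
    //Business Logic
}

...

class FlatFileWrapper
{
    private readonly string path;

    public FlatFileWrapper(string path)
    {
        this.path = path;
    }

    // Iterator: each call to MoveNext reads exactly one more line.
    public IEnumerable<string> lazyreading()
    {
        using (var reader = new StreamReader(path))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                yield return line; // parse the line here if needed
            }
        }
    }
}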
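For the ReadRow(out ...) shape asked about above, here is a minimal sketch of a SqlDataReader-style reader. It assumes a simple delimited file with no quoted fields. The constructor parameters and the choice to return the fields as string[] (rather than a one-row DataTable) are assumptions for illustration. Like SqlDataReader.Read(), ReadRow returns false once the end of the file is reached. Each call reads exactly one line, so only the current line (plus the StreamReader's internal buffer) is held in memory.

```csharp
using System;
using System.IO;

// Hypothetical pull-style reader modelled on SqlDataReader.Read().
// Assumption: fields are separated by a single delimiter character and
// never contain that character inside quotes.
public sealed class FlatFileReader : IDisposable
{
    private readonly StreamReader reader;
    private readonly char delimiter;

    public FlatFileReader(string path, char delimiter)
    {
        reader = new StreamReader(path);
        this.delimiter = delimiter;
    }

    // Reads the next line; returns false at end of file.
    public bool ReadRow(out string[] fields)
    {
        string line = reader.ReadLine();
        if (line == null)
        {
            fields = null;
            return false;
        }

        // Naive split: quoted fields containing the delimiter are not handled.
        fields = line.Split(delimiter);
        return true;
    }

    // Closes the underlying file.
    public void Dispose() => reader.Dispose();
}
```

Wrapping the reader in a using block and looping with while (ffr.ReadRow(out string[] fields)) disposes the StreamReader when the loop ends. Returning string[] avoids allocating a DataTable per line. If the existing business logic expects a DataRow, the fields can still be copied into one.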
  • FileHelpers may be a good option for you: filehelpers.sourceforge.net Commented Oct 30, 2013 at 9:44
  • You probably have not determined the cause of the bad performance yet. You have guessed that it is memory usage, but I'm very suspicious of that. Profile the app, or pause the debugger 10 times to see where it stops most often. Commented Oct 30, 2013 at 9:51
  • No, you're right, I didn't profile it yet. But memory usage is a known problem on this SQL Server, and it has particularly been noticed in connection with working through large files. So keeping memory use low is a priority in itself. Commented Oct 30, 2013 at 9:55
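Before switching to a profiler, a coarse first check of the suggestion above is to time reading and processing separately. The sketch below assumes a hypothetical ImportTiming helper, where processLine stands in for the per-row business logic. Starting and stopping a Stopwatch per line adds some overhead, so treat the numbers as a rough split rather than an exact measurement.

```csharp
using System;
using System.Diagnostics;
using System.IO;

// Hypothetical helper: splits elapsed time between reading lines from the
// file and running the per-line processing, to see which one dominates.
public static class ImportTiming
{
    public static (TimeSpan Read, TimeSpan Process) Measure(string path, Action<string> processLine)
    {
        var readTimer = new Stopwatch();
        var processTimer = new Stopwatch();

        using (var reader = new StreamReader(path))
        {
            while (true)
            {
                // Stopwatch.Start resumes accumulating after a Stop.
                readTimer.Start();
                string line = reader.ReadLine();
                readTimer.Stop();

                if (line == null)
                    break;

                // processLine stands in for transforming the row and
                // writing it to the database.
                processTimer.Start();
                processLine(line);
                processTimer.Stop();
            }
        }

        return (readTimer.Elapsed, processTimer.Elapsed);
    }
}
```

If Process dwarfs Read, the file reading is not the bottleneck, and the database writes are the place to look.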

2 Answers


As Tim already mentioned, File.ReadLines is what you need:

"When you use ReadLines, you can start enumerating the collection of strings before the whole collection is returned"

You can create a parser that uses that method, something like this:

using System.Collections.Generic;
using System.IO;

// The object you want to create from each line of the file.
public class Foo
{
    // add properties here....
}

// The parser's only responsibility is to create the objects.
public class FooParser
{
    public IEnumerable<Foo> ParseFile(string filename)
    {
        if(!File.Exists(filename))
            throw new FileNotFoundException("Could not find file to parse", filename);

        foreach(string line in File.ReadLines(filename))
        {
            Foo foo = CreateFoo(line);

            yield return foo;
        }
    }

    private Foo CreateFoo(string line)
    {
        // parse line/create instance of Foo here

        return new Foo {
            // ......
        };
    }
}

Using the code:

var parser = new FooParser();

foreach (Foo foo in parser.ParseFile(filename))
{
     //...Business Logic for handling the parsedFlatFileRow
}
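Here is a concrete variant of the classes above. It assumes a hypothetical semicolon-delimited file whose lines look like "42;Widget", mapped to hypothetical Id and Name properties. It also makes one small change. In an iterator method, the File.Exists check only runs when enumeration starts (on the first MoveNext), not when ParseFile is called. Splitting the method into a non-iterator wrapper and a private iterator makes the check happen immediately.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;

// Hypothetical record type: the Id and Name columns are assumptions.
public class Foo
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public class FooParser
{
    // Not an iterator itself, so the argument check runs as soon as
    // ParseFile is called; the lines are still read lazily.
    public IEnumerable<Foo> ParseFile(string filename)
    {
        if (!File.Exists(filename))
            throw new FileNotFoundException("Could not find file to parse", filename);

        return ParseLines(filename);
    }

    private IEnumerable<Foo> ParseLines(string filename)
    {
        foreach (string line in File.ReadLines(filename))
        {
            yield return CreateFoo(line);
        }
    }

    // Assumes exactly two ';'-separated fields per line: an integer and a name.
    private Foo CreateFoo(string line)
    {
        string[] fields = line.Split(';');
        return new Foo
        {
            Id = int.Parse(fields[0], CultureInfo.InvariantCulture),
            Name = fields[1]
        };
    }
}
```

The calling code stays the same: foreach over parser.ParseFile(filename) still pulls one Foo at a time.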

2 Comments

Thanks, I think I understood the concept; I'll try it out and post results. So this lets me lazily load the lines of the file, with memory only holding the data of the current line?
Yes, check the link in my answer: The ReadLines and ReadAllLines methods differ as follows: When you use ReadLines, you can start enumerating the collection of strings before the whole collection is returned; when you use ReadAllLines, you must wait for the whole array of strings be returned before you can access the array. Therefore, when you are working with very large files, ReadLines can be more efficient.
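Here is a small sketch of the deferred enumeration described in that quote. Because File.ReadLines returns a lazily evaluated IEnumerable<string>, LINQ operators pull one line at a time, and the file is never materialized as an array the way File.ReadAllLines would. The LineStats class, its method names, and the assumption that the first line is a header are hypothetical.

```csharp
using System.IO;
using System.Linq;

public static class LineStats
{
    // Streams through the whole file one line at a time; no string[] of
    // all lines is ever built.
    public static int CountDataRows(string path) =>
        File.ReadLines(path)
            .Skip(1)                          // assumption: first line is a header
            .Count(line => line.Length > 0);  // ignore blank lines

    // Take stops pulling lines once n have been returned; disposing the
    // enumerator (which ToArray does) closes the underlying file.
    public static string[] Preview(string path, int n) =>
        File.ReadLines(path).Take(n).ToArray();
}
```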

You can use File.ReadLines, which works similarly to a StreamReader:

foreach(string line in File.ReadLines(path))
{
     //...Business Logic for handling the parsedFlatFileRow
}

7 Comments

I'd like to move the file-reading code into its own class. So I'm looking for a wrapper around the code you posted that returns a row from the file and, the next time the method is called, returns the next row, until the entire file has been read, all without loading the entire file into memory.
@RafaelCichocki: File.ReadLines is what you need. From MSDN: When you use ReadLines, you can start enumerating the collection of strings before the whole collection is returned. See msdn.microsoft.com/en-us/library/dd383503.aspx
@RafaelCichocki: You could wrap the importer in a class that wraps a StreamReader (remember to implement IDisposable to dispose it) and use yield return to return your objects lazily. stackoverflow.com/a/286553/284240
@RafaelCichocki This is indeed the method you need. But you probably want to look at C# iterators, i.e. using the yield return keyword.
@TimSchmelter: maybe you can edit your answer to include a full method instead of just a foreach loop, using the yield return keyword? I think it would be much easier for the OP to understand.
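The wrapper described in the comments (a class that owns a StreamReader, implements IDisposable, and uses yield return) might look roughly like this minimal sketch. The class name FlatFileImporter, the Rows method, and the delimiter parameter are hypothetical, and quoted fields are not handled.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical importer: owns the StreamReader for its whole lifetime and
// exposes the file lazily through an iterator.
public sealed class FlatFileImporter : IDisposable
{
    private readonly StreamReader reader;
    private readonly char delimiter;

    public FlatFileImporter(string path, char delimiter)
    {
        reader = new StreamReader(path);
        this.delimiter = delimiter;
    }

    // Each step of the foreach reads exactly one more line from the stream.
    public IEnumerable<string[]> Rows()
    {
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            yield return line.Split(delimiter);
        }
    }

    // Closes the file; call via a using block.
    public void Dispose() => reader.Dispose();
}
```

Because Rows() reads from the shared reader, it can usefully be enumerated only once per importer instance. Create a new importer to read the file again.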
