
I am using FileHelpers 3.1.5 to parse a CSV file, but my problem is that the CSV file should support many optional columns, and I have not found out how to configure FileHelpers for this task.

Here is an example:

[DelimitedRecord(";")]
[IgnoreEmptyLines]
public class TestRecord
{
    //Mandatory
    [FieldNotEmpty]
    public string A;

    [FieldOptional]
    public string B;

    [FieldOptional]
    public string C;
}

I would like it to be possible to handle data like this:

A;C
TestA1;TestC1
TestA2;TestC1

But when I parse it, I get "TestC1" as the value of records[1].B:

var engine = new FileHelperEngine<TestRecord>();
var records = engine.ReadFile("TestAC.csv");

string column = records[1].C;
Assert.IsTrue(column.Equals("TestC1"));  //Fails, returns ""

column = records[1].B;
Assert.IsTrue(column.Equals("TestC1"));  //True, but that was not what I wanted

Thankful for any advice!

3 Comments
  • Have you configured it to read headers? I'm suspicious since you're using records[1] instead of records[0] Commented Sep 8, 2015 at 7:01
  • Yes, I read the header line in records[0]. Maybe a check against the actual content of records[0] could show which columns are actually in the data? Commented Sep 8, 2015 at 7:55
  • OK, by reading the header line I can see that the test data contains columns A and C, but the value for C is mapped to member B, so I don't get any help from FileHelpers in this case. Is my best option to write my own mapping of column to member, i.e. that column "C" is mapped to member B? Or maybe I am using the wrong library? (A rough manual mapping is sketched below.) Commented Sep 8, 2015 at 9:17
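For reference, here is a rough sketch of the manual column-to-member mapping floated in that last comment. It uses plain string splitting and reflection rather than anything FileHelpers-specific, and the file name is only an assumption:

// Rough sketch: map each value to the TestRecord field whose name matches the header column.
// "TestAC.csv" and the use of reflection are illustrative assumptions, not FileHelpers features.
var lines = System.IO.File.ReadAllLines("TestAC.csv");
var headers = lines[0].Split(';');

var records = new System.Collections.Generic.List<TestRecord>();
for (int row = 1; row < lines.Length; row++)
{
    var values = lines[row].Split(';');
    var record = new TestRecord();
    for (int col = 0; col < headers.Length && col < values.Length; col++)
    {
        // e.g. header "C" writes its value into the public field TestRecord.C
        var field = typeof(TestRecord).GetField(headers[col]);
        if (field != null)
            field.SetValue(record, values[col]);
    }
    records.Add(record);
}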

2 Answers

1

Tested against FileHelpers version 3.2.5

In order to make the FileHelperEngine correctly identify your columns, you have to dynamically remove the fields that are not present in the file. The following is based on your code with a few added bits, run from a console program:

        string tempFile = System.IO.Path.GetTempFileName();
        System.IO.File.WriteAllText(tempFile, "A;C\r\nTestA1;TestC1\r\nTestA2;TestC1");
        var engine = new FileHelperEngine<TestRecord>();
        var records = engine.ReadFile(tempFile, 1);

        // Get the header text from the file
        var headerFile = engine.HeaderText.Replace("\r", "").Replace("\n", "");

        // Get the header from the engine record layout
        var headerFields = engine.GetFileHeader();

        // Test fixed string against column as column could be null and Debug.Assert can't use .Equals on a null object!
        string column = records[0].C;
        Debug.Assert("TestC1".Equals(column), "Test 1 - Column C does not equal 'TestC1'");  //Fails, returns ""

        // Test fixed string against column as column could be null and Debug.Assert can't use .Equals on a null object!
        column = records[0].B;
        Debug.Assert(!"TestC1".Equals(column), "Test 1 - Column B does equal 'TestC1'");  //True, but that was not what I wanted

        // Create a new engine, otherwise we get an error from Dynamic.Assign once we start removing fields,
        // presumably because we have called ReadFile() beforehand.
        engine = new FileHelperEngine<TestRecord>();

        if (headerFile != headerFields)
        {
            var fieldHeaders = engine.Options.FieldsNames;
            var fileHeaders = headerFile.Split(';').ToList();

            // Loop through all the record layout fields and remove those not found in the file header
            for (int index = fieldHeaders.Length - 1; index >= 0; index--)
                if (!fileHeaders.Contains(fieldHeaders[index]))
                    engine.Options.RemoveField(fieldHeaders[index]);
        }

        headerFields = engine.GetFileHeader();
        Debug.Assert(headerFile == headerFields);

        var records2 = engine.ReadFile(tempFile);

        // Test fixed string against column as column could be null and Debug.Assert can't use .Equals on a null object!
        column = records2[0].C;
        Debug.Assert("TestC1".Equals(column), "Test 2 - Column C does not equal 'TestC1'");  //Fails, returns ""

        // Test fixed string against column as column could be null and Debug.Assert can't use .Equals on a null object!
        column = records2[0].B;
        Debug.Assert(!"TestC1".Equals(column), "Test 2 - Column B does equal 'TestC1'");  //True, but that was not what I wanted

        Console.WriteLine("Seems to be OK now!");
        Console.ReadLine();

Note: one important thing I found is that in the current version 3.2.5, removing a field after the engine has already read from a file will cause it to blow a fuse, which is why a fresh engine is created above.
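For example, a minimal sketch of the safe ordering, assuming the header has already been inspected and only field B needs to go:

        // Hypothetical minimal ordering: remove unused fields on a fresh engine *before* any ReadFile call.
        var cleanEngine = new FileHelperEngine<TestRecord>();
        cleanEngine.Options.RemoveField("B");               // safe here because nothing has been read yet
        var fixedRecords = cleanEngine.ReadFile(tempFile);  // the C column now maps to the C member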

I also added an [IgnoreFirst] attribute to your class so that it skips the header row and puts the ignored text into engine.HeaderText. This results in the following class:

    [DelimitedRecord(";")]
    [IgnoreEmptyLines]
    [IgnoreFirst()]
    public class TestRecord
    {
        //Mandatory
        [FieldNotEmpty]
        public string A;

        [FieldOptional]
        public string B;

        [FieldOptional]
        public string C;
    }

4 Comments

Thanks, I will give this a try.
Let me know if it works for you, or if it doesn't :)
Your suggestion works fine. Thanks a lot for pointing out the RemoveField option! Remains to handle the order of columns, so I will test your suggestion below.
Cool. You should accept this as the answer then if you haven't already :)
0

I think you should decorate your columns with titles, such as:

[DelimitedRecord(";")]
[IgnoreEmptyLines]
public class TestRecord
{
    //Mandatory
    [FieldNotEmpty, FieldOrder(0), FieldTitle("A")]
    public string A;

    [FieldOptional, FieldOrder(1), FieldTitle("B")]
    public string B;

    [FieldOptional, FieldOrder(2), FieldTitle("C")]
    public string C;
}

This way, the runtime knows what the column names are and will parse them accordingly. Otherwise, all it knows is the declared field order, so a file that omits a column needs extra semicolons as placeholders. So, the following would have worked with your original setup:

A;;C
TestA1;;TestC1
TestA2;;TestC1
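For illustration, a quick sketch of reading that padded data with your original class; the file name is hypothetical, and because the middle column is present but empty, the positional mapping lines up again:

var engine = new FileHelperEngine<TestRecord>();
var records = engine.ReadFile("TestAC_padded.csv");  // hypothetical file holding the padded data above

// records[0] is the header row, since the original class has no [IgnoreFirst]
Console.WriteLine(records[1].B);  // "" - the empty placeholder column
Console.WriteLine(records[1].C);  // "TestC1"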

The FieldTitle approach only works on FileHelpers v2, as v3 no longer has FieldTitle.

5 Comments

Thanks for your answer. So, according to your suggestion, my CSV cannot contain an optional number of columns. Is FieldTitle still available in 3.1.5, by the way? It is not recognized by the compiler.
Oh good question, I was not at my computer when I tried it. I'll give it a full test later today. And I think if the title was used, yes you could have different columns. If you just specified a layout without headings, it assumed column order based on FieldOrder attribute.
Took a quick gander at the Git source, and under v3 FieldTitle has been removed. So I think the best bet you have is to compare the file header and remove the fields you don't want to import. I'll post another answer on that front...
Thanks for your efforts, it is much appreciated. My goal is a flexible solution where only the column names are fixed, but their existence is optional and, if possible, their order flexible as well.
You would need to amend the FieldOrder which you can do through the field indexes or by using TypeDescriptor to remove and add attributes at runtime. At the moment, my code below only checks for existence, not order.
