1

I am new to C# development. I need to parse a huge text file in which each record spans several lines of data. The output will be a CSV file.

The format of the file follows the following pattern:

Acronym: TIFFE 
Name of proposal: Thermal Systems Integration for Fuel Economy
Contract number: 233826
Instrument: CP – FP
#
Acronym: STREAMLINE
Name of proposal: Strategic Research For Innovative Marine Propulsion Concepts
Contract number: 233896
Instrument: CP – FP

where # marks the start of a new record. There are hundreds of these records in the text file. I want to parse everything into a CSV with columns for Acronym, Name of proposal, etc., and one row per record containing the actual data.

What is the best way to approach this?

I am guessing I have to load the data into an intermediate structure, such as a DataTable, before writing it out as CSV.

3 Answers

3

This simple LINQ statement parses your input file into a sequence of records and writes each record in CSV format to an output file (assuming that the number and order of fields are the same in every record):

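// Requires: using System.IO; using System.Linq;
File.WriteAllLines("output.csv", File
    .ReadLines("input.txt")
    .GroupDelimited(line => line == "#")          // one group of lines per record
    .Select(g => string.Join(",", g
        .Select(line => "\"" + line
            .Substring(line.IndexOf(": ") + 1)    // keep the text after the field name
            .Trim()
            .Replace("\"", "\"\"") + "\""))));    // double embedded quotes, wrap in quotes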

Output:

"TIFFE","Thermal Systems Integration for Fuel Economy","233826","CP – FP"
"STREAMLINE","Strategic Research For Innovative Marine Propulsion Concepts","233896","CP – FP"

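Helper method (an extension method, so it must be declared in a static, non-generic class):

using System;
using System.Collections.Generic;

public static class EnumerableExtensions
{
    // Splits a sequence into groups, starting a new group after each delimiter element.
    public static IEnumerable<IEnumerable<T>> GroupDelimited<T>(
        this IEnumerable<T> source, Func<T, bool> delimiter)
    {
        var g = new List<T>();
        foreach (var x in source)
        {
            if (delimiter(x))
            {
                yield return g;
                g = new List<T>();
            }
            else
            {
                g.Add(x);
            }
        }
        // The last record is not followed by a delimiter.
        yield return g;
    }
}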

2 Comments

I got the following error: 'System.Collections.Generic.IEnumerable<string>' does not contain a definition for 'GroupDelimited' and no extension method 'GroupDelimited' accepting a first argument of type 'System.Collections.Generic.IEnumerable<string>' could be found (are you missing a using directive or an assembly reference?) c:\users\user\documents\visual studio 2010\Projects\Fileparser\Fileparser\Program.cs
If that's a "simple" LINQ query, I'd hate to see a complicated one.
1

You don't necessarily have to parse this into a DataTable first. You could write the CSV directly with a StreamWriter as you read the source file. Obviously this is easier if the sequence and presence of fields is consistent across all records in the source.

For anything to do with CSVs, though, you should consider using a specialised library such as FileHelpers.
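The library-free streaming approach could look roughly like the sketch below. It assumes every record lists the same fields in the same order and that field names contain no colon; the `RecordStreamer` class and its members are hypothetical names chosen for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class RecordStreamer
{
    // Hypothetical helper: always quote a field and double any embedded quotes.
    static string Quote(string value)
    {
        return "\"" + value.Replace("\"", "\"\"") + "\"";
    }

    // Reads "Name: value" lines from input and writes one CSV row per record.
    // A line consisting of "#" ends the current record.
    public static void Convert(TextReader input, TextWriter output)
    {
        var fields = new List<string>();
        string line;
        while ((line = input.ReadLine()) != null)
        {
            if (line.Trim() == "#")
            {
                WriteRow(fields, output);
                continue;
            }
            int colon = line.IndexOf(':');
            if (colon < 0) continue; // skip blank or malformed lines
            fields.Add(line.Substring(colon + 1).Trim());
        }
        WriteRow(fields, output); // the last record has no trailing "#"
    }

    // Writes the collected values as one CSV row, then resets the buffer.
    static void WriteRow(List<string> fields, TextWriter output)
    {
        if (fields.Count == 0) return;
        output.WriteLine(string.Join(",", fields.Select(f => Quote(f))));
        fields.Clear();
    }
}
```

Because `Convert` takes a `TextReader` and a `TextWriter`, it can be called with a `StreamReader` over the input file and a `StreamWriter` over the output file, so only one record is held in memory at a time.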

1 Comment

+1: A specialized library would correctly handle commas and double quote characters (and newlines if they appear in the data; if a double newline indicates a field separator as it appears in your example, the file format may support newlines as data).
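For reference, the escaping such a library performs can be approximated by hand. The sketch below follows the usual CSV convention (as described in RFC 4180) of quoting a field only when it contains a comma, a double quote, or a line break, and doubling any embedded quotes; `Csv.Escape` is a hypothetical name, not part of FileHelpers or the .NET framework.

```csharp
using System;

public static class Csv
{
    // Quote a field only when it contains a character that would otherwise
    // break the row: comma, double quote, carriage return, or line feed.
    public static string Escape(string field)
    {
        if (field.IndexOfAny(new[] { ',', '"', '\r', '\n' }) < 0)
            return field;
        // Embedded quotes are doubled, then the whole field is wrapped in quotes.
        return "\"" + field.Replace("\"", "\"\"") + "\"";
    }
}
```

A library remains the safer choice when the file must also be read back, since parsing quoted fields that span lines is harder than writing them.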
0

You can use LINQ over the lines of the text file and split each line on ": " to get two columns: the field name and its value.

Here is a better explanation: http://schotime.net/blog/index.php/2008/03/18/importing-data-files-with-linq
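As a minimal sketch of the split step only (not taken from the linked post), a line can be separated into its field name and value as follows. `FieldParser.Parse` is a hypothetical helper, and it assumes the separator is a colon followed by a space, as in the sample data.

```csharp
using System;
using System.Collections.Generic;

public static class FieldParser
{
    // Splits "Name of proposal: Some title" into its name and value.
    // Limiting the split to 2 parts keeps any later ": " inside the value.
    public static KeyValuePair<string, string> Parse(string line)
    {
        string[] parts = line.Split(new[] { ": " }, 2, StringSplitOptions.None);
        string value = parts.Length > 1 ? parts[1].Trim() : "";
        return new KeyValuePair<string, string>(parts[0].Trim(), value);
    }
}
```

The names returned for the first record could serve as the CSV header row, with the values of each record forming the data rows.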
