
I'm having a problem parsing a CSV file. I connect to the file using the following:

string connString = "Provider=Microsoft.Jet.OLEDB.4.0;"
    + "Data Source=\"" + dir + "\\\";"
    + "Extended Properties=\"text;HDR=No;FMT=Delimited\"";

//create the database query
string query = "SELECT * FROM [" + file + "]";

//create a DataTable to hold the query results
DataTable dTable = new DataTable();

//create an OleDbDataAdapter to execute the query
OleDbDataAdapter dAdapter = new OleDbDataAdapter(query, connString);

//fill the DataTable
dAdapter.Fill(dTable);

return dTable;

For some reason, the first row reads fine as a header (i.e. with HDR=Yes the values are displayed). The problem is that with HDR=No, nothing after the first 'cell' in that row is displayed. However, I need HDR=No because I'll be writing the CSV back out later.

As a quick aside, the rest of that row only has a value in every other column, and each of those values contains a period. Any help?

Cheers.

EDIT: Here are a fake few lines similar to the CSV:

//Problem row->>    
File:,GSK1.D,,GSK2.D,,GSK3.D,
//The following rows, however, are fine:
 / 69,120.3,16.37%,128.9,7.16%,188.92,13.97%
D / 71,48.57,75.50%,32.15,26.65%,58.35,71.43%
T / 89,35.87,45.84%,50.01,28.87%,15.38,43.30%

EDIT: When I put any value into the "blank spaces" above, it is parsed, but no matter what I put into the problematic cells (e.g. GSK1.D) they won't parse - unless it is a number! Is there any chance it is automatically converting the column to a "float" type? And how can I stop it doing this?

  • Any chance you can paste in the first few lines of the CSV? Commented Nov 24, 2009 at 14:07
    @David: Then paste in the first few lines of a dummy file. For Pete's sake, if you want help, cooperate! Commented Nov 24, 2009 at 14:16
  • Sounds like there is possibly some formatting issue with the file itself? Commented Nov 24, 2009 at 14:16
  • And while you're at it, I'd highly recommend you remove that try/catch block. You're not handling the error, you're just masking any potential problems. Commented Nov 24, 2009 at 14:18
  • @Ken I was in the middle of pasting a fake one when you made that comment, see above. Commented Nov 24, 2009 at 14:19

3 Answers


At CodeProject there is a CSV parsing library: http://www.codeproject.com/KB/database/CsvReader.aspx

It comes with an interesting article on how this stuff works. According to the author, it is faster than the OLE DB provider.
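A minimal sketch of loading a CSV into a DataTable with that library (assuming the LumenWorks namespace used in the article; the file name is a placeholder):

```csharp
using System.Data;
using System.IO;
using LumenWorks.Framework.IO.Csv; // namespace from the CodeProject CsvReader

// hasHeaders: false, matching HDR=No in the question
using (CsvReader csv = new CsvReader(new StreamReader("data.csv"), false))
{
    DataTable dTable = new DataTable();
    // CsvReader implements IDataReader, so Load works directly
    dTable.Load(csv);
}
```

This sidesteps the Jet text driver entirely, so no type inference is applied to the columns.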


2 Comments

Thanks for the link, but as mentioned in the comments in the previous answer I am most of the way there and would not like to rebuild now if possible.
You can easily create a DataTable from a reader, and I think there's also a table adapter so you can fill DataSets.

I have finished this, just to let anyone know who may have this problem in the future. It turns out nothing was being read in because ADO tries to infer a type for each column. If other values in that column are not of the inferred type, it discards them completely.

To counter this, you need to create a schema.ini file, like so:

//dTable is the DataTable filled earlier; one Coln entry is needed per column
StreamWriter writer = new StreamWriter(File.Create(dir + "\\schema.ini"));
writer.WriteLine("[" + fileToBeRead + "]");
writer.WriteLine("ColNameHeader = False");
writer.WriteLine("Format = CSVDelimited");
writer.WriteLine("CharacterSet=ANSI");

//declare every column as Char so type inference can't drop any values
for (int i = 1; i <= dTable.Columns.Count; i++)
{
    writer.WriteLine("Col" + i + "=Col" + i + "Name Char Width 20");
}

//writer.WriteLine("Col1=Col1Name Char Width 20");
//writer.WriteLine("Col2=Col2Name Char Width 20");
//etc.

writer.Close();

Thanks for everyone's suggestions!
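For a seven-column file like the sample in the question, the generated schema.ini ends up along these lines (the file name in the section header is an example; it must match the CSV file name exactly, and schema.ini must sit in the same directory as the CSV):

```ini
[data.csv]
ColNameHeader = False
Format = CSVDelimited
CharacterSet=ANSI
Col1=Col1Name Char Width 20
Col2=Col2Name Char Width 20
Col3=Col3Name Char Width 20
Col4=Col4Name Char Width 20
Col5=Col5Name Char Width 20
Col6=Col6Name Char Width 20
Col7=Col7Name Char Width 20
```

Because every column is declared as Char (equivalent to Text), the driver no longer guesses a numeric type and the "GSK1.D" values survive.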

1 Comment

I just found your solution after coming to the same solution on my own project just now. +1 for wishing I had this an hour ago.

I've seldom done well with database-style access to text files - the possibilities for "issues" with the file tend to exceed the theoretical time savings.

Personally, more often than not I've hand-crafted the code to do this - a lot, going back over 20+ years, so generic solutions have been thin on the ground. That said, if I had to process a .csv file now, the first thing I'd reach for would be FileHelpers or similar.
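A rough sketch of the FileHelpers approach (the record class and field names are illustrative, not from the question; the library maps delimited fields onto class fields by position):

```csharp
using FileHelpers;

// one field per CSV column, declared in file order
[DelimitedRecord(",")]
public class CsvRow
{
    public string Label;
    public string Value1;
    public string Percent1;
    // ...further fields for the remaining columns
}

class Program
{
    static void Main()
    {
        var engine = new FileHelperEngine<CsvRow>();
        CsvRow[] rows = engine.ReadFile("data.csv");
    }
}
```

Since every field is declared as string, mixed content like "GSK1.D" parses without any type-inference surprises.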

2 Comments

Murph, have you (or anyone else here) used FileHelpers? Would you recommend it for processing large (~10-50GB) delimited files that may possibly contain malformed records (e.g. quote mismatches, incorrect number of fields per record, etc)?
I have, but not for that sort of file size; most of the processing I've done is of much smaller files, and the malformed stuff I dealt with was in code that predates my finding FileHelpers, so I had my own code to deal with it.
