
I have:

  • a DataTable (columns AccId and TerrName) containing more than 2,000 rows.
  • a large CSV file (columns AccId and External_ID) containing more than 6 million records.

Now, for each AccId, I need to find its corresponding External_ID in the CSV file.

Currently I am doing this with the code below:

DataTable tblATL = Util.GetTable("ATL", false);
tblATL.Columns.Add("External_ID");

DataTable tbl = Util.CsvToTable("TT.csv", true);

foreach (DataRow columnRow in tblATL.Rows)
{
    var query = tbl.Rows.Cast<DataRow>().FirstOrDefault(x => x.Field<string>("AccId") == columnRow["AccId"].ToString());
    if (query != null)
    {
        columnRow["External_ID"] = query.Field<string>("External_ID");
    }
    else
    {
        columnRow["External_ID"] = "New";
    }
}

This code works correctly; the only problem is performance — it takes a very long time to produce the result.

Please help. How can I improve its performance? Do you have any other approach?

  • can you give example headers of the csv file? eg fieldnames, their order/ type etc (holding 6M records in memory will always be slower) Commented Jun 15, 2016 at 13:52
  • If you are loading the entirety of the csv file in memory, PLinq is always an option. Commented Jun 15, 2016 at 13:54
  • @BugFinder: All columns are of string type without a specific order. AccId,External_ID 001P000000eHknBIAS,303363IN 001U000001bU0Q6IAK,303063IN Commented Jun 15, 2016 at 14:02

1 Answer


I suggest organizing the data into a dictionary, say Dictionary&lt;String, String[]&gt;, which gives O(1) lookups, e.g.

  Dictionary<String, String[]> Externals = File
    .ReadLines(@"C:\MyFile.csv")
    .Skip(1) // skip the header row, if the file has one
    .Select(line => line.Split(',')) // the simplest, just to show the idea
    .ToDictionary(
      items => items[0], // AccId is the 1st column (per the question's comments)
      items => items     // or whatever record representation
    );

  ....

  String accId = ...

  String[] items = Externals[accId]; // items[1] is the External_ID
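To connect this back to the question's loop, here is a sketch of the full replacement (it assumes the CSV column order AccId,External_ID from the question's comments, a header row, and the Util.GetTable / "New" fallback behavior shown in the question; GroupBy is used so duplicate AccIds don't break ToDictionary):

```csharp
// Build the lookup once: a single O(n) pass over the 6M-row CSV.
Dictionary<string, string> externals = File
    .ReadLines(@"TT.csv")
    .Skip(1)                          // skip the header row
    .Select(line => line.Split(','))
    .GroupBy(items => items[0])       // tolerate duplicate AccIds...
    .ToDictionary(
        g => g.Key,
        g => g.First()[1]);           // ...keeping the first External_ID seen

DataTable tblATL = Util.GetTable("ATL", false);
tblATL.Columns.Add("External_ID");

// Each row is now an O(1) dictionary probe instead of an O(n) table scan.
foreach (DataRow row in tblATL.Rows)
{
    row["External_ID"] = externals.TryGetValue(row["AccId"].ToString(), out string externalId)
        ? externalId
        : "New";
}
```

This turns the original O(rows × csv) nested scan into O(csv + rows) overall.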

EDIT: if the same AccId can appear more than once (see comments below) you have to deal with duplicates, e.g.

 var csv = File
   .ReadLines(@"C:\MyFile.csv")
   .Skip(1) // skip the header row, if the file has one
   .Select(line => line.Split(',')); // the simplest, just to show the idea

 Dictionary<String, String[]> Externals = new Dictionary<String, String[]>();

 foreach (var items in csv) {
   var key = items[0]; // AccId is the 1st column
   var value = items;  // or whatever record representation

   if (!Externals.ContainsKey(key))
     Externals.Add(key, value); // keep the first occurrence only
   // else {
   //   //TODO: implement, if you want to deal with duplicates in some other way
   // }
 }
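On newer runtimes (.NET Core 2.0 and later), the ContainsKey/Add pair in the loop can be collapsed with Dictionary&lt;TKey, TValue&gt;.TryAdd, which returns false and changes nothing when the key already exists — a sketch under the same column-order assumption:

```csharp
var externals = new Dictionary<string, string[]>();

foreach (var items in File.ReadLines(@"C:\MyFile.csv").Skip(1).Select(l => l.Split(',')))
{
    // TryAdd is a no-op for an existing key, so the
    // first occurrence of each AccId wins.
    externals.TryAdd(items[0], items);
}
```

This also avoids hashing each key twice (once in ContainsKey, once in Add).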

3 Comments

Let me implement it.
Currently I am facing an issue with the data: the file can contain duplicate AccId values with different External_ID values, and I need to take the first occurrence. Dictionary is throwing an exception on the duplicate key, as expected.
@Avijit: in that case you have to deal with duplicates (see my edit) and the simplest ToDictionary() will not do.
