0

I want to serialize the DataTable to a List in C#. My code is taking around 10 minutes for 100K data to fetch and convert to a list. Previously tried with sequential way but it three times more time to process. So, I tried Parallel and it saved 1/3 of time. But, still, it's too slow.

If anyone can help to make it fast. Any help is appreciated.

public async Task<List<DataPointModel>> GetDetails(HashParameterModel parameterModel)
{
    List<DataPointModel> result = new List<DataPointModel>();
    try
    {
        var query = "USP_GetDetailsFromMetaData";
        using (var sqlConnection = new SqlConnection(connectionString))
        {
            using (var sqlCommand = new SqlCommand(query, sqlConnection))
            {
                sqlCommand.CommandType = CommandType.StoredProcedure;
                sqlCommand.Parameters.AddWithValue("@GroupID", parameterModel.GroupID);
                sqlCommand.CommandTimeout = 0;
                sqlConnection.Open();
                DataTable dataTable = new DataTable();
                DataSet dataSet = new DataSet();
                SqlDataAdapter da = new SqlDataAdapter
                {
                    SelectCommand = sqlCommand
                };
                da.Fill(dataSet);

                dataTable = dataSet.Tables[0];
                DataTable dataTable1 = dataSet.Tables[1];
                var questionList = dataTable1.AsEnumerable();
                if (dataTable.Rows.Count > 0)
                {
                    Parallel.ForEach(dataTable.AsEnumerable(), new ParallelOptions { MaxDegreeOfParallelism = 5 }, row =>
                    {
                        DataPointModel model = new DataPointModel();
                        model.ID = Convert.ToInt32(row["ID"]);
                        model.GroupID = Convert.ToInt32(row["GroupID"]);
                        model.ProviderID = Convert.ToInt32(row["SalvageProviderID"]);
                        model.ClaimNumber = row["ClaimNumber"].ToString();
                        model.PolicyNumber = row["PolicyNumber"].ToString();
                        model.DataPoint1 = row["DataPoint1"].ToString();
                        model.DataPoint2 = row["DataPoint2"].ToString();
                        model.DataPoint3 = row["DataPoint3"].ToString();
                        model.DataPoint4 = row["DataPoint4"].ToString();
                        model.FirstName = row["FirstName"].ToString();
                        model.LastName = row["LastName"].ToString();
                        model.PrimaryDamage = row["PrimaryDamage"].ToString();
                        model.Type = row["TypeCode"].ToString();
                        model.LossDate = row["LossDate"].ToString();
                        model.QuestionList = (from p in questionList
                                              where p.Field<int>("ID") == model.ID
                                              select new QuestionResponseModel()
                                              {
                                                  QuestionID = p.Field<int>("QuestionID").ToString(),
                                                  Response = p.Field<string>("ResponseValue")
                                              }).ToList();
                        result.Add(model);
                    });
                }
            }
        }
    }
    catch (Exception ex)
    {
        throw ex;
    }

    return result;
}

The DataSet has two DataTable

DataTable dataTable = dataSet.Tables[0]; // Details
DataTable dataTable1 = dataSet.Tables[1]; // QUestionList

I think the time is consumed when its looping interlly for QuestionList, which may have around 120K rows. Any Suggestions

5
  • 2
    Skip the DataTable and create your objects directly from the DataReader. Commented Aug 10, 2021 at 17:50
  • 2
    Or use dapper to materialize your list directly from the query result Commented Aug 10, 2021 at 17:51
  • 1
    Don't read 120K datarows into memory. Commented Aug 10, 2021 at 18:37
  • What will you do with this data in the end? Commented Aug 10, 2021 at 19:24
  • Check if the stored procedure really only queries the fields you need and map them directly into your model. Does fetching the data from the database use more time then filling the objects? Commented Aug 10, 2021 at 20:53

2 Answers 2

1

An easy and quick to implement performance improvment would be to build a look up table from your questionList, and access this to fetch the question instead of doing this piece of code

model.QuestionList = (from p in questionList
                      where p.Field<int>("ID") == model.ID
                      select new QuestionResponseModel()
                      {
                          QuestionID = p.Field<int>("QuestionID").ToString(),
                          Response = p.Field<string>("ResponseValue")
                      }).ToList();

So add the following

var questionList = dataTable1.AsEnumerable();
//maybe add .AsParallel() - questionList.AsParallel().ToLookUp(...)
var questionLookUp = questionList.ToLookUp(x => x.Field<int>("ID"), x => new QuestionResponseModel() { QuestionID = x.Field<int>("QuestionID"), Response = p.Field<string>("ResponseValue") });

And than use it like this

model.QuestionList = questionLookUp[model.ID].ToList();

https://learn.microsoft.com/en-us/dotnet/api/system.linq.lookup-2

https://learn.microsoft.com/en-us/dotnet/api/system.linq.enumerable.tolookup

Code is untested, hope I didn't make to many errors.

Sign up to request clarification or add additional context in comments.

Comments

0

Are there duplicate model IDs in the Details table? If so this could help to avoid running the question list query multiple times:

                        model.QuestionList = getQuestions(model.ID);

Method:

    public Dictionary<int, List<QuestionResponseModel>> questionBuffer = new Dictionary<int, List<QuestionResponseModel>>();

    public List<QuestionResponseModel> getQuestions(int ID)
    {
        if (questionBuffer.ContainsKey(ID)) return questionBuffer[ID];

        List<QuestionResponseModel> questions =  (from p in questionList
                              where p.Field<int>("ID") == model.ID
                              select new QuestionResponseModel()
                              {
                                  QuestionID = p.Field<int>("QuestionID").ToString(),
                                  Response = p.Field<string>("ResponseValue")
                              }).ToList();
        questionBuffer.Add(ID, questions);
        return questions;
    }

3 Comments

So a look-up table with extra steps :)
Ha, i guess. Your post wasnt there when i started typing. But if the details table only had a small number of model ids this would avoid pulling the whole 120k rows into memory
They are already in memory, inside the datatable. Your solution has the draw-back of enumerating questionList multiple times, for each unique ID. It only gives a perfomance increase if ID get passed multiple times, since you are caching the result for each ID, though judging from the name ID it would be suprising this to be the case, and ID should be unique. The ToLookUp only enumerates the list once and builds the look up as it goes. Sure building the lookup once takes some time, but it should perform better than enumerating a 120k multiple times.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.