0

I need to improve the performance of my insert in my C# application. I first go out and get data from a view. Then I go through a FOREACH loop to insert into a table. I have over 200,000 records that I am working with and it takes an ridiculous amount of time to perform this task. I know the SaveChanges is a round trip to the database but I'm not sure how to get around this. Is there something I can do to improve the time?

            var values = db.TodaysAirs.ToList();
            foreach (TodaysAir x in values)
            {
                //check to see if this is a new value or one that needs to be updated
                var checkForNew = db.TodaysAirValues
                    .Where(m => m.ID == x.ID);

                //new record
                if (checkForNew.Count() == 0)
                {
                    TodaysAirValue newRecord = new TodaysAirValue();
                    newRecord.ID = x.ID;
                    newRecord.Logger_Id = x.Logger_Id;
                    newRecord.SiteName = x.SiteName;
                    newRecord.Latitude = x.Latitude;
                    newRecord.Longitude = x.Longitude;
                    newRecord.Hour = x.Hour;
                    newRecord.Parameter = x.Parameter;
                    newRecord.Stan = x.Stan;
                    newRecord.Units = x.Units;
                    newRecord.InstrumentType = x.InstrumentType;
                    newRecord.NowCast = x.NowCast;
                    newRecord.AQIValue = x.AQIValue;
                    newRecord.HealthCategory = x.HealthCategory;
                    newRecord.Hr24Avg = x.Hr24Avg;
                    newRecord.Hr24Max = x.Hr24Max;
                    newRecord.Hr24Min = x.Hr24Min;
                    newRecord.SID = DateTime.Now;

                    db.TodaysAirValues.Add(newRecord);
                    db.SaveChanges();
                  //  CallJenkinsJob();
                }
        }
8
  • 1
    Replace Where with Any and call SaveChanges outside the loop? Commented Apr 28, 2022 at 15:10
  • Also if you are able to use the Async version of the initial ToList Commented Apr 28, 2022 at 15:11
  • @Steve Actually you could just replace .Count() == 0 with Any() and get the same result. Commented Apr 28, 2022 at 15:11
  • @DStanley exactly, Any makes superfluos both Where and Count Commented Apr 28, 2022 at 15:12
  • 1
    Are the view and the target table in the same database? If so, this can be done entirely in a single SQL statement, with no C# involvement other than sending the one SQL command to the server, and I'd expect it to run in just a few minutes at most. Commented Apr 28, 2022 at 15:14

1 Answer 1

2

The goal should be to run a single raw SQL statement that will look something very much like this:

INSERT INTO TodaysAirValues
    (ID, Logger_id, SiteName, Latitude, Longitude, Hour, Parameter,
     Stan, Units, InstrumentType, NowCast, AQIValue, HealthCategory,
     Hr24Avg, Hr24Max, Hr24Min, SID)

SELECT ta.ID, ta.Logger_id, ta.SiteName, ta.Latitude, ta.Longitude,
       ta.Hour, ta.Parameter, ta.Stan, ta.Units, ta.InstrumentType,
       ta.NowCast, ta.AQIValue, ta.HealthCategory, ta.Hr24Avg,
       ta.Hr24Max, ta.Hr24Min, current_timestamp
FROM TodaysAirs ta
LEFT JOIN TodaysAirValues tav ON tav.ID = ta.ID
WHERE tav.ID IS NULL

This might not have all the table or column names exactly right, if there are any differences from the EF mapping with the database. You might also get it to go even a little faster using NOT EXISTS() rather than the LEFT JOIN WHERE NULL technique.


I also see this:

if the Count is greater than 0 it checks to see if any changes where made and if so update the record.

In that case, you can still get this down to just two SQL commands if you precede (run this additional command first!) the INSERT above with an UPDATE that looks something like this:

UPDATE tav
   SET tav.ID=ta.DI, tav.Logger_id=ta.Logger_id, tav.SiteName=ta.SiteName,
       tav.Latitude=ta.Latitude, tav.Longitude=ta.Longitude, tav.Hour=ta.Hour, 
       tav.Parameter=ta.Parameter, tav.Stan=ta.Stan, tav.Units=ta.Units, 
       tav.InstrumentType=ta.InstrumentType, tav.NowCast=ta.NowCast,
       tav.AQIValue=ta.AQIValue, tav.HealthCategory=ta.HealthCategory,
       tav.Hr24Avg=ta.Hr24Avg,tav.Hr24Max=ta.Hr24Max, tav.Hr24Min=ta.Hr24Min,
       tav.SID=ta.SID -- possibly current_timestamp here instead
FROM TodaysAirs ta
INNER JOIN TodaysAirValues tav ON tav.ID = ta.ID
WHERE (
    -- compare here to decide if the record needs to update or not
)

Unfortunately I don't have enough info about what you want that look like to give you the full code.

Sign up to request clarification or add additional context in comments.

2 Comments

Would you put this in a stored procedure that the application calls
You could, but you don't have to. You can put multiple SQL statements in the same string in C#, and send them all to the database to execute in the same call as part of the same batch. If you want to be really careful you could also include BEGIN TRANSACTION plus COMMIT in the C# string variable, but given the number of rows involved I might be more worried about the locking it would create.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.