
BACKGROUND:

I have a Windows Service that pulls records off a SQL table (created with the EF Code First method). Records are added very frequently (~10-20 per second) by 2 clients and are then peeled off the database and handled by my service. For redundancy, the two clients monitor the same systems, so they may create duplicate records. I am looking for a way to improve the performance of the program that processes the new records.

PROBLEM:

Step 1: remove duplicate entries:

// get duplicate entries (keep the lowest Id in each group, remove the rest)
var duplicateEntries = context.OpcTagValueLogs
    .GroupBy(x => new { x.OpcTagId, x.SourceTimeStamp })
    .SelectMany(x => x.OrderBy(y => y.Id).Skip(1));
foreach (var duplicateEntry in duplicateEntries)
{
    // remove duplicate entry
    context.OpcTagValueLogs.Remove(duplicateEntry);
}
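
As an aside, the same rule ("keep the lowest Id per (OpcTagId, SourceTimeStamp) group") can be pushed to the server as one set-based statement, so the de-duplication costs a single round trip instead of a query plus a delete per row. This is only a sketch, not code from the original post; it assumes the table is named dbo.OpcTagValueLogs (EF's default pluralization of the DbSet) and that context is the same DbContext as above.

// Sketch: delete every row that is not the lowest Id within its
// (OpcTagId, SourceTimeStamp) group, in one statement.
context.Database.ExecuteSqlCommand(@"
    WITH Dupes AS (
        SELECT Id,
               ROW_NUMBER() OVER (PARTITION BY OpcTagId, SourceTimeStamp
                                  ORDER BY Id) AS rn
        FROM dbo.OpcTagValueLogs
    )
    DELETE FROM Dupes WHERE rn > 1;");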

Step 2: Get remaining log entries

var logs = context.OpcTagValueLogs.Include(x => x.OpcTag).Include(x => x.OpcTag.RuleSets).ToList();

Step 3: Check associated rules and perform events to handle the new values

I'm trying to optimize my program as much as possible, because right now the Windows Service processing the data is barely running faster than records are being created. If the rate of record creation increases, I am worried the service will be unable to keep up.

These are the only queries I am running (besides record creation) on this table. The table structure is:

  • [INT] Id (Primary Key, Clustered: Id)
  • [INT] OpcTagId (IX_OpcTagId)
  • [DATETIME] TimeStamp
  • [NVARCHAR(MAX)] Value
  • [INT] SourceTimeStamp (IX_SourceTimeStamp)
  • [NVARCHAR(MAX)] ClientCode
  • [NVARCHAR(MAX)] PriorValue

Is there some way I can modify my indices to improve the performance of my queries?
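
For reference, a minimal Code First sketch of the entity as it stands (not code from the original project: property names come from the column list above, the OpcTag navigation property is inferred from the Include() calls, and the attribute placement assumes EF 6.1+ for IndexAttribute):

using System;
using System.ComponentModel.DataAnnotations.Schema; // IndexAttribute (EF 6.1+)

public class OpcTagValueLog
{
    public int Id { get; set; }                 // primary key (clustered)

    [Index("IX_OpcTagId")]
    public int OpcTagId { get; set; }
    public virtual OpcTag OpcTag { get; set; }  // navigation used via Include()

    public DateTime TimeStamp { get; set; }

    public string Value { get; set; }           // NVARCHAR(MAX)

    [Index("IX_SourceTimeStamp")]
    public int SourceTimeStamp { get; set; }    // Unix seconds since 1970-01-01

    public string ClientCode { get; set; }      // NVARCHAR(MAX)
    public string PriorValue { get; set; }      // NVARCHAR(MAX)
}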

EDIT: This is how the logs are processed after the duplicates are removed:

foreach (var log in logs.ToList()) // iterate over a copy, because items are removed from logs inside the loop
{
    if (log.PriorValue == log.Value) // the prior value equals the new value:
    {                                // only changing values are of interest, so just delete the log entry
        // remove the entity
        _context.OpcTagValueLogs.Remove(log);
        logs.Remove(log);
        _context.SaveChanges();
    }
    else
    {
        // check rules and perform actions
        var ruleSets = log.OpcTag.RuleSets.ToList();
        foreach (var ruleSet in ruleSets)
        {
            if (ruleSet.CheckRule(log.PriorValue, log.Value))
            {
                // perform action
                // convert source timestamp (Unix seconds) to DateTime
                DateTime srcTS = new DateTime(1970, 1, 1).AddSeconds(log.SourceTimeStamp);
                var action = ActionFactory.CreateAction(ruleSet.Action, log.PriorValue, log.Value, log.OpcTag, srcTS);
                action.Execute();
            }
        }

        // remove the entity from the database
        _context.OpcTagValueLogs.Remove(log);
        _context.SaveChanges();
        logs.Remove(log); // remove the entity from the local list as well
    }
}

EDIT 2: Current method

var ruleSets = _context.RuleSets.ToList(); // get all rule sets into memory once
var logsLocal = logs.ToList();             // bring all the logs into local memory
var maxIndex = logsLocal.Max(x => x.Id);   // highest Id among the local logs (only these are deleted below)
foreach (var log in logsLocal)
{
    if (log.PriorValue != log.Value)
    {
        foreach (var ruleSet in ruleSets.Where(x => x.OpcTagId == log.OpcTagId))
        {
            if (ruleSet.CheckRule(log.PriorValue, log.Value))
            {
                // perform action
                // convert source timestamp (Unix seconds) to DateTime
                DateTime srcTS = new DateTime(1970, 1, 1).AddSeconds(log.SourceTimeStamp);
                var action = ActionFactory.CreateAction(ruleSet.Action, log.PriorValue, log.Value, log.OpcTag, srcTS);
                action.Execute();
            }
        }
    }
}

_context.OpcTagValueLogs.Where(x => x.Id <= maxIndex).Delete(); // batch delete only the logs that were processed on this program loop
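
The .Delete() call above is not part of EF itself; it presumably comes from a batch-delete extension such as the "EF Extend" library suggested in the comments (that is an assumption on my part). For completeness, a sketch of the same one-statement batch delete without any extension library, assuming the default table name dbo.OpcTagValueLogs:

// Sketch: one DELETE statement, one round trip, no per-entity change tracking.
_context.Database.ExecuteSqlCommand(
    "DELETE FROM dbo.OpcTagValueLogs WHERE Id <= @p0", maxIndex);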

EDIT 3: The action object is produced by the static ActionFactory class based on the ruleSet.Action value.

public static Action CreateAction(ActionId pActionId, string pPrior, string pNew, OpcTag pOpcTag, DateTime pSourceTimestamp)
{
    Action evt = null;
    switch (pActionId)
    {
        case ActionId.A1000:
            evt = new A1000(pActionId, pPrior, pNew, pOpcTag, pSourceTimestamp);
            break;
        case ActionId.A1001:
            evt = new A1001(pActionId, pPrior, pNew, pOpcTag, pSourceTimestamp);
            break;
        case ActionId.A1002:
            evt = new A1002(pActionId, pPrior, pNew, pOpcTag, pSourceTimestamp);
            break;
        case ActionId.A1003:
            evt = new A1003(pActionId, pPrior, pNew, pOpcTag, pSourceTimestamp);
            break;
        case ActionId.A1004:
            evt = new A1004(pActionId, pPrior, pNew, pOpcTag, pSourceTimestamp);
            break;
    }
    return evt;
}

Each of these actions represents a different machine event and can run to several hundred lines of code, which is why they have been omitted.

  • When you call ToList() you effectively get all the children (relation) therefore it's not lazy. Why are you doing ToList()? Also, for the first problem you should just use EF Extend library and do "bulk delete" on all duplicates instead of doing it one by one. Commented Jun 5, 2015 at 13:04
  • Thanks, I will check out that library. It was my understanding that by calling ToList I would be bringing the records into memory and reduce the number of subsequent queries to the database. So, I thought this would help performance... Commented Jun 5, 2015 at 13:17
  • Yes but it actually sucks because it pulls out everything, even the 9/10 things that you actually don't need. Don't worry and embrace lazy-loading, w/o ToList(); it will actually query the database only when/if data is used. Commented Jun 5, 2015 at 13:36
  • Okay I'll give it a try. I edited my question to explain how I am processing the records. Currently, I loop through the records and remove them individually on each iteration. Can I do a batch delete at the end of the loop without deleting any new records which have not been processed? Commented Jun 5, 2015 at 13:39
  • If you're using Sql Server, you can create an unique index with the IGNORE_DUP_KEY option. That way the engine itself will ignore the duplicates from being inserted, without needing to delete them afterwards and no changes to any existing systems. Commented Jun 5, 2015 at 16:03
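
A sketch of what that last suggestion could look like (table, column, and index names are assumptions based on the question; existing duplicates would have to be cleaned out before the unique index can be created, and an ORM that checks affected row counts may need to tolerate the ignored insert):

// Sketch: run once, e.g. from a migration, SSMS, or ExecuteSqlCommand.
// With IGNORE_DUP_KEY = ON, SQL Server silently discards an insert whose
// (OpcTagId, SourceTimeStamp) pair already exists instead of raising an error.
context.Database.ExecuteSqlCommand(@"
    CREATE UNIQUE NONCLUSTERED INDEX UX_OpcTagValueLogs_OpcTagId_SourceTimeStamp
    ON dbo.OpcTagValueLogs (OpcTagId, SourceTimeStamp)
    WITH (IGNORE_DUP_KEY = ON);");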

2 Answers


Firstly, your loop is probably causing an N + 1 problem: you are querying items from the DB inside the loop. What you want is less I/O (calls to the DB), as those round trips are expensive. If your server is beefy enough, loading the objects into memory up front is a better option. You could even start using caching techniques, but that may be a bit advanced right now.

Here's the code (not tested) that I came up with to help try and resolve the N + 1 issue:

// check rules and perform actions
// Get all rule sets into memory once
var ruleSets = _context.RuleSets.ToList();

foreach (var log in logs.ToList()) // iterate over a copy, because items are removed from logs inside the loop
{
    if (log.PriorValue != log.Value) // only changing values are of interest
    {
        foreach (var ruleSet in ruleSets.Where(x => x.OpcTagId == log.OpcTagId))
        {
            if (ruleSet.CheckRule(log.PriorValue, log.Value))
            {
                // perform action
                // convert source timestamp (Unix seconds) to DateTime
                DateTime srcTS = new DateTime(1970, 1, 1).AddSeconds(log.SourceTimeStamp);
                var action = ActionFactory.CreateAction(ruleSet.Action, log.PriorValue, log.Value, log.OpcTag, srcTS);
                action.Execute();
            }
        }
    }

    // The removal below was common to both the if and the else branch, so it has been moved after the conditional
    // remove the entity
    _context.OpcTagValueLogs.Remove(log);
    logs.Remove(log);
}

// Call SaveChanges once (less I/O)
_context.SaveChanges();

I don't know the class definitions, so you will have to modify the code accordingly, especially the ruleSets.Where(x => x.OpcTagId == log.OpcTagId) filter.

I've also refactored the code that was common to the if and else branches, since duplicating it did not make sense; you can undo that if you feel it is wrong.

The framework Stan mentioned in the comments is also worth a look; it is a great library for optimizing your use of EF.

Also, the best way to determine what is happening is to run SQL Server Profiler while your service is running and look for bad queries.


4 Comments

So far, the best answer for me is a mix between this and Stan's response. Unfortunately, I can't rely entirely on lazy loading, because the multitude of calls to the database (mostly within the action.Execute() methods) makes the program run much, much slower. I have edited my question with the current state of my program.
I have edited my question to explain a little bit more about the Execute method. I'm trying to avoid posting thousands of lines of code. Originally this question was just about optimizing the database but as I have learned, the problem may be much bigger.
Well, it seems that this method is a huge hit on the DB within the original loop, causing a massive number of round trips. Has the above helped improve performance, or is it more or less the same? Within the ~1000 line function, how much DB work are you doing? If there is a lot, you could look at moving that to a stored procedure.
The above has helped and I will look at employing some of the same techniques in the action methods. The service handling the new db records is staying well ahead of the clients creating the records. As long as there is no fatal error, and the service won't have to play catch up, I believe this would suffice. However, I will also do some more research on stored procedures to see if I can make use of them. Thanks!

I think there is a lot of time wasted doing all those checks and making multiple trips to the database, when it can be handled in a single transaction.

A couple of options. I know you're using EF, but is using a stored procedure an option in your environment? If so, you could use a MERGE statement so that you only make one trip to the database.
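
For illustration only, a sketch of what that single-trip MERGE could look like when a client writes a record; it is issued through EF here for brevity, but the same statement could live in a stored procedure. All table/column names and the local variables (opcTagId, sourceTimeStamp, and so on) are assumptions based on the question.

// Sketch: an upsert at write time, so a duplicate (OpcTagId, SourceTimeStamp)
// from the second client becomes a no-op instead of a second row.
context.Database.ExecuteSqlCommand(@"
    MERGE dbo.OpcTagValueLogs AS target
    USING (SELECT @p0 AS OpcTagId, @p1 AS SourceTimeStamp) AS source
        ON  target.OpcTagId        = source.OpcTagId
        AND target.SourceTimeStamp = source.SourceTimeStamp
    WHEN NOT MATCHED THEN
        INSERT (OpcTagId, SourceTimeStamp, TimeStamp, Value, ClientCode, PriorValue)
        VALUES (@p0, @p1, @p2, @p3, @p4, @p5);",
    opcTagId, sourceTimeStamp, timeStamp, value, clientCode, priorValue);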

Another option is to create an extension method on your DbContext that acts like an UPSERT (read: MERGE). While looking into whether EF can do upserts, I came across this class, which sets it up for you:

https://gist.github.com/ondravondra/4001192

3 Comments

Skip the ORM altogether when it gets too much in the way. +1
I might need to see an example of what you're referring to. I don't have a ton of experience with stored procedures, but from my knowledge it would not be possible to perform the necessary actions with an SQL statement. There are several different "types" of information with several different sources being logged to the database. I need to get those records, check the source/type, verify some established rule is true (i.e. new value > prior value) and perform some action if it is. That action might be anything from creating a new record in a separate database to pushing alerts to web clients
An Upsert (ie: MERGE) will basically do an if/then and then do the appropriate actions under one transaction, instead of doing a check and then doing the logic server side. Basically it will keep that logic within the SQL statement. Unless I'm missing something, I don't see why that couldn't accomplish your goal. Look at the link and/or look at MERGE statements and see if that doesn't help, but it's a good solution for eliminating multiple calls to a database.
