BACKGROUND:
I have a Windows Service which pulls records off a SQL Server table (created with the EF Code First approach). Records are added very frequently (~10-20 per second) by two clients, and my service peels them off the database and handles them. For redundancy, the two clients monitor the same systems, so they may create duplicate records. I am looking for a way to improve the performance of the program that processes the new records.
PROBLEM:
Step 1: remove duplicate entries:
    // get duplicate entries: keep the lowest Id per (OpcTagId, SourceTimeStamp) group
    var duplicateEntries = context.OpcTagValueLogs
        .GroupBy(x => new { x.OpcTagId, x.SourceTimeStamp })
        .SelectMany(x => x.OrderBy(y => y.Id).Skip(1));
    foreach (var duplicateEntry in duplicateEntries)
    {
        // remove duplicate entries one at a time
        context.OpcTagValueLogs.Remove(duplicateEntry);
    }
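One option I'm considering (a sketch, assuming EF6 on SQL Server and the default table name dbo.OpcTagValueLogs) is to push the de-duplication into a single set-based statement instead of one Remove() per row:

    // delete every row except the lowest Id per (OpcTagId, SourceTimeStamp)
    // in one round trip, bypassing the change tracker entirely
    context.Database.ExecuteSqlCommand(@"
        ;WITH Dupes AS (
            SELECT Id, ROW_NUMBER() OVER (
                PARTITION BY OpcTagId, SourceTimeStamp ORDER BY Id) AS rn
            FROM dbo.OpcTagValueLogs
        )
        DELETE FROM Dupes WHERE rn > 1;");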
Step 2: Get remaining log entries
var logs = context.OpcTagValueLogs.Include(x => x.OpcTag).Include(x => x.OpcTag.RuleSets).ToList();
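A variation I may also try here: processing the backlog in fixed-size batches, so a burst of inserts cannot make a single pass arbitrarily large (the batch size of 500 is an arbitrary placeholder):

    const int batchSize = 500; // placeholder; would be tuned to the insert rate
    var logs = context.OpcTagValueLogs
        .Include(x => x.OpcTag.RuleSets) // the dotted path also loads x.OpcTag
        .OrderBy(x => x.Id)              // oldest first
        .Take(batchSize)
        .ToList();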
Step 3: Check associated rules and perform events to handle the new values
I'm trying to optimize the program as much as possible, because right now the Windows Service processing the data is barely running faster than records are being created. If the rate of record creation increases, I am worried the service will be unable to keep up.
These are the only queries I am running (besides record creation) on this table. The table structure is:
- [INT] Id (Primary Key, Clustered: Id)
- [INT] OpcTagId (IX_OpcTagId)
- [DATETIME] TimeStamp
- [NVARCHAR(MAX)] Value
- [INT] SourceTimeStamp (IX_SourceTimeStamp)
- [NVARCHAR(MAX)] ClientCode
- [NVARCHAR(MAX)] PriorValue
Is there some way I can modify my indices to improve the performance of my queries?
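For reference, this is roughly what a composite index matching the GroupBy key would look like as a Code First migration (a sketch; the migration and index names are made up):

    using System.Data.Entity.Migrations;

    public partial class AddDuplicateLookupIndex : DbMigration
    {
        public override void Up()
        {
            // matches GroupBy(x => new { x.OpcTagId, x.SourceTimeStamp }) and gives
            // the optimizer a pre-sorted path for the grouping
            CreateIndex("dbo.OpcTagValueLogs",
                new[] { "OpcTagId", "SourceTimeStamp" },
                name: "IX_OpcTagId_SourceTimeStamp");
        }

        public override void Down()
        {
            DropIndex("dbo.OpcTagValueLogs", "IX_OpcTagId_SourceTimeStamp");
        }
    }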
EDIT: This is how the logs are processed after the duplicates are removed:
    foreach (var log in logs.ToList()) // iterate a snapshot (.ToList()) because items
    {                                  // are removed from the logs list inside the loop
        if (log.PriorValue == log.Value) // the prior value equals the new value
        {
            // I am only interested in changing values, so delete the log entry
            _context.OpcTagValueLogs.Remove(log);
            logs.Remove(log);
            _context.SaveChanges();
        }
        else
        {
            // check rules and perform actions
            var ruleSets = log.OpcTag.RuleSets.ToList();
            foreach (var ruleSet in ruleSets)
            {
                if (ruleSet.CheckRule(log.PriorValue, log.Value))
                {
                    // perform action
                    // convert source timestamp (Unix epoch seconds) to DateTime
                    DateTime srcTS = new DateTime(1970, 1, 1).AddSeconds(log.SourceTimeStamp);
                    var action = ActionFactory.CreateAction(ruleSet.Action, log.PriorValue, log.Value, log.OpcTag, srcTS);
                    action.Execute();
                }
            }
            // remove entity from database
            _context.OpcTagValueLogs.Remove(log);
            _context.SaveChanges();
            logs.Remove(log); // remove the entity from the local list as well
        }
    }
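One incremental change I could make before the restructuring in EDIT 2 below: each iteration above pays its own SaveChanges() round trip. A sketch of deferring the flush to one call per pass (assuming nothing else commits this context mid-loop):

    foreach (var log in logs.ToList())
    {
        if (log.PriorValue != log.Value)
        {
            // ... same rule-checking / action body as above ...
        }
        _context.OpcTagValueLogs.Remove(log); // queue the delete, don't flush yet
    }
    _context.SaveChanges(); // one round trip for the whole pass
    logs.Clear();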
EDIT 2: Current method
    var ruleSets = _context.RuleSets.ToList(); // load all rulesets into memory once
    var logsLocal = logs.ToList();             // bring all the logs into local memory
    var maxIndex = logsLocal.Max(x => x.Id);   // the highest Id among the local logs
    foreach (var log in logsLocal)
    {
        if (log.PriorValue != log.Value)
        {
            foreach (var ruleSet in ruleSets.Where(x => x.OpcTagId == log.OpcTagId))
            {
                if (ruleSet.CheckRule(log.PriorValue, log.Value))
                {
                    // perform action
                    // convert source timestamp (Unix epoch seconds) to DateTime
                    DateTime srcTS = new DateTime(1970, 1, 1).AddSeconds(log.SourceTimeStamp);
                    var action = ActionFactory.CreateAction(ruleSet.Action, log.PriorValue, log.Value, log.OpcTag, srcTS);
                    action.Execute();
                }
            }
        }
    }
    _context.OpcTagValueLogs.Where(x => x.Id <= maxIndex).Delete(); // batch delete only the logs that were processed on this program loop
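A further micro-optimization on this version: ruleSets.Where(x => x.OpcTagId == log.OpcTagId) rescans the whole list for every log. Building a lookup once per pass turns that into a constant-time bucket fetch (a sketch using ToLookup):

    var ruleSetsByTag = _context.RuleSets.ToLookup(x => x.OpcTagId); // one scan, grouped by tag
    foreach (var log in logsLocal)
    {
        if (log.PriorValue == log.Value) continue;
        foreach (var ruleSet in ruleSetsByTag[log.OpcTagId]) // O(1) bucket fetch
        {
            // ... same CheckRule / CreateAction / Execute body as above ...
        }
    }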
EDIT 3: The action object is produced by the static ActionFactory class based on the ruleSet.Action value.
public static Action CreateAction(ActionId pActionId, string pPrior, string pNew, OpcTag pOpcTag, DateTime pSourceTimestamp)
{
Action evt = null;
switch (pActionId)
{
case ActionId.A1000: evt = new A1000(pActionId, pPrior, pNew, pOpcTag, pSourceTimestamp);
break;
case ActionId.A1001: evt = new A1001(pActionId, pPrior, pNew, pOpcTag, pSourceTimestamp);
break;
case ActionId.A1002: evt = new A1002(pActionId, pPrior, pNew, pOpcTag, pSourceTimestamp);
break;
case ActionId.A1003: evt = new A1003(pActionId, pPrior, pNew, pOpcTag, pSourceTimestamp);
break;
case ActionId.A1004: evt = new A1004(pActionId, pPrior, pNew, pOpcTag, pSourceTimestamp);
break;
}
return evt;
}
Each of these actions represents a different machine event and could be several hundred lines of code (which is why their bodies have been omitted).
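If the switch keeps growing as machine events are added, a table-driven factory is one alternative (a sketch; it assumes all the A100x constructors share the signature above, and Action is my domain class, not System.Action):

    private static readonly Dictionary<ActionId, Func<ActionId, string, string, OpcTag, DateTime, Action>> Creators =
        new Dictionary<ActionId, Func<ActionId, string, string, OpcTag, DateTime, Action>>
        {
            { ActionId.A1000, (id, prior, next, tag, ts) => new A1000(id, prior, next, tag, ts) },
            { ActionId.A1001, (id, prior, next, tag, ts) => new A1001(id, prior, next, tag, ts) },
            { ActionId.A1002, (id, prior, next, tag, ts) => new A1002(id, prior, next, tag, ts) },
            { ActionId.A1003, (id, prior, next, tag, ts) => new A1003(id, prior, next, tag, ts) },
            { ActionId.A1004, (id, prior, next, tag, ts) => new A1004(id, prior, next, tag, ts) },
        };

    public static Action CreateAction(ActionId pActionId, string pPrior, string pNew, OpcTag pOpcTag, DateTime pSourceTimestamp)
    {
        Func<ActionId, string, string, OpcTag, DateTime, Action> creator;
        // returns null for unknown ids, matching the switch version's behaviour
        return Creators.TryGetValue(pActionId, out creator)
            ? creator(pActionId, pPrior, pNew, pOpcTag, pSourceTimestamp)
            : null;
    }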
A comment on the question: with ToList() you effectively get all the children (relations), therefore it's not lazy. Why are you doing ToList()? Also, for the first problem you should just use the EF Extended library and do a bulk delete on all duplicates instead of removing them one by one. If you drop the ToList(), it will actually query the database only when/if the data is used.