4

I am a software developer and I was recently approached by DBA to optimize the query that an app of mine is using. DBA reported that query takes about 50% of CPU and high I/O operations when it runs. The query is pretty straight forward and I am unsure how to optimize it.

Question 1: How can I optimize this query?

Question 2: is it even my job to do so, shouldn't DBA be more knowledgeable in this? Mind you we have no DB developers, just DBA and Software Developers.

DB has approximately 30-50 million of records, it is constantly maintained/monitored by DBA, but I am unsure how. Server is on a dedicated machine and is Microsoft SQL Server 2005 - 9.00.5057.00 (X64)

PS: Please do not provide ways to improve DB by structural changes, I know it's a bad design to have currency stored as varchar, but it is what it is, we can't change DB structure, only queries accessing it.

Thank you for any insight.

Query:

SELECT
    COALESCE(CAST([PH].[PAmount] AS decimal(15, 2)) + CAST([PH].[Fee] AS decimal(15, 2)), 0.0) AS [PayAmount],
    [PH].[PDate] AS [PayDate]
FROM [History] AS [PH] WITH (NOLOCK)
WHERE [PH].[PMode] IN ('C', 'P')
    AND [PH].[INNO] = 'XYZ'
    AND [PH].[PStatus] IN ('CONSERVED', 'EXPECTING', 'REFRIGERATED', 'POSTPONED', 'FILED')
    AND [PH].[Locked] = 1
    AND [PH].[PDate] >= 'Jan 1, 2015'
ORDER BY [PH].[PDate] ASC

Fields:

PAmount - non-clustered index, varchar(50)

Fee - not indexed, decimal(6,2)

PDate - clustered index, datetime

PMode - non-clustered index, varchar(5)

INNO - non-clustered index, varchar(50)

PStatus - non-clustered index, varchar(50)

Locked - not indexed, bit

Execution plan: SELECT---Compute Scalar---Filter---NestedLoops-|--Index Seek (Inner Join) | cost 0% Cost 0% Cost 0% Cost 0% | cost 4% |---Key Lookup Cost 96%

8
  • What indexes on the table? How many rows does the query return? What does the execution plan look like? Commented Aug 25, 2015 at 20:45
  • 1
    Short of some indexing there isn't a lot you can do here. I would however highly recommend you stop using NOLOCK. This is dealing with money and that hint means your results are not accurate. blogs.sqlsentry.com/aaronbertrand/bad-habits-nolock-everywhere Commented Aug 25, 2015 at 20:57
  • Just an fyi: Sql Server 2005 reaches end of life in less than a year. Commented Aug 25, 2015 at 21:35
  • @ Martin: Indexes were provided in the original post. I've edited with execution path. Please let me know what else you need. Thanks. Commented Aug 26, 2015 at 13:26
  • @ Sean: We've been strongly adviced to ALWAYS use NOLOCK. Maybe its a specifics of our DB - 99% of DB transactions is an INSERT. We almost never do an edit. NOLOCK prevents records from being locked if working in parallel on a heavy-load table. Commented Aug 26, 2015 at 13:27

5 Answers 5

3

It seems like you have a misconception about indexes. Indexes don't combine with each other, so it's not a question of having a column "indexed" or "not indexed". It's not good to have a separate index for individual columns. It's about having indexes with several columns that much up with individual queries. An index on a column won't help a query if it's still more efficient for the database to select on another column first.

I'm getting a little stale at this, but for this query I'd recommend an index that looks something like this:

CREATE NONCLUSTERED INDEX [ix_History_XXXXX] ON [History] 
(
    [INNO] ASC,
    [Locked] ASC,
    [PDate] ASC,
    [PMode] ASC
)
INCLUDE ( PStatus, PAmount, Fee)

You may want to swap around PDate, PMode, and PStatus, depending on their selectivity.

When building an index, you want to list the most specific items first. The general idea is that an index stores each successive item in order. With this index, rows for all of the XYZ values for INNO will be grouped together, and so the query engine can seek right to that section of the index. The next most specific column is Locked. Even though this is a bit value, because it is limited to exactly one value we are still able to seek directly to the one specific part of the index that will matter for the entire query. Again: I haven't had to do this kind of thing for a while, so you might do as well listed PMode here; I just don't recall whether the Sql Server query optimizer is smart enough to handle the two values in an efficient way.

From here on out the best option for the index depends on how much each of the query values limits the results. Since we're no longer able to get all of the results into one space, we're gonna have to scan the relevant parts of the index. My instinct here is to use the Date value next. This will allow the scan to walk the index starting with the first date that matches your result, and help it get the records in the correct order, but again: this is just my instinct. You may be able to do better by listing PMode or PStatus first.

Finally, the additional in the INCLUDES clause will allow you to entirely complete this query from the index, without actually going back to the full table. You use an INCLUDES clause rather than just appending the values to the query to avoid making Sql Server rebuild the index for updates to these columns. This is why PStatus, for example, probably should not be part of the main index, if the status is something that can change, and why you might be better off also leaving Locked out of the index. These are things you'll want to measure and test for yourself, though.

Sign up to request clarification or add additional context in comments.

8 Comments

I am not sure I understand half of what you wrote about indexes. Judging by that my SQL is mediocre at best. I was not the one to design indexes, I am not sure who did, might have been at a time when the whole system was small and nobody bothered or was knowledgeable enough to do it right. I cannot change indexes in production directly, but I CAN suggest to change it if I can prove its working in TEST. I will try to apply your indexes in TEST - I will copy table so I have 2 exactly the same tables with same data, then I apply your index to one of the tables and compare execution path.
I've created 2 tables with exactly the same structure and no data. One table has extra index - yours. Execution plan suggests no difference. Do I actually need data in it? I tried to Generate Scripts to replicate the table with data but it errored out, saying its too large or something like that.
A query on a single table with no data will never use an index.
OK, ill work on copying data to each table... this may take some time
You don't need all 50million-ish records. Just enough for an index to make a difference. A few thousand should do it.
|
1

I would simply create index on the following table:

CREATE NONCLUSTERED INDEX idx_History_Locked_PMode_INNO_PStatus_PDate_iPAmount_iFee
    ON dbo.History (Locked, PMode, INNO, PStatus, PDate)
    INCLUDE (PAmount, Fee)
WHERE Locked = 1;                -- This is optional, can reduce index size.

This should improve your current query. All conditions should be met here.

3 Comments

Order within the index matters. It seems odd to list Locked first, when I'd guess that particular records will lock and unlock fairly regularly. As a bit, it also has poor selectivity. Best to list items first that are both more specific and more stable.
Well, I don't like guessing, but I thought if row gets locked once, it's locked once and forever. I might be wrong. It's quite hard to judge what index would suit best when there's no example data, unfortunately. I agree that your index might work better. Perhaps it would be even better to add filter here: WHERE Locked = 1. This should reduce index size. Thanks.
The locked bit is more to indicate the last updated record within a chain of related records. Imagine an apple12345 record that goes from PLANTED status to SEEDLING to PLANT to HARVESTED. Every time a record for apple12345 goes into the next stage, a DB trigger sets LOCKED = 1 to the latest record, while setting it to 0 to any preceding related record.
1

I would see if I got better results with ISNULL instead of COALESCE.

The other thing is looking at the indexes. You listed the fields that are indexed. If those fields are covered by several indexes, I suggest making one good covering index for this query.

A covering index is one where all of the data needed by the query are contained in the index. If an index used by the query is not covering, then there is an extra trip (or trips) to the table to get the rest of the fields. It is more efficient if all of the data is right there in the query.

Check out these articles:

What are Covering Indexes and Covered Queries in SQL Server?

https://www.simple-talk.com/sql/learn-sql-server/using-covering-indexes-to-improve-query-performance/

For the data that is not part of a join or in the where clause, you can use the include keyword. Included fields are not searchable parts of the index, but will save the trip to the database.

Try the the index below. All of the fields in the where clause are part of the searchable part of the index, and all of the returned fields that are not part of the where clause are included. You might need to play with the order after looking at the execution plan, but I took my best guess.

Create Nonclustered Index Ix_Ncl_History_CoveringBigSelect on History(PDate, PMode, INNO, PStatus, Locked) Include (PAmount, Fee)

Here is an article about included columns.

4 Comments

NO improvement If I remove COALESCE completely (no checking for ISNULL, just plain remove COALESCE). Each index is a separate one, there are other fields in the table and other indexes too, each independent of each other. I am not entirely sure what you mean one index or many and might try making one good covering index for this query, can you elaborate please?
You answered my question about one index or many. I was asking if the fields you listed as indexed were all in one index, or were there several different indexes. I will edit my answer with more info on covering indexes.
Thanks, your index is almost the same as Joel's. When I tested his index with 10K data, it seemed to be working better, but when i injected even more data, the query was again back to the same performance as the original. I can't figure out why. Is there something wrong with my test case?
Looking at Joes, the biggest different I see is that he has status as an included and I have it right in the index. I would try that. Then, as Joel suggested, play with the order in the index. Look at the execution plan with different configurations.
0

You are right, the query looks normal. It's a straight forward query, only with 'AND' clause and no "NOT NULL" constraint or joins or subselect. The conditions are mostly equal (only the date is relational). If the values in the conditions (like 'C', 'P', 1, 'XYZ', 'CONSERVED', etc.) are selective enough, than you (or the DBA) should define some indexes and the optimizer can use it. Ask the DBA to create an appropriate index for the table.

How many result lines are you expected to get? If there are a lot (e.g. >> 10,000), the ORDER BY clause could cost a lot.

1 Comment

The result is very small, should be under one hundred records. The problem that the table its pulling records from is large (20, 30, 40 million records, I don't know exactly, but its millions)
0

As you said, I assume you can do nothing to the db, include indexing and structural changes. So what about the client App environment, is it powerful enough to do client side calculation?

If answer is yes, I suggest to move the calculation to the client side:

  • Do not cast data type in the query, cast varchar to decimal consume CPU resources. So get the result directly and do the convert job in your app.
  • For the IO issue, try to remove IN condition because IN essentially is a "OR" condition. So split your query into small pieces use "=" condition and send to your app, use your client app to "Union" them.

1 Comment

Not casting did not seem to make a difference. Same thing with turning IN into a bunch of ORs. I can take calculation to the client, but pulling the fields separately does not improve query. Leaving them varchar and not casting doesn't improve it as well.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.