Optimize SQL query, TSQL

Question

I am a software developer and I was recently approached by DBA to optimize the query that an app of mine is using. DBA reported that query takes about 50% of CPU and high I/O operations when it runs. The query is pretty straight forward and I am unsure how to optimize it.

Question 1: How can I optimize this query?

Question 2: is it even my job to do so, shouldn't DBA be more knowledgeable in this? Mind you we have no DB developers, just DBA and Software Developers.

DB has approximately 30-50 million of records, it is constantly maintained/monitored by DBA, but I am unsure how. Server is on a dedicated machine and is Microsoft SQL Server 2005 - 9.00.5057.00 (X64)

PS: Please do not provide ways to improve DB by structural changes, I know it's a bad design to have currency stored as varchar, but it is what it is, we can't change DB structure, only queries accessing it.

Thank you for any insight.

Query:

SELECT
    COALESCE(CAST([PH].[PAmount] AS decimal(15, 2)) + CAST([PH].[Fee] AS decimal(15, 2)), 0.0) AS [PayAmount],
    [PH].[PDate] AS [PayDate]
FROM [History] AS [PH] WITH (NOLOCK)
WHERE [PH].[PMode] IN ('C', 'P')
    AND [PH].[INNO] = 'XYZ'
    AND [PH].[PStatus] IN ('CONSERVED', 'EXPECTING', 'REFRIGERATED', 'POSTPONED', 'FILED')
    AND [PH].[Locked] = 1
    AND [PH].[PDate] >= 'Jan 1, 2015'
ORDER BY [PH].[PDate] ASC

Fields:

PAmount - non-clustered index, varchar(50)

Fee - not indexed, decimal(6,2)

PDate - clustered index, datetime

PMode - non-clustered index, varchar(5)

INNO - non-clustered index, varchar(50)

PStatus - non-clustered index, varchar(50)

Locked - not indexed, bit

Execution plan: SELECT---Compute Scalar---Filter---NestedLoops-|--Index Seek (Inner Join) | cost 0% Cost 0% Cost 0% Cost 0% | cost 4% |---Key Lookup Cost 96%

What indexes on the table? How many rows does the query return? What does the execution plan look like? — Martin Smith
– Martin Smith, Commented Aug 25, 2015 at 20:45
Short of some indexing there isn't a lot you can do here. I would however highly recommend you stop using NOLOCK. This is dealing with money and that hint means your results are not accurate. blogs.sqlsentry.com/aaronbertrand/bad-habits-nolock-everywhere — Sean Lange
– Sean Lange, Commented Aug 25, 2015 at 20:57
Just an fyi: Sql Server 2005 reaches end of life in less than a year. — Joel Coehoorn
– Joel Coehoorn, Commented Aug 25, 2015 at 21:35
@ Martin: Indexes were provided in the original post. I've edited with execution path. Please let me know what else you need. Thanks. — George
– George, Commented Aug 26, 2015 at 13:26
@ Sean: We've been strongly adviced to ALWAYS use NOLOCK. Maybe its a specifics of our DB - 99% of DB transactions is an INSERT. We almost never do an edit. NOLOCK prevents records from being locked if working in parallel on a heavy-load table. — George
– George, Commented Aug 26, 2015 at 13:27

Joel Coehoorn · Accepted Answer · 2015-08-26 03:58:03Z

3

It seems like you have a misconception about indexes. Indexes don't combine with each other, so it's not a question of having a column "indexed" or "not indexed". It's not good to have a separate index for individual columns. It's about having indexes with several columns that much up with individual queries. An index on a column won't help a query if it's still more efficient for the database to select on another column first.

I'm getting a little stale at this, but for this query I'd recommend an index that looks something like this:

CREATE NONCLUSTERED INDEX [ix_History_XXXXX] ON [History] 
(
    [INNO] ASC,
    [Locked] ASC,
    [PDate] ASC,
    [PMode] ASC
)
INCLUDE ( PStatus, PAmount, Fee)

You may want to swap around PDate, PMode, and PStatus, depending on their selectivity.

When building an index, you want to list the most specific items first. The general idea is that an index stores each successive item in order. With this index, rows for all of the XYZ values for INNO will be grouped together, and so the query engine can seek right to that section of the index. The next most specific column is Locked. Even though this is a bit value, because it is limited to exactly one value we are still able to seek directly to the one specific part of the index that will matter for the entire query. Again: I haven't had to do this kind of thing for a while, so you might do as well listed PMode here; I just don't recall whether the Sql Server query optimizer is smart enough to handle the two values in an efficient way.

From here on out the best option for the index depends on how much each of the query values limits the results. Since we're no longer able to get all of the results into one space, we're gonna have to scan the relevant parts of the index. My instinct here is to use the Date value next. This will allow the scan to walk the index starting with the first date that matches your result, and help it get the records in the correct order, but again: this is just my instinct. You may be able to do better by listing PMode or PStatus first.

Finally, the additional in the INCLUDES clause will allow you to entirely complete this query from the index, without actually going back to the full table. You use an INCLUDES clause rather than just appending the values to the query to avoid making Sql Server rebuild the index for updates to these columns. This is why PStatus, for example, probably should not be part of the main index, if the status is something that can change, and why you might be better off also leaving Locked out of the index. These are things you'll want to measure and test for yourself, though.

edited Aug 26, 2015 at 3:58

answered Aug 25, 2015 at 21:49

Joel Coehoorn

418k114 gold badges582 silver badges820 bronze badges

Sign up to request clarification or add additional context in comments.

8 Comments

George Over a year ago

I am not sure I understand half of what you wrote about indexes. Judging by that my SQL is mediocre at best. I was not the one to design indexes, I am not sure who did, might have been at a time when the whole system was small and nobody bothered or was knowledgeable enough to do it right. I cannot change indexes in production directly, but I CAN suggest to change it if I can prove its working in TEST. I will try to apply your indexes in TEST - I will copy table so I have 2 exactly the same tables with same data, then I apply your index to one of the tables and compare execution path.

George Over a year ago

I've created 2 tables with exactly the same structure and no data. One table has extra index - yours. Execution plan suggests no difference. Do I actually need data in it? I tried to Generate Scripts to replicate the table with data but it errored out, saying its too large or something like that.

Joel Coehoorn Over a year ago

A query on a single table with no data will never use an index.

George Over a year ago

OK, ill work on copying data to each table... this may take some time

Joel Coehoorn Over a year ago

You don't need all 50million-ish records. Just enough for an index to make a difference. A few thousand should do it.

|

Evaldas Buinauskas · Accepted Answer · 2015-08-25 22:10:53Z

1

I would simply create index on the following table:

CREATE NONCLUSTERED INDEX idx_History_Locked_PMode_INNO_PStatus_PDate_iPAmount_iFee
    ON dbo.History (Locked, PMode, INNO, PStatus, PDate)
    INCLUDE (PAmount, Fee)
WHERE Locked = 1;                -- This is optional, can reduce index size.

This should improve your current query. All conditions should be met here.

edited Aug 25, 2015 at 22:10

answered Aug 25, 2015 at 21:50

Evaldas Buinauskas

14.2k11 gold badges63 silver badges119 bronze badges

3 Comments

Joel Coehoorn Over a year ago

Order within the index matters. It seems odd to list Locked first, when I'd guess that particular records will lock and unlock fairly regularly. As a bit, it also has poor selectivity. Best to list items first that are both more specific and more stable.

Evaldas Buinauskas Over a year ago

Well, I don't like guessing, but I thought if row gets locked once, it's locked once and forever. I might be wrong. It's quite hard to judge what index would suit best when there's no example data, unfortunately. I agree that your index might work better. Perhaps it would be even better to add filter here: WHERE Locked = 1. This should reduce index size. Thanks.

George Over a year ago

The locked bit is more to indicate the last updated record within a chain of related records. Imagine an apple12345 record that goes from PLANTED status to SEEDLING to PLANT to HARVESTED. Every time a record for apple12345 goes into the next stage, a DB trigger sets LOCKED = 1 to the latest record, while setting it to 0 to any preceding related record.

Community · Accepted Answer · 2017-05-23 11:52:11Z

1

I would see if I got better results with ISNULL instead of COALESCE.

The other thing is looking at the indexes. You listed the fields that are indexed. If those fields are covered by several indexes, I suggest making one good covering index for this query.

A covering index is one where all of the data needed by the query are contained in the index. If an index used by the query is not covering, then there is an extra trip (or trips) to the table to get the rest of the fields. It is more efficient if all of the data is right there in the query.

Check out these articles:

What are Covering Indexes and Covered Queries in SQL Server?

https://www.simple-talk.com/sql/learn-sql-server/using-covering-indexes-to-improve-query-performance/

For the data that is not part of a join or in the where clause, you can use the include keyword. Included fields are not searchable parts of the index, but will save the trip to the database.

Try the the index below. All of the fields in the where clause are part of the searchable part of the index, and all of the returned fields that are not part of the where clause are included. You might need to play with the order after looking at the execution plan, but I took my best guess.

Create Nonclustered Index Ix_Ncl_History_CoveringBigSelect on History(PDate, PMode, INNO, PStatus, Locked) Include (PAmount, Fee)

Here is an article about included columns.

edited May 23, 2017 at 11:52

CommunityBot

11 silver badge

answered Aug 25, 2015 at 21:31

Sean

4922 silver badges13 bronze badges

4 Comments

George Over a year ago

NO improvement If I remove COALESCE completely (no checking for ISNULL, just plain remove COALESCE). Each index is a separate one, there are other fields in the table and other indexes too, each independent of each other. I am not entirely sure what you mean one index or many and might try making one good covering index for this query, can you elaborate please?

Sean Over a year ago

You answered my question about one index or many. I was asking if the fields you listed as indexed were all in one index, or were there several different indexes. I will edit my answer with more info on covering indexes.

George Over a year ago

Thanks, your index is almost the same as Joel's. When I tested his index with 10K data, it seemed to be working better, but when i injected even more data, the query was again back to the same performance as the original. I can't figure out why. Is there something wrong with my test case?

Sean Over a year ago

Looking at Joes, the biggest different I see is that he has status as an included and I have it right in the index. I would try that. Then, as Joel suggested, play with the order in the index. Look at the execution plan with different configurations.

dev.null · Accepted Answer · 2015-08-25 20:46:50Z

0

You are right, the query looks normal. It's a straight forward query, only with 'AND' clause and no "NOT NULL" constraint or joins or subselect. The conditions are mostly equal (only the date is relational). If the values in the conditions (like 'C', 'P', 1, 'XYZ', 'CONSERVED', etc.) are selective enough, than you (or the DBA) should define some indexes and the optimizer can use it. Ask the DBA to create an appropriate index for the table.

How many result lines are you expected to get? If there are a lot (e.g. >> 10,000), the ORDER BY clause could cost a lot.

answered Aug 25, 2015 at 20:46

dev.null

5382 silver badges7 bronze badges

1 Comment

George Over a year ago

The result is very small, should be under one hundred records. The problem that the table its pulling records from is large (20, 30, 40 million records, I don't know exactly, but its millions)

cqi · Accepted Answer · 2015-08-26 02:18:47Z

0

As you said, I assume you can do nothing to the db, include indexing and structural changes. So what about the client App environment, is it powerful enough to do client side calculation?

If answer is yes, I suggest to move the calculation to the client side:

Do not cast data type in the query, cast varchar to decimal consume CPU resources. So get the result directly and do the convert job in your app.
For the IO issue, try to remove IN condition because IN essentially is a "OR" condition. So split your query into small pieces use "=" condition and send to your app, use your client app to "Union" them.

answered Aug 26, 2015 at 2:18

cqi

5394 silver badges13 bronze badges

1 Comment

George Over a year ago

Not casting did not seem to make a difference. Same thing with turning IN into a bunch of ORs. I can take calculation to the client, but pulling the fields separately does not improve query. Leaving them varchar and not casting doesn't improve it as well.

Collectives™ on Stack Overflow

Optimize SQL query, TSQL

5 Answers 5

8 Comments

3 Comments

4 Comments

1 Comment

1 Comment

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

5 Answers 5

8 Comments

3 Comments

4 Comments

1 Comment

1 Comment

Your Answer

Sign up or log in

Post as a guest

Linked

Related