SQL Server T-SQL Query Optimization

Question

I am trying to optimize the following T-SQL query:

SELECT Person.*
FROM Person
WHERE ZipCode LIKE '123%'
AND City = 'Washington'
AND NumberOfHomes in (1, 2, 3)
AND
(
    EXISTS
    (
        SELECT * FROM House
        WHERE Person.ID = House.PersonID
        AND House.Type = 'TOWNHOUSE'
        AND House.Size = 'Medium'
    )
    OR
    EXISTS
    (
        SELECT * FROM Color
        WHERE Person.ID = Color.PersonID
        AND Color.Foreground IN ('Green', 'Blue', 'Purple')
    )
)

I'd greatly appreciate any response in optimizing the query.

In particular, is there a way to convert the query into a more efficient query using only a single SELECT statement without any of the inner SELECT statements?

Thanks!

Can't say much without the actual execution plan for your query. One minor tip: for EXISTS you don't need to return all rows or columns; just return TOP 1 1 from your query: EXISTS( SELECT TOP 1 1 FROM House...)
– user2321864, Commented Sep 10, 2014 at 15:00
@user2321864 It doesn't matter what you put there. SQL Server doesn't care: it knows it is just looking for 1 row and can then short-circuit, and it knows it doesn't return any data. Want proof it doesn't matter? Replace * with 1/0.
– Aaron Bertrand, Commented Sep 10, 2014 at 15:18
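
The 1/0 suggestion above can be tried with a short sketch like the following. It assumes only the system catalog view sys.objects (any non-empty table would do), and it is an illustration rather than something posted in the thread.

```sql
-- The select list inside EXISTS is not evaluated, so 1/0 does not
-- raise a divide-by-zero error here.
IF EXISTS (SELECT 1/0 FROM sys.objects)
    PRINT 'EXISTS succeeded without evaluating 1/0';

-- For contrast, evaluating the expression directly does fail:
-- SELECT 1/0;   -- Divide by zero error encountered.
```

If the PRINT message appears, the optimizer has ignored the select list, which is why SELECT *, SELECT 1 and SELECT TOP 1 1 behave the same inside EXISTS.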

Gordon Linoff · Accepted Answer · 2014-09-10 14:53:36Z

4

This is the query:

SELECT p.* 
FROM Person p
WHERE p.ZipCode LIKE '123%'  AND p.City = 'Washington' AND p.NumberOfHomes in (1, 2, 3) AND
      (EXISTS (SELECT *
               FROM House h
               WHERE p.ID = h.PersonID AND h.Type = 'TOWNHOUSE' AND h.Size = 'Medium'
             ) OR 
       EXISTS (SELECT *
               FROM Color c
               WHERE p.ID = c.PersonID AND c.Foreground IN ('Green', 'Blue', 'Purple')
              )
      );

Without rewriting the query, you can optimize this with indexes. I would recommend:

Person(City, ZipCode, NumberOfHomes, Id);
House(PersonId, Type, Size);
Color(PersonID, Foreground)

Question, though. Are you sure that the ids in the House and Color tables really match back to Person.Id? Normally, they would have a column called something like PersonId.
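
The index recommendation above could be written out roughly as follows. This is a sketch: the index names (IX_...) are made up for illustration, and the column names are taken from the query in the question.

```sql
-- Index names are illustrative; they are not from the original answer.
CREATE INDEX IX_Person_City_ZipCode_NumberOfHomes
    ON Person (City, ZipCode, NumberOfHomes, ID);

CREATE INDEX IX_House_PersonID_Type_Size
    ON House (PersonID, [Type], [Size]);

CREATE INDEX IX_Color_PersonID_Foreground
    ON Color (PersonID, Foreground);
```

Because the query selects p.*, the Person index does not cover every column, so the optimizer may still need key lookups or may prefer a scan, depending on how selective the filters are. If ID is already the clustered key, SQL Server carries it in the nonclustered index anyway, so listing it explicitly is optional.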

answered Sep 10, 2014 at 14:53

Gordon Linoff

3 Comments

Wagner DosAnjos Over a year ago

I'm curious, perhaps you know the answer. Do EXISTS (SELECT * and EXISTS (SELECT 1 perform any differently?

Gordon Linoff Over a year ago

@wdosanjos . . . No, the compiler changes both of these to the same code. Usually, I would use select 1, but I left the select * because that is how the OP phrased it.

Aaron Bertrand Over a year ago

@wdosanjos NO. Please see my comment above.

MelgoV · Accepted Answer · 2014-09-10 15:49:57Z

0

Please try this:

SELECT p.*
FROM Person p
WHERE Substring(Ltrim(Rtrim(p.ZipCode)),1,3) = '123' AND p.City = 'Washington' AND
(p.NumberOfHomes=1 or p.NumberOfHomes=2 or p.NumberOfHomes=3)
AND
(
EXISTS
(
    SELECT 1 FROM House h
    WHERE p.ID = h.PersonID
    AND h.Type = 'TOWNHOUSE'
    AND h.Size = 'Medium'
)
OR
EXISTS
(
    SELECT 1 FROM Color c
    WHERE p.ID = c.PersonID
    AND (c.Foreground ='Green' or c.Foreground='Blue' or  c.Foreground='Purple')
)
);

Also this will work better:

SELECT 
    p.*
FROM Person p
Left join House h
    On (p.Id=h.PersonID)
Left join Color c
    On (p.id=c.PersonID)
WHERE Substring(Ltrim(Rtrim(p.ZipCode)),1,3) = '123' AND p.City = 'Washington' AND
(p.NumberOfHomes=1 or p.NumberOfHomes=2 or p.NumberOfHomes=3) AND
-- keep a person when either the House or the Color condition matches
((Isnull(h.Type,'') = 'TOWNHOUSE' AND Isnull(h.Size,'') = 'Medium') OR
(Isnull(c.Foreground,'') ='Green' or Isnull(c.Foreground,'')='Blue' or Isnull(c.Foreground,'')='Purple')) and
(h.PersonID is not null or c.PersonID is not null);

edited Sep 10, 2014 at 15:49

answered Sep 10, 2014 at 15:33

MelgoV

4 Comments

MelgoV Over a year ago

Why the -1? This is abuse.

Peter Over a year ago

Hi, the queries are not particularly better than any of the other queries. They all show similar timing information with the following settings: SET STATISTICS TIME ON and SET STATISTICS IO ON. Is there a way to get more accurate timing information to compare with other queries?

deroby Over a year ago

Same remark as for Sam Yi: converting WHERE EXISTS() to LEFT OUTER JOINs will POTENTIALLY return the same record doubled, tripled, etc... if there are multiple matches with the Color or House tables data. I'd also be surprised that SubString(1,3) will be faster than LIKE 'xyz%'. Rolling out the IN (,,) into OR's is done by the query optimizer anyway, personally I prefer the readability of the IN (,,) construction.
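
The duplication described in this comment can be reproduced with a small sketch. The table variables and sample rows are hypothetical, chosen so that one person has two matching Color rows.

```sql
-- Hypothetical sample data: one person with two matching Color rows.
DECLARE @Person TABLE (ID int PRIMARY KEY, City varchar(50));
DECLARE @Color  TABLE (PersonID int, Foreground varchar(20));

INSERT INTO @Person VALUES (1, 'Washington');
INSERT INTO @Color  VALUES (1, 'Green'), (1, 'Blue');

-- EXISTS: each person appears at most once (1 row here).
SELECT p.*
FROM @Person p
WHERE EXISTS (SELECT 1
              FROM @Color c
              WHERE c.PersonID = p.ID
                AND c.Foreground IN ('Green', 'Blue', 'Purple'));

-- LEFT JOIN: one output row per matching Color row (2 rows here).
SELECT p.*
FROM @Person p
LEFT JOIN @Color c
    ON c.PersonID = p.ID
   AND c.Foreground IN ('Green', 'Blue', 'Purple')
WHERE c.PersonID IS NOT NULL;
```

Adding DISTINCT to the join version removes the duplicates, but usually at the cost of an extra sort or aggregate step that the EXISTS form does not need.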

MelgoV Over a year ago

You are right about changing EXISTS() to LEFT JOIN. I am sure Substring(p.ZipCode,1,3) or Left(p.ZipCode,3) will work better than LIKE '123%'; also, OR works better than IN.

sam yi · Accepted Answer · 2014-09-10 18:46:14Z

0

Left join and checking for null will be quicker than doing existence checks. Also, if NumberofHomes is an integer, doing BETWEEN will be the same as IN.

SELECT p.*
FROM Person p
LEFT JOIN House h
    ON p.ID = h.PersonID
    AND h.Type = 'TOWNHOUSE'
    AND h.Size = 'Medium'
LEFT JOIN Color c
    ON p.ID = c.PersonID
    AND c.Foreground IN ('Green', 'Blue', 'Purple')
WHERE p.ZipCode LIKE '123%'
  AND p.City = 'Washington'
  AND p.NumberOfHomes BETWEEN 1 AND 3
  AND (h.PersonID is not null or c.PersonID is not null)

OR you can try something like this...

select t.* 
from (
    select personid from house
    where type = 'townhouse' and size = 'medium'
    union
    select personid from color
    where foreground in ('green','blue','purple')
) pid
cross apply (
    select *
    from person p
    where p.id = pid.personid
      and p.zipcode like '123%'
      and p.city = 'washington'
      and p.numberofhomes between 1 and 3
    ) t
where t.id is not null

It's really difficult to optimize these blind. Depending on the distribution of your data, the above query may give you better results.

edited Sep 10, 2014 at 18:46

answered Sep 10, 2014 at 15:15

sam yi

6 Comments

ypercubeᵀᴹ Over a year ago

This is not an equivalent query. And I see no explanation about why it should be more efficient.

sam yi Over a year ago

Sorry... I got pulled away from the desk.

Peter Over a year ago

Hi, the queries are not particularly better than any of the other queries. They all show similar timing information with the following settings: SET STATISTICS TIME ON and SET STATISTICS IO ON. Is there a way to get more accurate timing information to compare with other queries?

deroby Over a year ago

Those queries will POTENTIALLY return the same record from p.* doubled, tripled, etc. if there are multiple matches in the Color or House table data. Also, I'm not sure why EXISTS() has such a bad reputation; in my experience it performs just as well as, and in some cases better than, a LEFT OUTER JOIN approach.

deroby Over a year ago

@Peter if you prefer a GUI to compare the behavior and performance of queries on MSSQL I personally like SQL Sentry Plan Explorer a lot. Basically it's just the same information you'd get from the query-plan and the profiler in an (IMHO) easier to grasp presentation. (PS: I'm in no way affiliated with SqlSentry =)

Mark Anderson · Accepted Answer · 2014-09-10 15:54:22Z

-1

Optimizing a query and reducing it to a single SELECT statement are often different topics, because the SQL Server query optimizer will usually take your SQL statement and run it in whatever way it judges most efficient.

That said, yes, there are several ways to combine your statements into one SQL statement; here is an example. It preserves your Person table and gets matches from the House or Color tables that meet your criteria.

SELECT *
FROM Person
LEFT OUTER JOIN House ON Person.ID = House.PersonID
LEFT OUTER JOIN Color ON Person.ID = Color.PersonID
WHERE (ZipCode LIKE '123%'
    AND City = 'Washington'
    AND Person.NumberofHomes in (1, 2, 3) )
    -- AND binds tighter than OR, so the House/Color alternatives are
    -- grouped to keep the Person filters applied to both branches
    AND (
        (
            House.Type = 'TOWNHOUSE'
            AND House.Size = 'Medium'
        )
        OR (
            Color.Foreground IN ('Green', 'Blue', 'Purple')
        )
    )

I would recommend that you reconsider your model. For example, having PersonID in Color is very suspect, as is having NumberOfHomes (which could possibly be calculated, for example, from a count on the House table by the person's ID). There are some other questionable normalization choices as well. This is not part of your question, but I thought you might want to consider it.

answered Sep 10, 2014 at 15:54

Mark Anderson

6 Comments

Peter Over a year ago

The data model is correct but the names are made up, so it may appear a bit strange. Also, the queries are not particularly better than any of the other queries. They all show similar timing information with the following settings: SET STATISTICS TIME ON and SET STATISTICS IO ON. Is there a way to get more accurate timing information to compare with other queries?

Mark Anderson Over a year ago

Regarding the data model: I understand. Not knowing the context or the experience of the user and the problem, I wanted to make the point that good modeling is key.

Mark Anderson Over a year ago

As I mentioned, optimization and having a single query statement are not always the same thing, because the query optimizer will more often than not take queries and execute them as it sees fit (based on an execution plan).

Mark Anderson Over a year ago

Certainly IO and time are important in optimization; however, most DBAs would look at the execution plan as the best indicator of optimization. Your best bet in comparing queries is to run them through the estimated query plan and look for items such as table scans (usually considered bad and to be avoided). This will not only let you compare different SQL scripts and see whether they are optimized, but also let you ensure that your SQL scripts are using indexes appropriately.
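
One way to capture the plan together with the I/O and timing figures for each candidate query is sketched below. It uses the standard SET STATISTICS options and the original query from the question; which plan operators to look for is a judgment call that depends on your data.

```sql
-- Return the actual execution plan (as XML) with each result set,
-- plus I/O and timing figures in the messages output.
SET STATISTICS XML ON;
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- Candidate query under test (the original EXISTS form from the question).
SELECT p.*
FROM Person p
WHERE p.ZipCode LIKE '123%'
  AND p.City = 'Washington'
  AND p.NumberOfHomes IN (1, 2, 3)
  AND (EXISTS (SELECT 1 FROM House h
               WHERE h.PersonID = p.ID
                 AND h.Type = 'TOWNHOUSE'
                 AND h.Size = 'Medium')
       OR
       EXISTS (SELECT 1 FROM Color c
               WHERE c.PersonID = p.ID
                 AND c.Foreground IN ('Green', 'Blue', 'Purple')));

-- Switch the options off again for the rest of the session.
SET STATISTICS XML OFF;
SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```

Running each rewrite the same way and comparing logical reads and the seek or scan operators in the plan tends to be more telling than elapsed time alone, especially on small test tables where every variant finishes in a few milliseconds.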

Mark Anderson Over a year ago

Incidentally, I ran your original query and the query above against unindexed tables, and the query optimizer executed both the same way (all table scans, of course). I then put in place the indexes I thought appropriate for these tables and ended up with 3 index seeks. That's a nice improvement IMO.
