1

I have a T-SQL query and I want to make it faster.

I have Entity and Address tables, and I want to return one mailing address per entity when a mailing address exists.

Sometimes there are multiple addresses for a given entity. There is a PrimaryAddress tinyint flag that is sometimes set and sometimes not. There are no rules here: an entity could have five mailing addresses all with the flag set, or none with it set.

This runs in around 20 seconds for 11k rows. I really need to get this time down; can anyone help?

SELECT 
   e.*, addr.*
FROM 
   [Entity] e
   --Address does not always exist
   --PrimaryAddress is a NOT NULL tinyint; sometimes this flag is enabled more than once for a given entity.
LEFT OUTER JOIN 
   [Address] addr ON addr.[EntityID] = e.[EntityID] 
   AND addr.Code = 'MAILING'        
   AND addr.[AddressID] = (
       --This removes duplicates but adds a long delay (15 seconds) to execution time.
       SELECT Top 1 a.[AddressID]
       FROM [Address] AS a
       WHERE a.Code = 'MAILING'
         AND a.[EntityID] = e.[EntityID]    
       ORDER BY a.[PrimaryAddress] DESC)

It should also be noted that I can't add any indexes to the two tables either :(
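Because the vendor database can't be modified, a throwaway repro in temp tables is one way to compare queries safely. The real schema isn't shown, so this is a minimal sketch under assumptions: apart from EntityID, AddressID, Code and PrimaryAddress, the columns (Name, Line1) are hypothetical, and the rows simply cover the cases the question describes (several flagged mailing addresses, none flagged, no mailing address, no address at all).

```sql
-- Throwaway repro: column names other than EntityID, AddressID, Code and
-- PrimaryAddress are hypothetical, since the real schema isn't shown.
-- Multi-row VALUES needs SQL Server 2008 or later.
IF OBJECT_ID('tempdb..#Address') IS NOT NULL DROP TABLE #Address;
IF OBJECT_ID('tempdb..#Entity') IS NOT NULL DROP TABLE #Entity;

CREATE TABLE #Entity (
    EntityID int         NOT NULL PRIMARY KEY,
    Name     varchar(50) NOT NULL
);

CREATE TABLE #Address (
    AddressID      int          NOT NULL PRIMARY KEY,
    EntityID       int          NOT NULL,
    Code           varchar(20)  NOT NULL,
    PrimaryAddress tinyint      NOT NULL,
    Line1          varchar(100) NOT NULL
);

INSERT INTO #Entity (EntityID, Name) VALUES
    (1, 'Two flagged mailing addresses'),
    (2, 'Mailing addresses, none flagged'),
    (3, 'Billing address only'),
    (4, 'No addresses at all');

INSERT INTO #Address (AddressID, EntityID, Code, PrimaryAddress, Line1) VALUES
    (10, 1, 'MAILING', 1, '1 High St'),
    (11, 1, 'MAILING', 1, '2 High St'),
    (12, 1, 'MAILING', 0, '3 High St'),
    (20, 2, 'MAILING', 0, '4 Low Rd'),
    (21, 2, 'MAILING', 0, '5 Low Rd'),
    (30, 3, 'BILLING', 1, '6 Side Ln');

-- The question's query, pointed at the temp tables: expect one row per entity,
-- with NULL address columns for entities 3 and 4.
SELECT e.*, addr.*
FROM #Entity e
LEFT OUTER JOIN #Address addr
    ON addr.EntityID = e.EntityID
   AND addr.Code = 'MAILING'
   AND addr.AddressID = (
        SELECT TOP 1 a.AddressID
        FROM #Address AS a
        WHERE a.Code = 'MAILING'
          AND a.EntityID = e.EntityID
        ORDER BY a.PrimaryAddress DESC);
```

Entity 1 also illustrates a side effect of the original query: when several rows share the highest PrimaryAddress value, TOP 1 without a tie-breaker may return either of them.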

Kind regards Simon Jackson

2
  • It's a 3rd party database and any modification is not "supported". Commented Oct 25, 2011 at 10:34
  • @marc_s, there are often many viable choices to performance tune without changing indexes. Commented Oct 25, 2011 at 13:37

3 Answers

1

This is a simplified version of your query that I think will return the same rows (not tested). I can't say whether it will be faster than your version; you tell me.

SELECT 
    e.*,
    addr.*
FROM 
    [Entity] e
  OUTER APPLY (
                SELECT TOP(1) *
                FROM [Address] AS a
                WHERE a.Code = 'MAILING'
                AND a.[EntityID] = e.[EntityID] 
                ORDER BY a.[PrimaryAddress] DESC
              ) as addr

6 Comments

Thank you, this has improved things noticeably: the first run took about 14 seconds, and the second run was down to 2 seconds.
@Simon: use DBCC FREEPROCCACHE and so on to clean up the cache before each run.
DBCC FREEPROCCACHE, oh dear: 23 minutes and 20 seconds with the OUTER APPLY. I'll try my original one now. There are a lot of layered views.
You have layered views? Oh dear, the vendor you bought this program from was incompetent, wasn't it? Have you considered buying a different program from a competent vendor?
OK, here are the stats with this change, using DBCC FREEPROCCACHE before each SELECT. The original SELECT consistently takes around 17 minutes, while the OUTER APPLY has cut it down to an average of 8 seconds. It's a vast improvement.
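The gap between a first run (about 14 seconds) and a second run (about 2 seconds) is consistent with data pages being served from memory the second time. DBCC FREEPROCCACHE only empties the plan cache; a common test-server routine for cold-cache timings also runs CHECKPOINT and DBCC DROPCLEANBUFFERS, and turns on STATISTICS IO/TIME to see reads and CPU per run. A sketch, assuming a non-production instance: these DBCC commands affect every database on the instance and need elevated permissions.

```sql
-- Test/dev instance only: these commands flush caches instance-wide.
CHECKPOINT;               -- write dirty pages so DROPCLEANBUFFERS can evict them
DBCC DROPCLEANBUFFERS;    -- empty the data (buffer) cache: pages are re-read from disk
DBCC FREEPROCCACHE;       -- empty the plan cache: the query is recompiled

SET STATISTICS IO ON;     -- logical/physical reads per table (Messages tab)
SET STATISTICS TIME ON;   -- compile and execution CPU/elapsed times (Messages tab)

-- The query under test, here the OUTER APPLY version
SELECT e.*, addr.*
FROM [Entity] e
OUTER APPLY (
    SELECT TOP (1) *
    FROM [Address] AS a
    WHERE a.Code = 'MAILING'
      AND a.[EntityID] = e.[EntityID]
    ORDER BY a.[PrimaryAddress] DESC
) AS addr;

SET STATISTICS TIME OFF;
SET STATISTICS IO OFF;
```

Running the same script for each candidate query keeps the comparisons on an equal, cold-cache footing.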
1

You could stop using select *: you are returning the EntityID twice, which wastes both server and network resources. And do you honestly need every single one of the other fields? Eliminate any you don't need. Select * should not be used in production code anyway.

You have a correlated subquery, which runs row by agonizing row; try using joins instead:

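SELECT e.*, addr.*
FROM [Entity] e
LEFT JOIN (SELECT a.*
           FROM [Address] a
           JOIN (-- one AddressID per entity: highest PrimaryAddress, lowest AddressID on ties
                 SELECT a2.[EntityID], MIN(a2.[AddressID]) AS [AddressID]
                 FROM [Address] a2
                 JOIN (SELECT [EntityID], MAX([PrimaryAddress]) AS MaxPrimary
                       FROM [Address]
                       WHERE Code = 'MAILING'
                       GROUP BY [EntityID]) p
                   ON p.[EntityID] = a2.[EntityID]
                  AND p.MaxPrimary = a2.[PrimaryAddress]
                 WHERE a2.Code = 'MAILING'
                 GROUP BY a2.[EntityID]) dedup
             ON a.[AddressID] = dedup.[AddressID]) addr
    ON addr.[EntityID] = e.[EntityID]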

And again, don't use select *. I don't know your fields, or I would have specified them above.

Of course, the real way to fix this is to fix the badly designed database. It should not allow more than one primary address (we enforce this through a trigger); then you wouldn't need the expensive duplicate-removal step. I realize this isn't possible in your case, but it might make someone else think about their design flaw. Since this is a third-party product, I would ask the vendor to change it so that only one primary address is allowed. If enough people complain, they eventually might.
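The answer doesn't show its trigger, so the following is only an illustration of the idea, for a database you control (not the vendor's database from the question). It assumes a nonzero PrimaryAddress marks the primary row and that the rule applies to MAILING addresses; the trigger name is hypothetical.

```sql
-- Hypothetical trigger name. Assumes PrimaryAddress <> 0 means "primary"
-- and that the one-primary rule applies to MAILING addresses only.
CREATE TRIGGER trg_Address_SinglePrimaryMailing
ON [Address]
AFTER INSERT, UPDATE
AS
BEGIN
    SET NOCOUNT ON;

    -- Only entities touched by this statement need checking
    IF EXISTS (
        SELECT a.[EntityID]
        FROM [Address] AS a
        WHERE a.Code = 'MAILING'
          AND a.[PrimaryAddress] <> 0
          AND a.[EntityID] IN (SELECT i.[EntityID] FROM inserted AS i)
        GROUP BY a.[EntityID]
        HAVING COUNT(*) > 1
    )
    BEGIN
        ROLLBACK TRANSACTION;  -- undo the offending INSERT/UPDATE
        RAISERROR('An entity may have only one primary mailing address.', 16, 1);
    END
END;
```

With the rule enforced, picking "the" primary mailing address no longer needs a per-entity TOP 1 to break ties between flagged rows.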

3 Comments

Thanks for the feedback. I tested your joins, and they take 6 seconds on average :)
I only used * to keep things simple and focus on the key fields. Even then, the table and field names used here are not the real ones; if you saw what I was working with, I fear the answers would be about naming conventions rather than the issue. Thanks for your time and help.
I've marked this as the answer because it gave the biggest performance improvement. I also like @Mikael-Eriksson's answer because its syntax is so simple, but it is a few seconds slower (in my query).
0

If you are on SQL Server 2005 or a later version, you could try the following:

WITH ranked AS (
  SELECT
    *,
    rn = ROW_NUMBER() OVER (PARTITION BY EntityID ORDER BY [PrimaryAddress] DESC)
  FROM [Address]
  WHERE Code = 'MAILING'
)
SELECT
  e.*, a.*
FROM [Entity] e
  LEFT JOIN ranked a ON a.[EntityID] = e.[EntityID] AND a.rn = 1

The result of this query would differ from yours in one small way: there would be an additional rn column containing 1s and/or NULLs. I wouldn't consider that a problem, though, as wildcard (SELECT *) column lists are not recommended in production queries in the first place, and if this is a non-production script, one extra column will hardly get in the way.
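Listing the columns explicitly also keeps rn out of the result. A sketch with hypothetical column names (Name, Line1, PostCode, since the real ones aren't given), which additionally adds AddressID as a tie-breaker so the same row wins on every run when several addresses share the highest PrimaryAddress value:

```sql
WITH ranked AS (
  SELECT
    a.[AddressID],
    a.[EntityID],
    a.[Line1],        -- hypothetical column name
    a.[PostCode],     -- hypothetical column name
    rn = ROW_NUMBER() OVER (
           PARTITION BY a.[EntityID]
           ORDER BY a.[PrimaryAddress] DESC, a.[AddressID]  -- AddressID breaks ties
         )
  FROM [Address] a
  WHERE a.Code = 'MAILING'
)
SELECT
  e.[EntityID],
  e.[Name],           -- hypothetical column name
  r.[AddressID],
  r.[Line1],
  r.[PostCode]
FROM [Entity] e
  LEFT JOIN ranked r ON r.[EntityID] = e.[EntityID] AND r.rn = 1;
```

EntityID now appears once, and only the address columns actually needed travel to the client.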


2 Comments

Or you could do this with a temp table instead of a CTE; the temp table can have the missing indexes added to it.
Tested this type of query; it averaged 9 seconds. Thanks for sharing.
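One way the temp-table suggestion might look. The question's "no indexes" restriction applies to the vendor tables, not to a session's own temp table. Assumptions: the dedup rule is the ROW_NUMBER one from the answer (with AddressID added as a tie-breaker), and the temp table and index names are made up. Whether the extra write and index build pay off depends on the data, so it's worth timing against the CTE.

```sql
-- Hypothetical names: #MailingAddress, IX_MailingAddress_EntityID
IF OBJECT_ID('tempdb..#MailingAddress') IS NOT NULL
    DROP TABLE #MailingAddress;

-- Copy only the MAILING rows, ranked per entity
SELECT a.*,
       ROW_NUMBER() OVER (PARTITION BY a.[EntityID]
                          ORDER BY a.[PrimaryAddress] DESC, a.[AddressID]) AS rn
INTO #MailingAddress
FROM [Address] AS a
WHERE a.Code = 'MAILING';

-- Keep just the first-ranked row for each entity
DELETE FROM #MailingAddress WHERE rn > 1;

-- The vendor tables stay untouched; this index lives only on the temp table
CREATE UNIQUE CLUSTERED INDEX IX_MailingAddress_EntityID
    ON #MailingAddress ([EntityID]);

SELECT e.*, m.*
FROM [Entity] AS e
LEFT JOIN #MailingAddress AS m
    ON m.[EntityID] = e.[EntityID];
```

After the DELETE there is at most one row per EntityID, which is what makes the unique clustered index possible.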
