
So I've got a database that keeps a history copy of all of its data (in the hist schema), so that we can change the history date and go back and look at old data. I need to write a query that adjusts the dates in the history table for each table. Right now I have it working as a cursor, but it takes several minutes to run, and I'd like to see whether I can do it without a cursor.

Edit: To be clear, the primary keys I'm pulling are the primary keys of the non-history tables. The history tables may have multiple entries for a single primary key value (which is why the inner SQL does the join that it does).

Here's the cursor:

-- One row per primary-key column, skipping RecordID (the hist tables' own key)
DECLARE tableID CURSOR FOR
SELECT
    OBJECT_NAME(ic.OBJECT_ID) AS TableName,
    COL_NAME(ic.OBJECT_ID, ic.column_id) AS ColumnName
FROM sys.indexes AS i
INNER JOIN sys.index_columns AS ic
    ON i.OBJECT_ID = ic.OBJECT_ID
    AND i.index_id = ic.index_id
WHERE i.is_primary_key = 1
    AND COL_NAME(ic.OBJECT_ID, ic.column_id) != 'RecordID'

DECLARE @currentTable varchar(100)
DECLARE @currentID varchar(100)
DECLARE @currSql varchar(max)
OPEN tableID

FETCH FROM tableID
INTO @currentTable, @currentID
WHILE @@FETCH_STATUS = 0
BEGIN
    -- Set each hist row's EndDate to the earliest BeginDate, for the same key,
    -- that is on or after its current EndDate
    SELECT @currSql =
    'update t1
    set t1.EndDate = t2.BeginDate
    from hist.' + @currentTable + ' t1 inner join hist.' + @currentTable + ' t2
    on t1.' + @currentID + ' = t2.' + @currentID + '
    and t2.BeginDate = (select MIN(BeginDate) from hist.' + @currentTable + ' t
    where t.BeginDate >= t1.EndDate and t.' + @currentID + ' = t1.' + @currentID + ')'
    EXEC(@currSql)
    FETCH FROM tableID
    INTO @currentTable, @currentID
END
CLOSE tableID
DEALLOCATE tableID
  • What I see is fine...the only other way I can think of doing this (without a cursor) would be to HARDCODE the update for each table. Commented Sep 22, 2011 at 21:37
  • That's what I was afraid of. It certainly works now, I was just hoping it could be done without a cursor. Commented Sep 22, 2011 at 21:43
  • You could try to optimize the dynamically generated query to speed things up. Note that you have a subselect that may be able to be factored out. I would pose your question that way. Commented Sep 22, 2011 at 21:47
  • Further to John's comments, I don't understand quite how this update is supposed to work. If the @currentID column is the column name that is the single-column primary key, why are there three self-joins in the update? How can a row with key = 1 reference any other row in the same table except that row? Also do you think it's safe to assume that all primary keys in the database will be single-column? Why doesn't the cursor restrict to objects in the hist schema? Commented Sep 22, 2011 at 22:05
  • I think you should make it clear in your question that the primary key of your hist tables is not the same as the primary key of the "original" tables. Commented Sep 23, 2011 at 13:33

1 Answer


I find it very hard to believe that this runs slowly because it's a cursor. You can make the cursor slightly more efficient by saying:

DECLARE tableID CURSOR LOCAL STATIC READ_ONLY FORWARD_ONLY FOR ...

...but I bet if you just print all those SQL commands, copy and paste them into a new window, and execute them manually, that it will still take a lot longer than you'd like. The speed is probably related to the amount of data you're updating (or at least scanning), not because you're using a cursor to generate the commands.
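A minimal sketch of that experiment: the question's cursor with the cheaper declaration, building the same statements but printing them instead of executing them, so each one can be pasted into a new window and timed on its own. The sysname variables and QUOTENAME calls are small additions, not part of the original code.

```sql
-- Same metadata query as the question's cursor, declared with cheaper options.
-- Each generated UPDATE is printed rather than executed, so it can be timed by hand.
SET NOCOUNT ON;

DECLARE @currentTable sysname, @currentID sysname, @currSql nvarchar(max);

DECLARE tableID CURSOR LOCAL STATIC READ_ONLY FORWARD_ONLY FOR
    SELECT OBJECT_NAME(ic.[object_id]),
           COL_NAME(ic.[object_id], ic.column_id)
    FROM sys.indexes AS i
    INNER JOIN sys.index_columns AS ic
        ON i.[object_id] = ic.[object_id]
       AND i.index_id = ic.index_id
    WHERE i.is_primary_key = 1
      AND COL_NAME(ic.[object_id], ic.column_id) <> N'RecordID';

OPEN tableID;
FETCH NEXT FROM tableID INTO @currentTable, @currentID;

WHILE @@FETCH_STATUS = 0
BEGIN
    SET @currSql = N'update t1 set t1.EndDate = t2.BeginDate'
        + N' from hist.' + QUOTENAME(@currentTable) + N' t1'
        + N' inner join hist.' + QUOTENAME(@currentTable) + N' t2'
        + N' on t1.' + QUOTENAME(@currentID) + N' = t2.' + QUOTENAME(@currentID)
        + N' and t2.BeginDate = (select MIN(BeginDate) from hist.' + QUOTENAME(@currentTable) + N' t'
        + N' where t.BeginDate >= t1.EndDate and t.' + QUOTENAME(@currentID)
        + N' = t1.' + QUOTENAME(@currentID) + N');';

    PRINT @currSql;   -- instead of EXEC(@currSql)

    FETCH NEXT FROM tableID INTO @currentTable, @currentID;
END

CLOSE tableID;
DEALLOCATE tableID;
```

If the printed statements are just as slow when run by hand, the time is going into the updates themselves rather than into the cursor.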

You can generate these commands without an explicit cursor, building a single string from the metadata tables instead, but the engine will still effectively loop over the rows... the code is just a lot tidier. Examples follow.

First, here is a sample of what your query currently generates, say for the id column on table1, to illustrate my comment and how hard it might be for this update to ever affect any rows:

update t1
set t1.EndDate = t2.BeginDate
from hist.table1 t1 
inner join hist.table1 t2
on t1.id = t2.id
and t2.BeginDate = (select MIN(BeginDate) from hist.table1 t
where t.BeginDate >= t1.EndDate and t.id = t1.id);

Perhaps you meant a much simpler query, like:

update hist.table1 
set EndDate = BeginDate
where BeginDate >= EndDate;

Or perhaps you meant to reference some other table in the subquery?

Anyway, assuming one of the above queries is really what you intend to execute, you could try this to generate the first one:

DECLARE @sql NVARCHAR(MAX) = N'';

SELECT @sql += CHAR(13) + CHAR(10)
+ N'update t1
    set t1.EndDate = t2.BeginDate
    from hist.' + QUOTENAME(t.name) + ' AS t1 
    inner join hist.' + QUOTENAME(t.name) + ' AS t2
    on t1.' + QUOTENAME(c.name) + ' = t2.' + QUOTENAME(c.name) 
    + ' and t2.BeginDate = (select MIN(BeginDate) from hist.' 
    + QUOTENAME(t.name) + ' AS t where t.BeginDate >= t1.EndDate and 
    t.' + QUOTENAME(c.name) + ' = t1.' + QUOTENAME(c.name) + ');'
FROM sys.tables AS t
INNER JOIN sys.indexes AS i
ON t.[object_id] = i.[object_id]
AND i.is_primary_key = 1
INNER JOIN sys.index_columns AS ic
ON t.[object_id] = ic.[object_id]
AND i.index_id = ic.index_id -- only the primary key's own columns
INNER JOIN sys.columns AS c
ON c.column_id = ic.column_id
AND c.[object_id] = ic.[object_id]
WHERE c.name <> 'RecordID'
AND t.[schema_id] = SCHEMA_ID('hist');

PRINT @sql;
-- EXEC sp_executesql @sql;

And for the second it is a lot simpler:

DECLARE @sql NVARCHAR(MAX) = N'';

SELECT @sql += CHAR(13) + CHAR(10) 
    + N'UPDATE hist.' + QUOTENAME(t.name) 
    + ' SET EndDate = BeginDate
    WHERE BeginDate > EndDate;' 
FROM sys.tables AS t
WHERE t.schema_id = SCHEMA_ID('hist');

PRINT @sql;
-- EXEC sp_executesql @sql;

Note that in the second query I changed the >= to >, since if BeginDate already equals EndDate there's no reason to update. The subquery in the first query needs to keep >=, though: with >, a row whose EndDate already equals the next BeginDate would be pushed forward to the BeginDate after that. And again, these assume that everything is in the hist schema and that all primary keys are single-column. I will also say again that the first, longer version of the query is much more expensive (two extra clustered index seeks and a very expensive table spool operator), while not achieving results that are any different whatsoever from the shorter version I posted.
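The cost comparison above treats id as unique in the hist table. Per the comments below, it isn't: each hist table is keyed on RecordID and repeats the original key. Under that assumption, here is a sketch that keeps the longer statement's rule but factors out the t2 self-join, run side by side with the join form on a hypothetical temp table (#hist, with made-up rows). The join to t2 only ever contributes the MIN(BeginDate) value itself, so the value can come straight from the subquery; an aggregate without GROUP BY always returns exactly one row, so CROSS APPLY keeps every t1 row, and the WHERE clause drops the rows whose next BeginDate is NULL, just as the inner join did.

```sql
-- Hypothetical stand-in for a hist table: RecordID is its own key and id,
-- the original table's key, repeats once per history row.
CREATE TABLE #hist
(
    RecordID  int  NOT NULL PRIMARY KEY,
    id        int  NOT NULL,
    BeginDate date NOT NULL,
    EndDate   date NOT NULL
);

INSERT INTO #hist (RecordID, id, BeginDate, EndDate)
VALUES (1, 1, '20110101', '20110115'),   -- gap before the next row for id 1
       (2, 1, '20110201', '20110301'),   -- already ends where row 3 begins
       (3, 1, '20110301', '99991231'),   -- latest row for id 1
       (4, 2, '20110101', '20110110'),   -- gap before the next row for id 2
       (5, 2, '20110120', '99991231');   -- latest row for id 2

-- Copy the rows so both forms run against identical data
SELECT * INTO #hist_join FROM #hist;

-- 1) Longer form: joins to t2 only to fetch the row holding MIN(BeginDate)
UPDATE t1
SET t1.EndDate = t2.BeginDate
FROM #hist_join AS t1
INNER JOIN #hist_join AS t2
    ON t1.id = t2.id
   AND t2.BeginDate = (SELECT MIN(t.BeginDate) FROM #hist_join AS t
                       WHERE t.BeginDate >= t1.EndDate AND t.id = t1.id);

-- 2) Same rule without the self-join: compute the next BeginDate once per row
UPDATE t1
SET t1.EndDate = nxt.BeginDate
FROM #hist AS t1
CROSS APPLY (SELECT MIN(t.BeginDate) AS BeginDate
             FROM #hist AS t
             WHERE t.id = t1.id
               AND t.BeginDate >= t1.EndDate) AS nxt
WHERE nxt.BeginDate <> t1.EndDate;   -- skips NULL (no later row) and no-op updates

SELECT h.RecordID, h.EndDate AS apply_form, j.EndDate AS join_form
FROM #hist AS h
INNER JOIN #hist_join AS j ON j.RecordID = h.RecordID
ORDER BY h.RecordID;
```

With these rows both columns match: rows 1 and 4 move forward to 2011-02-01 and 2011-01-20, and the other rows keep their EndDate. Whether the APPLY form actually avoids the extra seeks and the spool on the real tables is something to confirm in the actual execution plan.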


2 Comments

Ah, the key thing that makes your second one not work is that the history tables have different primary keys than the normal tables. They all have RecordID as their primary key and will have multiple rows for each primary key value of the original table, which is why I need to pick up that original key column and use it.
So the first example should be changed from SCHEMA_ID('hist') to SCHEMA_ID('other_schema')...
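Following that comment, a sketch of the first generator with the key column taken from the original table rather than from the hist copy. It assumes, hypothetically, that the original tables live in dbo and that each one has a same-named copy in hist; both schema names would need to match the real ones. It also matches sys.index_columns on index_id, so only the primary key's own columns come back, and skips composite keys, since the generated statement handles a single key column. The generated statement keeps the >= from the question's cursor.

```sql
-- Assumption (hypothetical): original tables are in dbo, and each has a
-- history copy with the same name in the hist schema.
DECLARE @sql NVARCHAR(MAX) = N'';

SELECT @sql += CHAR(13) + CHAR(10)
    + N'UPDATE t1 SET t1.EndDate = t2.BeginDate'
    + N' FROM hist.' + QUOTENAME(o.name) + N' AS t1'
    + N' INNER JOIN hist.' + QUOTENAME(o.name) + N' AS t2'
    + N' ON t1.' + QUOTENAME(c.name) + N' = t2.' + QUOTENAME(c.name)
    + N' AND t2.BeginDate = (SELECT MIN(t.BeginDate) FROM hist.' + QUOTENAME(o.name) + N' AS t'
    + N' WHERE t.BeginDate >= t1.EndDate AND t.' + QUOTENAME(c.name)
    + N' = t1.' + QUOTENAME(c.name) + N');'
FROM sys.tables AS o                          -- original (non-history) table
INNER JOIN sys.tables AS h                    -- its same-named history copy
    ON h.name = o.name
   AND h.[schema_id] = SCHEMA_ID(N'hist')
INNER JOIN sys.indexes AS i                   -- primary key of the ORIGINAL table
    ON i.[object_id] = o.[object_id]
   AND i.is_primary_key = 1
INNER JOIN sys.index_columns AS ic
    ON ic.[object_id] = i.[object_id]
   AND ic.index_id = i.index_id               -- only columns of that index
INNER JOIN sys.columns AS c
    ON c.[object_id] = ic.[object_id]
   AND c.column_id = ic.column_id
WHERE o.[schema_id] = SCHEMA_ID(N'dbo')
  AND NOT EXISTS (SELECT 1                    -- skip composite primary keys
                  FROM sys.index_columns AS k
                  WHERE k.[object_id] = i.[object_id]
                    AND k.index_id = i.index_id
                    AND k.column_id <> ic.column_id);

PRINT @sql;
-- EXEC sys.sp_executesql @sql;
```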
