
I need to modify XML data stored in a SQL Server table containing approx 62m rows.

I am using the following query to do this and here is the query plan for it.

UPDATE c
SET     XMLContent.modify('delete //account/correspondence/contact[1]')
FROM   [Customer].[CorrespondenceLog] c

It has been running for over 24 hours and doesn't appear to be making any progress. Is there a better method to modify XML in SQL Server (apart from, obviously, not storing XML in SQL Server! :))?

Could an XML index help here?

  • How many records do you have? How big is the XML data? I might suggest doing it in a loop, row by row, in a single transaction. Commented Dec 21, 2018 at 13:47
  • @SlavaMurygin 62m rows to update and the XML content is about 700 characters. Commented Dec 21, 2018 at 16:20
  • That is a lot of records. Take a look at your log file growth. Try to do the updates in smaller batches. Also, a crazy idea: is it possible to convert the XML to VARCHAR, do the replacement there, and then convert it back? I'm not sure it will be faster, but you can try. Commented Dec 22, 2018 at 2:51

1 Answer


If there's about 62 million rows, then updating all of them will obviously take some time.

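But you could make the UPDATE query repeatable by adding a WHERE clause that checks whether the tag exists in the XML column.
Then rows whose XML has nothing to delete are skipped. (If a correspondence element can hold more than one contact, each rerun would delete the next one, because [1] always matches the current first contact.)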

And you don't actually need that FROM clause, although dropping it probably won't make a difference performance-wise.

UPDATE [Customer].[CorrespondenceLog]
SET    XMLContent.modify('delete //account/correspondence/contact[1]')
WHERE  XMLContent.exist('//account/correspondence/contact[1]') = 1;
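
To see what the delete and the exist() check do before touching 62 million rows, both can be tried on a single xml variable. This is only a sketch: the sample document below is an assumption about the shape of the stored XML, with two contact elements so the effect of [1] is visible.

```sql
-- Hypothetical sample document; the real XMLContent may look different.
DECLARE @x xml = N'<account>
  <correspondence>
    <contact>first</contact>
    <contact>second</contact>
  </correspondence>
</account>';

-- Returns 1: a first contact element exists.
SELECT @x.exist('//account/correspondence/contact[1]') AS HasContactBefore;

-- Same XML DML expression as in the UPDATE, applied to the variable.
SET @x.modify('delete //account/correspondence/contact[1]');

-- Only the "second" contact remains, and exist() still returns 1,
-- because that element is now the first contact.
SELECT @x AS AfterDelete,
       @x.exist('//account/correspondence/contact[1]') AS HasContactAfter;
```

So the WHERE clause only makes a rerun harmless when each correspondence holds at most one contact.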

An index on another, non-XML column could be useful if you do this in batches.
For example, if there's an index (or partitioning?) on some date column, you could update one date range at a time.
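
A batched version could look roughly like the following. It is a sketch, not a tuned solution: it assumes the table has an indexed integer key column, here called Id (a hypothetical name, since the question doesn't show the table definition), and the batch size of 50000 is an arbitrary starting point. Walking the key range visits each row exactly once, and outside an explicit transaction each UPDATE commits on its own, which keeps the individual transactions small.

```sql
-- Assumption: [Id] is an indexed integer key on the table (hypothetical name).
DECLARE @BatchSize bigint = 50000;   -- arbitrary; adjust after testing
DECLARE @FromId    bigint;
DECLARE @MaxId     bigint;

SELECT @FromId = MIN(Id), @MaxId = MAX(Id)
FROM   [Customer].[CorrespondenceLog];

WHILE @FromId <= @MaxId
BEGIN
    -- One key range per statement; each statement is its own transaction.
    UPDATE [Customer].[CorrespondenceLog]
    SET    XMLContent.modify('delete //account/correspondence/contact[1]')
    WHERE  Id >= @FromId
      AND  Id <  @FromId + @BatchSize
      AND  XMLContent.exist('//account/correspondence/contact[1]') = 1;

    SET @FromId = @FromId + @BatchSize;
END;
```

The same loop works with date ranges instead of key ranges if the usable index is on a date column.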

Adding an XML index could help to some extent in finding the XML values that contain the tag.
But I'm not sure it's worth it just to remove a tag.

CREATE PRIMARY XML INDEX PIdx_CorrespondenceLog_XMLContent 
ON [Customer].[CorrespondenceLog]([XMLContent]);

CREATE XML INDEX PIdx_CorrespondenceLog_XMLContent_PATH 
ON [Customer].[CorrespondenceLog]([XMLContent])
USING  XML INDEX PIdx_CorrespondenceLog_XMLContent FOR PATH; 
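
If the XML indexes are created only for this one-off update, they could be dropped again afterwards so they don't have to be maintained on later writes. A minimal sketch, assuming the index names used in the statements above:

```sql
-- Drop the secondary PATH index first, then the primary XML index.
-- (Dropping the primary XML index also removes secondary XML indexes that depend on it.)
DROP INDEX PIdx_CorrespondenceLog_XMLContent_PATH
ON [Customer].[CorrespondenceLog];

DROP INDEX PIdx_CorrespondenceLog_XMLContent
ON [Customer].[CorrespondenceLog];
```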



2 Comments

Thanks for the suggestion, all the rows in the table have the tag and need it removing.
Without some criteria that can make use of an index, it would do a full table scan anyway. Hence the suggestion to use batches. If the table has a usable index or partition, then running several updates on smaller chunks could perhaps take less time than one update of the whole table.
