
I have a TEXT datatype field called "XMLText" in SQL Server 2012. What I'd like to do is remove any SageID fields. So this long string field contains something that looks like this:

<pair n="priorinstitution2" v="Yale School of Medicine" />
<pair n="priorinstitution3" v="" />
<pair n="sageid" v="20668528" />
<pair n="priorinstitution1" v="University of Chicago" />

What I'd like to do is remove everything for the SageID tag so that the final result is this:

<pair n="priorinstitution2" v="Yale School of Medicine" />
<pair n="priorinstitution3" v="" />
<pair n="priorinstitution1" v="University of Chicago" />

Obviously, the tag is not in a fixed position in the field, and the v= value can be a number of any length. What SQL string manipulation would do this?

  • One technique is to use a recursive CTE with PATINDEX() and STUFF() to remove all occurrences. Commented Jan 8, 2016 at 18:37
  • Does your SQL column contain one occurrence of sageid or multiple per row? Demo Commented Jan 8, 2016 at 18:40
  • @Rabbit You can use STUFF to remove string fragments? Commented Jan 8, 2016 at 18:40
  • Stop using the TEXT datatype, which will be removed in future SQL Server versions. Use the XML datatype to store XML data. Commented Jan 8, 2016 at 18:43
  • @TabAlleman You can use STUFF to replace a section of a string with a blank string. You just need to use PATINDEX or CHARINDEX to find the section of the string you want to remove. Commented Jan 8, 2016 at 18:45
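
The STUFF approach described in these comments can be sketched as a simple loop rather than a recursive CTE. This is only an illustration on a local variable: the name @s and the VARCHAR(MAX) type are assumptions, and a TEXT column would need to be cast to VARCHAR(MAX) first, because these string functions do not accept TEXT.

```sql
-- Sketch: remove every <pair n="sageid" ... /> tag from a string.
-- @s is a hypothetical variable standing in for the column value.
DECLARE @s VARCHAR(MAX) =
    '<pair n="sageid" v="2" /><pair n="priorinstitution3" v="" /><pair n="sageid" v="20668528" />';
DECLARE @start INT = CHARINDEX('<pair n="sageid"', @s);
DECLARE @end INT;

WHILE @start > 0
BEGIN
    -- Find the "/>" that closes the tag starting at @start.
    SET @end = CHARINDEX('/>', @s, @start);
    IF @end = 0 BREAK;  -- unterminated tag: stop instead of looping forever

    -- STUFF with an empty replacement deletes the tag, including its "/>".
    SET @s = STUFF(@s, @start, @end - @start + 2, '');

    -- Look for the next occurrence, resuming where the last one was removed.
    SET @start = CHARINDEX('<pair n="sageid"', @s, @start);
END

SELECT @s AS cleaned;
```

The loop assumes each sageid tag is self-closing and that its attribute values do not themselves contain the characters "/>".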

2 Answers


TEXT is deprecated. Store your XML chunks as XML or NVARCHAR(MAX).

You can use xml.modify with delete to remove multiple occurrences at once:

CREATE TABLE #tab(id INT, col TEXT);

INSERT INTO #tab(id, col)
VALUES 
(1, '<pair n="priorinstitution2" v="Yale School of Medicine" />
     <pair n="priorinstitution3" v="" />
     <pair n="sageid" v="20668528" />
     <pair n="priorinstitution1" v="University of Chicago" />')
,(2, '<pair n="sageid" v="2" y="adsadasdasd"/>
      <pair n="priorinstitution2" v="Yale School of Medicine" />
      <pair n="priorinstitution3" v="" />
      <pair n="sageid" v="20668528" />
      <pair n="priorinstitution1" v="University of Chicago" />
      <pair n="sageid" v="2066852832421432" z="aaaa" />');


SELECT *, xml_col = CAST(col AS XML)
INTO #temp
FROM #tab;

UPDATE #temp
SET xml_col.modify('delete /pair[@n="sageid"]');

UPDATE t1
SET col = CAST(t2.xml_col AS NVARCHAR(MAX))
FROM #tab t1
JOIN #temp t2
 ON t1.id = t2.id;

SELECT *
FROM #tab;


Keep in mind that your XML data is not well-formed (no root element).
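
Even without a root element, the xml type accepts such a fragment, so the same delete can be tried on a single value without any temp tables. A minimal sketch, assuming the fragment is held in a variable (@x is a hypothetical name, not part of the original answer):

```sql
-- Sketch: apply the delete to one XML fragment held in a variable.
DECLARE @x XML = CAST('<pair n="priorinstitution2" v="Yale School of Medicine" />
<pair n="sageid" v="20668528" />
<pair n="priorinstitution1" v="University of Chicago" />' AS XML);

-- Remove every top-level pair element whose n attribute is "sageid".
SET @x.modify('delete /pair[@n="sageid"]');

-- Convert back to a string to see what would be written to the column.
SELECT CAST(@x AS NVARCHAR(MAX)) AS cleaned;
```

Note that a round trip through the xml type may not preserve the original whitespace between elements, so the stored text can differ cosmetically from the input.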

EDIT:

If your XML text has a different structure and you want to delete every pair element with attribute n="sageid", at any depth, use:

UPDATE #temp
SET xml_col.modify('delete //pair[@n="sageid"]');
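
To see why the // form matters, here is a small sketch with a nested structure. The root and group element names are invented for illustration and do not come from the question.

```sql
-- Sketch: a sageid pair nested one level down, plus one pair to keep.
DECLARE @x XML =
    '<root><group><pair n="sageid" v="1" /></group><pair n="keep" v="2" /></root>';

-- /pair would only match top-level pair elements and find nothing here;
-- //pair matches pair elements at any depth.
SET @x.modify('delete //pair[@n="sageid"]');

SELECT @x AS cleaned;
```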



8 Comments

This is a nice solution.
@techspider This will work, but OP should fix his schema.
true... that's the basic expectation... you did a great job, though!
@techspider Probably OP has some legacy code and needs to live with it. And that's why we need to develop workarounds to keep things working. And one day the technical debt will be so high that it will have to collapse :)
Did you misunderstand my comment? Or am I misunderstanding yours? I was trying to pay you a compliment. :) Working with the data as an actual XML data type and using .modify() is less error prone than trying to use STUFF with VARCHAR data (which was proposed elsewhere).

Not a great solution, but you can find the start and end positions of the tag and replace that substring with an empty string:

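-- REPLACE and CHARINDEX do not accept TEXT, so this assumes yourColumn is
-- VARCHAR(MAX) or NVARCHAR(MAX) (or has been cast to one of them).
UPDATE YourTable
SET yourColumn = REPLACE(
        yourColumn,
        SUBSTRING(
            yourColumn,
            CHARINDEX('<pair n="sageid"', yourColumn),
            CHARINDEX('/>', yourColumn, CHARINDEX('<pair n="sageid"', yourColumn))
                - CHARINDEX('<pair n="sageid"', yourColumn) + 2),
        '')
-- Skip rows without a sageid tag; without this guard CHARINDEX returns 0
-- and the expression would strip the first tag of such rows instead.
WHERE CHARINDEX('<pair n="sageid"', yourColumn) > 0;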

Here is a script for debugging:

DECLARE @str AS VARCHAR(255) = '<pair n="priorinstitution2" v="Yale School of Medicine" /><pair n="priorinstitution3" v="" /><pair n="sageid" v="20668528" /><pair n="priorinstitution1" v="University of Chicago" />'

SELECT REPLACE(@str, SUBSTRING(@str, CHARINDEX('<pair n="sageid"', @str), CHARINDEX('/>', @str, CHARINDEX('<pair n="sageid"', @str)) - CHARINDEX('<pair n="sageid"', @str) + 2), '')
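
The long expression above can also be split into named steps. The following is a sketch, not the answerer's code: it uses a hypothetical table variable @t with a VARCHAR(MAX) column, computes the start and end positions once via CROSS APPLY, and uses STUFF to cut out the first sageid tag. Rows without a match are left untouched.

```sql
-- Sketch: same idea as above, with the positions computed once and named.
DECLARE @t TABLE (id INT, col VARCHAR(MAX));

INSERT INTO @t (id, col)
VALUES (1, '<pair n="priorinstitution3" v="" /><pair n="sageid" v="20668528" /><pair n="priorinstitution1" v="University of Chicago" />'),
       (2, '<pair n="priorinstitution2" v="Yale School of Medicine" />');

UPDATE t
SET col = STUFF(t.col, p.start_pos, e.end_pos - p.start_pos + 2, '')
FROM @t AS t
-- Position where the sageid tag starts (0 if absent).
CROSS APPLY (SELECT CHARINDEX('<pair n="sageid"', t.col) AS start_pos) AS p
-- Position of the "/>" that closes that tag.
CROSS APPLY (SELECT CHARINDEX('/>', t.col, p.start_pos) AS end_pos) AS e
-- Only touch rows where both the tag and its closing "/>" were found.
WHERE p.start_pos > 0 AND e.end_pos > 0;

SELECT id, col FROM @t;
```

Like the original expression, this handles only the first sageid tag per row; rows with several occurrences would need the statement repeated or a loop.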

1 Comment

Would you consider splitting that big line into different parts?
