Remove string from SQL column

Question

I have a TEXT datatype field called "XMLText" in SQL Server 2012. What I'd like to do is remove any SageID fields. So this long string field contains something that looks like this:

<pair n="priorinstitution2" v="Yale School of Medicine" />
<pair n="priorinstitution3" v="" />
<pair n="sageid" v="20668528" />
<pair n="priorinstitution1" v="University of Chicago" />

What I'd like to do is remove everything for the SageID tag so that the final result is this:

<pair n="priorinstitution2" v="Yale School of Medicine" />
<pair n="priorinstitution3" v="" />
<pair n="priorinstitution1" v="University of Chicago" />

Obviously, the tag is not in a fixed position in the field, and the v= value could be any number of any length. What's the SQL string manipulation to do this?

One technique is to use a recursive CTE with PATINDEX() and STUFF() to remove all occurrences.
– Rabbit, Commented Jan 8, 2016 at 18:37
Does your SQL column contain one occurrence of sageid or multiple per row?
– Lukasz Szozda, Commented Jan 8, 2016 at 18:40
Stop using the TEXT datatype, which will be removed in a future SQL Server version. Use the XML datatype to store XML data.
– Pரதீப், Commented Jan 8, 2016 at 18:43
@TabAlleman You can use STUFF to replace a section of a string with a blank string. You just need to use PATINDEX or CHARINDEX to find the section of the string you want to remove.
– Rabbit, Commented Jan 8, 2016 at 18:45
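
The recursive CTE idea from the comments could look roughly like the sketch below. It is not code from the commenter: the variable @s and its sample value are made up for illustration, and it assumes every sageid tag is self-closed with "/>". Each recursion step removes one tag with STUFF, and the final SELECT keeps only the row in which no sageid tag remains.

```sql
-- Hypothetical sample value standing in for one row's XMLText.
DECLARE @s VARCHAR(MAX) =
    '<pair n="sageid" v="2" /><pair n="a" v="1" /><pair n="sageid" v="20668528" />';

WITH strip AS (
    -- Anchor: the original string.
    SELECT @s AS txt
    UNION ALL
    -- Recursive step: cut out the first remaining sageid tag, from its start to its "/>".
    SELECT CAST(STUFF(txt,
                      PATINDEX('%<pair n="sageid"%', txt),
                      CHARINDEX('/>', txt, PATINDEX('%<pair n="sageid"%', txt))
                          - PATINDEX('%<pair n="sageid"%', txt) + 2,
                      '') AS VARCHAR(MAX))
    FROM strip
    WHERE PATINDEX('%<pair n="sageid"%', txt) > 0
)
-- Keep only the fully cleaned string.
SELECT txt
FROM strip
WHERE PATINDEX('%<pair n="sageid"%', txt) = 0;
```

Two caveats: the default recursion limit is 100 levels, so a value with more sageid tags than that would need an OPTION (MAXRECURSION n) hint, and any whitespace or line breaks around a removed tag are left in place.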

Lukasz Szozda · Accepted Answer · 2016-01-08 19:17:08Z

4

TEXT is deprecated. Store your XML chunks as XML or NVARCHAR(MAX).
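
As a rough sketch of that schema change, assuming a hypothetical table name dbo.YourTable (only the column name XMLText comes from the question), the column could be converted in place. The NULL keyword here is also an assumption; it should match the column's existing nullability.

```sql
-- dbo.YourTable is a hypothetical name; XMLText is the column from the question.
-- Converts the deprecated TEXT column to NVARCHAR(MAX) in place.
ALTER TABLE dbo.YourTable
ALTER COLUMN XMLText NVARCHAR(MAX) NULL;
```

On a large table this rewrites a lot of data, so it is worth trying on a copy first.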

You can use the XML data type's modify() method with the XQuery delete keyword to remove multiple occurrences at once:

CREATE TABLE #tab(id INT, col TEXT);

INSERT INTO #tab(id, col)
VALUES 
(1, '<pair n="priorinstitution2" v="Yale School of Medicine" />
     <pair n="priorinstitution3" v="" />
     <pair n="sageid" v="20668528" />
     <pair n="priorinstitution1" v="University of Chicago" />')
,(2, '<pair n="sageid" v="2" y="adsadasdasd"/>
      <pair n="priorinstitution2" v="Yale School of Medicine" />
      <pair n="priorinstitution3" v="" />
      <pair n="sageid" v="20668528" />
      <pair n="priorinstitution1" v="University of Chicago" />
      <pair n="sageid" v="2066852832421432" z="aaaa" />');


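-- Copy the rows into a work table, with the text cast to the XML type.
SELECT *, xml_col = CAST(col AS XML)
INTO #temp
FROM #tab;

-- Delete every top-level <pair> element whose n attribute is "sageid".
UPDATE #temp
SET xml_col.modify('delete /pair[@n="sageid"]');

-- Write the cleaned XML back to the original TEXT column.
UPDATE t1
SET col = CAST(t2.xml_col AS NVARCHAR(MAX))
FROM #tab t1
JOIN #temp t2
 ON t1.id = t2.id;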

SELECT *
FROM #tab;

Keep in mind that your XML data is not well-formed (no root element).
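
The CAST still succeeds because SQL Server's XML type accepts fragments with several top-level elements. A small sketch with a made-up sample value follows; the wrapping element name root is chosen purely for illustration and is only needed if a consumer expects a single-rooted document.

```sql
-- A fragment with two top-level elements casts to XML without error.
DECLARE @fragment VARCHAR(MAX) = '<pair n="a" v="1" /><pair n="b" v="2" />';
SELECT CAST(@fragment AS XML) AS fragment_xml;

-- Wrapping the fragment in a root element (hypothetical name) gives a single-rooted document.
SELECT CAST('<root>' + @fragment + '</root>' AS XML) AS document_xml;
```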

EDIT:

If your XML text has a different structure and you want to find all pair elements with the attribute n="sageid" at any depth, use:

UPDATE #temp
SET xml_col.modify('delete //pair[@n="sageid"]');
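
To see the difference between the two paths, here is a small sketch on an XML variable. The nested sample document and the variable name @x are assumptions for illustration: /pair only matches top-level elements, while //pair matches at any depth.

```sql
-- Hypothetical sample: the pair elements are nested under a root element.
DECLARE @x XML = '<root><pair n="sageid" v="1" /><pair n="a" v="2" /></root>';

-- /pair looks only at top-level elements, so nothing is deleted here.
SET @x.modify('delete /pair[@n="sageid"]');
SELECT @x AS after_top_level_delete;

-- //pair searches the whole tree, so the nested sageid element is removed.
SET @x.modify('delete //pair[@n="sageid"]');
SELECT @x AS after_descendant_delete;
```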

edited Jan 8, 2016 at 19:17

answered Jan 8, 2016 at 18:59

Lukasz Szozda


8 Comments

techspider Over a year ago

This is a nice solution.

Lukasz Szozda Over a year ago

@techspider This will work, but OP should fix his schema.

techspider Over a year ago

true... that's the basic expectation... you did a great job, though!

Lukasz Szozda Over a year ago

@techspider Probably OP has some legacy code and needs to live with it. And that's why we need to develop workarounds to keep things working. And one day the technical debt will be so high that it will have to collapse :)

Fuck StackOverflow Over a year ago

Did you misunderstand my comment? Or am I misunderstanding yours? I was trying to pay you a compliment. :) Working with the data as an actual XML data type and using .modify() is less error prone than trying to use STUFF with VARCHAR data (which was proposed elsewhere).

techspider · Answer · answered 2016-01-08 18:43 (edited by marc_s, 2016-01-08 19:27:13Z)

1

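Not a great solution, but you can find the start and end positions of the tag and replace that substring with an empty string. This removes only the first sageid tag in each row (plus any exact duplicates of it). REPLACE does not accept a TEXT argument, so a TEXT column has to be cast to VARCHAR(MAX) first.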

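-- WHERE guards rows with no sageid tag, where CHARINDEX returns 0 and the first tag would be stripped instead.
UPDATE YourTable
SET yourColumn = REPLACE(yourColumn,
        SUBSTRING(yourColumn,
                  CHARINDEX('<pair n="sageid"', yourColumn),
                  CHARINDEX('/>', yourColumn, CHARINDEX('<pair n="sageid"', yourColumn))
                      - CHARINDEX('<pair n="sageid"', yourColumn) + 2),
        '')
WHERE yourColumn LIKE '%<pair n="sageid"%';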

The script below can be used for debugging:

DECLARE @str AS VARCHAR(255) = '<pair n="priorinstitution2" v="Yale School of Medicine" /><pair n="priorinstitution3" v="" /><pair n="sageid" v="20668528" /><pair n="priorinstitution1" v="University of Chicago" />';

-- Position where the sageid tag starts, and where its closing "/>" starts.
DECLARE @start INT = CHARINDEX('<pair n="sageid"', @str);
DECLARE @end INT = CHARINDEX('/>', @str, @start);

-- Cut out the tag (including the two characters of "/>") and replace it with an empty string.
SELECT REPLACE(@str, SUBSTRING(@str, @start, @end - @start + 2), '');
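
Since the expression above only deals with the first sageid tag, a loop is one way to extend the same idea to every occurrence. This is a sketch on a variable, not part of the original answer; the sample value is made up, and it assumes each sageid tag ends with "/>".

```sql
-- Hypothetical sample with two sageid tags carrying different values.
DECLARE @str VARCHAR(MAX) =
    '<pair n="sageid" v="2" /><pair n="a" v="1" /><pair n="sageid" v="20668528" />';
DECLARE @start INT = CHARINDEX('<pair n="sageid"', @str);

-- Remove one tag per iteration until no sageid tag is left.
WHILE @start > 0
BEGIN
    SET @str = STUFF(@str, @start, CHARINDEX('/>', @str, @start) - @start + 2, '');
    SET @start = CHARINDEX('<pair n="sageid"', @str);
END;

SELECT @str AS cleaned;
```

STUFF is used here instead of REPLACE because it removes exactly the located span rather than every matching substring.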

edited Jan 8, 2016 at 19:27

marc_s

answered Jan 8, 2016 at 18:43

techspider

1 Comment

Juan Carlos Oropeza Over a year ago

Would you consider splitting that big line into different parts?
