0

I have the following test string that shows up in a column of my table.

<B99_9>TEST</B99_9><LastDay>TEST</LastDay>

I want to remove the values between specific tags. For example:

REPLACE( <B99_9>TEST</B99_9><LastDay>TEST</LastDay>, <B99_9>TEST</B99_9>, <B99_9></B99_9> )

This should give the following output:

<B99_9></B99_9><LastDay>TEST</LastDay>

This kind of does what I need it to do, but what can I do for situations where I don't know what the value is going to be between the tags? Is there a way to implement some Regex in the Replace function's search so I can search for maybe just the beginning/end tags regardless of the value between them?

I understand this could probably work in a CLR function but I'm really curious to see if I can get this work just by using t-sql. Thanks in advance!

9
  • Does the column contain values that are valid XML fragments? Commented Jun 24, 2015 at 18:29
  • Does this help? stackoverflow.com/questions/21378193/… Commented Jun 24, 2015 at 18:34
  • Hi Martin, Yes, the column does contain valid XML fragments Commented Jun 24, 2015 at 18:35
  • 1
    Is the column of datatype XML? If so you can use XML methods on it i.sstatic.net/KVU9l.png Commented Jun 24, 2015 at 18:40
  • 1
    No, you just need the PATINDEX of the starting tag and the PATINDEX of the ending tag. Then you can use those indexes to create your SUBSTRINGs. Commented Jun 24, 2015 at 19:02

2 Answers 2

3

If you wont or maybe can't use a CLR due to security reasons, there is a simple way using a CTE inside standard t-sql.

Here is a complete example inclusive demo structure. You can run it on a whole table.

CREATE TABLE #dummyData(id int identity(1,1), teststring nvarchar(255))

INSERT INTO #dummyData(teststring)
VALUES(N'<B99_9>TEST</B99_9><LastDay>TEST</LastDay>, <B99_9>TEST</B99_9>, <B99_9></B99_9>')

DECLARE @starttag nvarchar(10) = N'<B99_9>', @endtag nvarchar(10) = N'</B99_9>'

;WITH cte AS(
    SELECT id, STUFF(
                teststring,
                PATINDEX(N'%'+@starttag+N'[a-z0-9]%',teststring)+LEN(@starttag),
                (PATINDEX(N'%[a-z0-9]'+@endtag+N'%',teststring)+1)-(PATINDEX(N'%'+@starttag+N'[a-z0-9]%',teststring)+LEN(@starttag)),
                N''
            ) as teststring, 1 as iteration
    FROM #dummyData
    -- iterate until everything is replaced
    UNION ALL
    SELECT id, STUFF(
                teststring,
                PATINDEX(N'%'+@starttag+N'[a-z0-9]%',teststring)+LEN(@starttag),
                (PATINDEX(N'%[a-z0-9]'+@endtag+N'%',teststring)+1)-(PATINDEX(N'%'+@starttag+N'[a-z0-9]%',teststring)+LEN(@starttag)),
                N''
            ) as teststring, iteration+1 as iteration
    FROM cte
    WHERE PATINDEX(N'%'+@starttag+N'[a-z0-9]%',teststring) > 0
)
SELECT c.id, c.teststring 
FROM cte as c
-- Join to get only the latest iteration
INNER JOIN (
            SELECT id, MAX(iteration) as maxIteration
            FROM cte 
            GROUP BY id
        ) as onlyMax
    ON c.id = onlyMax.id
    AND c.iteration = onlyMax.maxIteration

-- Cleanup
DROP TABLE #dummyData

If you want to use the result of the CTE in an update. You can just replace the part after the CTE-definition with the following code:

UPDATE dd
SET teststring = c.teststring
FROM #dummyData as dd -- rejoin the base table for later update usage
INNER JOIN cte as c
        ON dd.id = c.id
-- Join to get only the latest iteration
INNER JOIN (
            SELECT id, MAX(iteration) as maxIteration
            FROM cte 
            GROUP BY id
        ) as onlyMax
    ON c.id = onlyMax.id
    AND c.iteration = onlyMax.maxIteration

If you don't want to run it on a complete table set, you can run the following code for a single variable:

DECLARE @string nvarchar(max) = N'<B99_9>TEST</B99_9><LastDay>TEST</LastDay>, <B99_9>TEST</B99_9>, <B99_9></B99_9>'
DECLARE @starttag nvarchar(10) = N'<B99_9>', @endtag nvarchar(10) = N'</B99_9>'

WHILE PATINDEX(N'%'+@starttag+N'[a-z0-9]%',@string) > 0 BEGIN
    SELECT @string = STUFF(
                    @string,
                    PATINDEX(N'%'+@starttag+N'[a-z0-9]%',@string)+LEN(@starttag),
                    (PATINDEX(N'%[a-z0-9]'+@endtag+N'%',@string)+1)-(PATINDEX(N'%'+@starttag+N'[a-z0-9]%',@string)+LEN(@starttag)),
                    N''
                )
END

SELECT @string
Sign up to request clarification or add additional context in comments.

6 Comments

Thank you so much Ionic, I'm still trying to figure out how all this magic works but your solution does exactly what I need for all occurrences of the tag! If I have other tags that I need to be scrubbed with a similar format (i.e. E11_01 and A11_02) I would only need to adjust the @starttag variable right?
No not just the starttag. You need to adjust both variables @startag and @endtag. :-) Both are needed to fix it. You can also work with an pattern.I designed the code to be flexible. You can try this too: @starttag nvarchar(10) = N'<[A-Z][0-9][0-9][_][0-9][0-9]>' The same for @endtag too (with the slash). This will match all tags with the same pattern like E11_01 and A11_02 etc. But not B99_9 (as it has just one number at the end, not two). But it's a benefit at all to use patterns like described above. :-)
Perfect! Just one more question, since this is my first time learning about CTE's I'm a little confused since most CTE examples I've seen are used in conjunction with a SELECT statement. Can this CTE query be used for updating a column within the #dummyData table? I would like to update the column with the last iteration's output rather than just selecting it. Thanks Ionic!
Yes there is no problem with that. You can use a CTE like every other set. This means you can use it in an UPDATE or an INSERT too. You can for example join the base table (#dummydata') after the CTE` definition with the CTE itself and use the output of the CTE as the SET criteria of your UPDATE. You cannot update the CTE itself, but use it's outcome for the the UPDATE. I'll update the answer above for an example.
I can't thank you enough Ionic, this was all extremely helpful!
|
3

There is no out of the box support for regex in SQL Server yet though it is easy to deploy CLR code that does this.

But you can use standard string functions for this case

DECLARE @T TABLE
  (
     X VARCHAR(MAX)
  );

INSERT INTO @T
VALUES      ('<B99_9>TEST</B99_9><LastDay>TEST</LastDay>'),
            ('Some prefix <B99_9>TEST</B99_9><LastDay>TEST</LastDay>')

UPDATE [@T]
SET    X = CASE
             WHEN 0 IN ( StartIndex, EndIndex )
                   OR EndIndex <= StartIndex THEN X
             ELSE STUFF(X, AdjStartIndex, EndIndex - AdjStartIndex, '')
           END
FROM   @T [@T]
       CROSS APPLY (VALUES ('<B99_9>','</B99_9>')) V(StartString, EndString)
       CROSS APPLY (VALUES (PATINDEX('%' + StartString + '%', X),PATINDEX('%' + EndString + '%', X))) V2(StartIndex, EndIndex)
       CROSS APPLY (VALUES (StartIndex + LEN(StartString))) V3(AdjStartIndex)

SELECT *
FROM   @T 

Though, as the string is XML, storing it in an XML column would allow you to update it using inbuilt, semantically aware, XML methods rather than string parsing.

DECLARE @T TABLE(X XML);

INSERT INTO @T 
VALUES ('<B99_9>TEST</B99_9><LastDay>TEST</LastDay>'),  
       ('Some prefix <B99_9>TEST</B99_9><LastDay>TEST</LastDay>')


UPDATE @T
SET  X.modify('
  replace value of (/B99_9/text())[1]
  with "" ')  

SELECT *
FROM @T

1 Comment

Hi Martin, this is awesome, thank you! Is there a way for the above update statement to look for all occurrences of B99_9? I'm able to change the first tag but if it shows up again, its value is still unchanged. Thanks!

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.