Replace values in a CSV string

Question

I have lists of product IDs stored as comma-separated strings. The old product IDs have been replaced with new ones, so I need to update each CSV list to use the new IDs, based on the old-to-new mapping in #tmpprod.

create table #tmp 
(
  id    int identity(1,1) not null,
  plist varchar(max) null
);

create table #tmpprod 
(
  oldid int null,
  newid int null
);

insert into #tmp(plist) values
('10,11,15,17,19'),
('22,34,44,25'),
('5,6,8,9');

insert into #tmpprod(oldid, newid) values
(5,  109),
(9,  110),
(10, 111),
(15, 112),
(19, 113),
(30, 114),
(34, 222),
(44, 333);

I am trying to use a split function to turn each list into rows, replace the values, and then concatenate the rows back into a string. Is there any other way to do this?

The expected output is:

id	newlist
1	111,11,112,17,113
2	22,222,333,25
3	109,6,8,110
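The transformation itself is a split, a lookup that falls back to the original ID, and an order-preserving join. As a small reference for checking results, here is that logic sketched in Python rather than T-SQL. The `remap` function and the two dictionaries are illustrative stand-ins for the temp tables, not part of the question.

```python
# Hypothetical reference model (Python, not T-SQL): split each list,
# swap old IDs for new ones, and rejoin in the original order.

# Same sample data as #tmp and #tmpprod in the question.
tmp = {1: "10,11,15,17,19", 2: "22,34,44,25", 3: "5,6,8,9"}
tmpprod = {5: 109, 9: 110, 10: 111, 15: 112, 19: 113, 30: 114, 34: 222, 44: 333}


def remap(plist: str, mapping: dict) -> str:
    """Replace each ID that has a new value; keep unmapped IDs as they are."""
    items = (int(item) for item in plist.split(","))
    return ",".join(str(mapping.get(item, item)) for item in items)


for id_, plist in tmp.items():
    print(id_, remap(plist, tmpprod))
```

Running it prints the same three rows as the expected output above.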

Check out the solutions in this link: blogs.msdn.com/b/amitjet/archive/2009/12/11/… – sazh, Mar 6, 2012 at 1:39
@AaronBertrand - I think I have solved the order issue, provided you consider the for xml path('') trick with order by "supported and guaranteed". – Mikael Eriksson, Mar 6, 2012 at 13:20
@Aaron... I was using SQL Server 2005; your and Mikael's solutions suit my needs. – Ram, Mar 7, 2012 at 16:13

Answer · score 4

Convert your comma-separated list to XML. Use a numbers table (here master..spt_values), XQuery and position() to get the separate IDs along with their position in the string. Then build the comma-separated string using the for xml path('') trick, with a left outer join to #tmpprod and an order by on the position.

;with C as
(
  select T.id,
         N.number as Pos,
         X.PList.value('(/i[position()=sql:column("N.number")])[1]', 'int') as PID
  from #tmp as T
    cross apply (select cast('<i>'+replace(plist, ',', '</i><i>')+'</i>' as xml)) as X(PList)
    inner join master..spt_values as N
      on N.number between 1 and X.PList.value('count(/i)', 'int')
  where N.type = 'P'
)
select C1.id,
       stuff((select ','+cast(coalesce(T.newid, C2.PID) as varchar(10))
              from C as C2
                left outer join #tmpprod as T
                  on C2.PID = T.oldid
              where C1.id = C2.id
              order by C2.Pos
              for xml path(''), type).value('.', 'varchar(max)'), 1, 1, '') as newlist
from C as C1
group by C1.id

Try on SE-Data
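The XML step can be hard to picture, so here is a rough Python model of it: the REPLACE turns the list into `<i>` elements, position() numbers them from 1, and the outer query joins each ID to the mapping and orders by that number. The `<r>` wrapper and the function names are assumptions made for the sketch; SQL Server's xml type accepts the root-less fragment directly, but ElementTree does not.

```python
import xml.etree.ElementTree as ET


def positioned_items(plist: str) -> list:
    """Model of the XML step: one (position, id) pair per <i> node.

    The T-SQL builds <i>10</i><i>11</i>..., an XML fragment; ElementTree
    needs a single root, so a hypothetical <r> wrapper is added here.
    """
    root = ET.fromstring("<r><i>" + plist.replace(",", "</i><i>") + "</i></r>")
    # XQuery position() is 1-based, so number the nodes from 1.
    return [(pos, int(node.text)) for pos, node in enumerate(root.findall("i"), start=1)]


def rebuild(plist: str, mapping: dict) -> str:
    """Left-join each id to the mapping, fall back to the old id, order by position."""
    rows = sorted(positioned_items(plist))  # like ORDER BY C2.Pos
    return ",".join(str(mapping.get(pid, pid)) for _, pid in rows)


print(positioned_items("22,34,44,25"))  # [(1, 22), (2, 34), (3, 44), (4, 25)]
print(rebuild("22,34,44,25", {34: 222, 44: 333}))  # 22,222,333,25
```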

edited Jan 18, 2021 by Community · answered Mar 6, 2012 by Mikael Eriksson

Comments

Aaron Bertrand:

A couple of questions: (1) are you sure that the order by in the subquery guarantees the order in which xml path will reconstruct the values? (2) does the fact that there are 18 1s, 15 2s, 10 3s, etc. in spt_values have any impact? There's also a type conversion warning in the plan, but I think that may happen for all plans that reference spt_values. Anyway, kudos: this is quite similar to my approach but vastly more efficient. Someday I'll learn the ins and outs of XML in SQL Server.

Mikael Eriksson:

@AaronBertrand - I have no official documentation to back anything up. (1) I know it is not guaranteed when using string concatenation into variables like @S = @S + Col. In those cases I have seen the recommendation to use for xml with order by instead. (2) I don't think that would matter, since I use where N.type = 'P' to get rid of the duplicates. I would, however, recommend using a dedicated numbers table instead of master..spt_values. (3) I don't see the type conversion warning and don't know what that is.

Mikael Eriksson:

@AaronBertrand (4) I think yours would be much more efficient if you did a group by id instead of distinct.

Mikael Eriksson:

@AaronBertrand - For more info on getting the position of a node in XML, you can have a look at this Connect item.

Mikael Eriksson:

@AaronBertrand - Installed SQL Server 2012 and tada... I too see the type conversion warning. It has something to do with filtering on type = 'P' in master..spt_values.

Answer · score 3

Assuming SQL Server 2005 or better, and assuming order isn't important, then given this split function:

CREATE FUNCTION [dbo].[SplitInts]
(
   @List       VARCHAR(MAX),
   @Delimiter  CHAR(1)
)
RETURNS TABLE
AS
   RETURN
   (
      SELECT Item
      FROM
      (
         SELECT Item = x.i.value('(./text())[1]', 'int')
         FROM
         (
            SELECT [XML] = CONVERT(XML, '<i>' + REPLACE(@List, @Delimiter, '</i><i>')
               + '</i>').query('.')
         ) AS a
         CROSS APPLY [XML].nodes('i') AS x(i)
      ) AS y
      WHERE Item IS NOT NULL
   );
GO
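A rough Python model of what SplitInts does with empty items (for example, a doubled or trailing comma) may help. `split_ints` is a hypothetical name, and the `<r>` root is added only because ElementTree will not parse an XML fragment.

```python
import xml.etree.ElementTree as ET


def split_ints(plist: str, delimiter: str = ",") -> list:
    """Python model of dbo.SplitInts (illustration only, not the T-SQL)."""
    # Hypothetical <r> root added because ElementTree rejects XML fragments.
    xml = "<r><i>" + plist.replace(delimiter, "</i><i>") + "</i></r>"
    root = ET.fromstring(xml)
    # For an empty <i></i>, ElementTree's .text is None, much as
    # (./text())[1] yields NULL, which WHERE Item IS NOT NULL filters out.
    return [int(node.text) for node in root.findall("i") if node.text is not None]


print(split_ints("10,,11,15,"))  # [10, 11, 15]
```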

You can get this result in the following way:

;WITH x AS
(
    SELECT id, item, oldid, [newid], rn = ROW_NUMBER() OVER
    (PARTITION BY id 
     ORDER BY PATINDEX('%,' + RTRIM(s.Item) + ',%', ',' + t.plist + ','))
    FROM #tmp AS t CROSS APPLY dbo.SplitInts(t.plist, ',') AS s
    LEFT OUTER JOIN #tmpprod AS p ON p.oldid = s.Item
)
SELECT id, newlist = STUFF((SELECT ',' + RTRIM(COALESCE([newid], Item)) 
    FROM x AS x2 WHERE x2.id = x.id
    FOR XML PATH(''), 
    TYPE).value(N'./text()[1]', N'varchar(max)'), 1, 1, '') 
FROM x GROUP BY id;

Results:

id	newlist
1	111,11,112,17,113
2	22,222,333,25
3	109,6,8,110

Note that the ROW_NUMBER() / OVER / PARTITION BY / ORDER BY is only there to try to coerce the optimizer to return the rows in that order. You may observe this behavior today and it can change tomorrow depending on statistics or data changes, optimizer changes (service packs, CUs, upgrade, etc.) or other variables.
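A separate caveat, worth checking if a list can contain the same ID twice: PATINDEX returns the position of the first match, so both copies of a repeated ID get the same sort key. A small Python model of the PATINDEX expression (`patindex_key` is a hypothetical name) shows the tie.

```python
def patindex_key(item: str, plist: str) -> int:
    """Mimic PATINDEX('%,' + item + ',%', ',' + plist + ','): 1-based, 0 if absent."""
    return ("," + plist + ",").find("," + item + ",") + 1


plist = "10,11,10,17"
items = plist.split(",")
keys = [patindex_key(item, plist) for item in items]
print(keys)  # [1, 4, 1, 10]

# Sorting by that key moves the second 10 ahead of 11 (Python's sort is stable).
print(",".join(sorted(items, key=lambda item: patindex_key(item, plist))))  # 10,10,11,17
```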

Long story short: if you're depending on that order, just send the set back to the client, and have the client construct the comma-delimited list. It's probably where this functionality belongs anyway.
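If the list is built on the client, the ordering becomes an ordinary sort. The sketch below assumes the query returns each item's position in the original list alongside its mapped value; the `rows` data and `build_lists` are made up for illustration.

```python
from collections import defaultdict

# Hypothetical result set as a client might receive it: (id, position, value),
# where value is already the new ID or the unchanged old one. Order is arbitrary.
rows = [
    (2, 3, 333), (1, 2, 11), (3, 4, 110), (1, 1, 111), (2, 1, 22),
    (1, 5, 113), (3, 1, 109), (2, 2, 222), (1, 3, 112), (3, 3, 8),
    (2, 4, 25), (1, 4, 17), (3, 2, 6),
]


def build_lists(rows):
    """Group rows by id, then join each group's values in position order."""
    grouped = defaultdict(list)
    for id_, pos, value in rows:
        grouped[id_].append((pos, value))
    return {
        id_: ",".join(str(value) for _, value in sorted(items))
        for id_, items in sorted(grouped.items())
    }


for id_, newlist in build_lists(rows).items():
    print(id_, newlist)
```

The sort here is explicit, so unlike the ROW_NUMBER() trick it does not depend on what the optimizer happens to do.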

That said, in SQL Server 2017+, we can guarantee retaining the order by splitting with OPENJSON() and reassembling with STRING_AGG():

;WITH x AS 
(
  -- OPENJSON returns [key] as nvarchar; convert it so '10' sorts after '9'
  SELECT o.id, val = COALESCE(n.newid, p.value), [key] = CONVERT(int, p.[key])
  FROM #tmp AS o CROSS APPLY 
    OPENJSON('["' + REPLACE(o.pList, ',', '","') + '"]') AS p
  LEFT OUTER JOIN #tmpprod AS n 
  ON p.value = n.oldid
)
SELECT id, newlist = STRING_AGG(val, ',')
  WITHIN GROUP (ORDER BY [key])
  FROM x GROUP BY id;

Example db<>fiddle
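OPENJSON without a WITH clause returns [key] as nvarchar, so the sort inside STRING_AGG is only numeric if the key is converted to int first; otherwise a list with more than ten items comes back out of order. A quick Python model of the same JSON construction illustrates the difference (the 12-item list is made up for the demonstration).

```python
import json

# Python stand-in for the OPENJSON step (illustration, not T-SQL).
# A made-up 12-item list, turned into a JSON array the same way as above.
plist = ",".join(str(i) for i in range(1, 13))
json_text = '["' + plist.replace(",", '","') + '"]'

# For an array, OPENJSON's [key] is the zero-based index, returned as text.
rows = [(str(index), value) for index, value in enumerate(json.loads(json_text))]

by_text_key = ",".join(value for _, value in sorted(rows, key=lambda r: r[0]))
by_int_key = ",".join(value for _, value in sorted(rows, key=lambda r: int(r[0])))
print(by_text_key)  # 1,2,11,12,3,4,5,6,7,8,9,10
print(by_int_key)   # 1,2,3,4,5,6,7,8,9,10,11,12
```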

edited Jan 23, 2022 · answered Mar 6, 2012 by Aaron Bertrand

Comments

Ram:

@Aaron... I wasn't too particular about the order in the CSV list; also, instead of creating a separate function, I went with Mikael's CTE approach. However, I will remember the issue with spt_values in SQL Server 2012. Thanks for the solution, though!

Aaron Bertrand:

The issue happens in previous versions too; it's just that it is exposed through the execution plan in SQL Server 2012. That doesn't mean it's something to worry about; it's just better to use a dedicated numbers table or a function than an existing table with potential selectivity or other issues.

Ram:

@Aaron... thanks again, I will go with the tally numbers table approach and avoid spt_values altogether!

Aaron Bertrand:

Or just a "numbers table" - not sure what tally has to do with it, since they're not used just for tallying. It's a table filled with a series of numbers, hence a "numbers table"... sorry, just one of my peeves.

Answer by Peter · 2012-03-06

Thanks for this question - I've just learned something new. The following code is an adaptation of an article written by Rob Volk on exactly this topic. This is a very clever query! I won't copy all of the content down here. I have adapted it to create the results you're looking for in your example.

CREATE TABLE #nums (n INT)
DECLARE @i INT 
SET @i = 1
WHILE @i < 8000 
BEGIN
    INSERT #nums VALUES(@i)
    SET @i = @i + 1
END


CREATE TABLE #tmp (
  id INT IDENTITY(1,1) not null,
  plist VARCHAR(MAX) null
)

INSERT INTO #tmp
VALUES('10,11,15,17,19'),('22,34,44,25'),('5,6,8,9')

CREATE TABLE #tmpprod (
  oldid INT NULL,
  newid INT NULL
)

INSERT INTO #tmpprod VALUES(5, 109),(9, 110),(10, 111),(15, 112),(19, 113),(30, 114),(34, 222),(44, 333)

;WITH cte AS (SELECT id, n, NULLIF(SUBSTRING(',' + plist + ',' , n , CHARINDEX(',' , ',' + plist + ',' , n) - n) , '') AS prod
    FROM #nums, #tmp
    -- keep each n that directly follows a comma and starts a non-empty item
    WHERE n <= LEN(',' + plist + ',') AND SUBSTRING(',' + plist + ',' , n - 1, 1) = ','
    AND CHARINDEX(',' , ',' + plist + ',' , n) - n > 0)
UPDATE t SET plist = (SELECT CAST(CASE WHEN tp.oldid IS NULL THEN cte.prod ELSE tp.newid END AS VARCHAR) + ','
            FROM cte LEFT JOIN #tmpprod tp ON cte.prod = tp.oldid
            WHERE cte.id = t.id
            ORDER BY cte.n  -- rebuild the list in its original order
            FOR XML PATH(''))
FROM #tmp t

UPDATE #tmp SET plist = SUBSTRING(plist, 1, LEN(plist) -1)
WHERE LEN(plist) > 0 AND SUBSTRING(plist, LEN(plist), 1) = ','

SELECT * FROM #tmp
DROP TABLE #tmp
DROP TABLE #tmpprod
DROP TABLE #nums

The #nums table is a table of sequential integers, and it must contain more numbers than the length of the longest CSV in your table. The first 8 lines of the script create and populate it. Then I've copied in your code, followed by the meat of this query: the very clever single-query parser, described in more detail in the article mentioned above. The common table expression (WITH cte...) does the parsing, the first UPDATE reassembles the results into a CSV string and writes it back to #tmp, and the second UPDATE strips the trailing comma that the concatenation leaves behind.
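To see why the parser works, here is a line-by-line Python model of it. `split_with_numbers` is a hypothetical name, and positions are kept 1-based to match SUBSTRING and CHARINDEX. The n kept with each item records where that item started, which is the natural key for putting the list back together in order.

```python
def split_with_numbers(plist: str, max_n: int = 7999) -> list:
    """Python model of the CTE's parser, using T-SQL's 1-based positions.

    Each n from the numbers table is a candidate start position. Keep n when
    the character just before it is a comma and a non-empty item starts there;
    the item runs up to the next comma at or after n.
    """
    s = "," + plist + ","
    items = []
    # #nums holds 1..7999 in the answer; n is also capped at LEN(s).
    for n in range(1, min(max_n, len(s)) + 1):
        before = s[n - 2] if n >= 2 else ""    # SUBSTRING(s, n - 1, 1)
        next_comma = s.find(",", n - 1) + 1    # CHARINDEX(',', s, n); 0 when not found
        if before == "," and next_comma - n > 0:
            items.append((n, s[n - 1:next_comma - 1]))  # SUBSTRING(s, n, next_comma - n)
    return items


print(split_with_numbers("10,11,15"))  # [(2, '10'), (5, '11'), (8, '15')]
```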

@Peter, the solution is quite interesting: updating from the CTE after it is populated. Also a good article there. Thanks!

Answer · edited by Community 2018-11-13

Adam Machanic's blog contains this posting of a T-SQL-only UDF that works like REPLACE, except that the search pattern can contain T-SQL wildcards (the same ones PATINDEX accepts).

http://dataeducation.com/splitting-a-string-of-unlimited-length/

For my own use, I changed the varchar sizes to max. Also note that this UDF is rather slow, but if you cannot use the CLR, it may be an option. The minor changes I made to the author's code may limit its use to SQL Server 2008 R2 and later.

CREATE FUNCTION dbo.PatternReplace
(
   @InputString VARCHAR(max),
   @Pattern VARCHAR(max),
   @ReplaceText VARCHAR(max)
)
RETURNS VARCHAR(max)
AS
BEGIN
   DECLARE @Result VARCHAR(max) = ''
   -- First character in a match
   DECLARE @First INT
   -- Next character to start search on
   DECLARE @Next INT = 1
   -- Length of the total string -- 0 if @InputString is NULL
   DECLARE @Len INT = COALESCE(LEN(@InputString), 0)
   -- End of a pattern
   DECLARE @EndPattern INT

   WHILE (@Next <= @Len) 
   BEGIN
      SET @First = PATINDEX('%' + @Pattern + '%', SUBSTRING(@InputString, @Next, @Len))
      IF COALESCE(@First, 0) = 0 --no match - return
      BEGIN
         SET @Result = @Result + 
            CASE --return NULL, just like REPLACE, if inputs are NULL
               WHEN  @InputString IS NULL
                     OR @Pattern IS NULL
                     OR @ReplaceText IS NULL THEN NULL
               ELSE SUBSTRING(@InputString, @Next, @Len)
            END
         BREAK
      END
      ELSE
      BEGIN
         -- Concatenate characters before the match to the result
         SET @Result = @Result + SUBSTRING(@InputString, @Next, @First - 1)
         SET @Next = @Next + @First - 1

         SET @EndPattern = 1
         -- Find start of end pattern range
         WHILE PATINDEX(@Pattern, SUBSTRING(@InputString, @Next, @EndPattern)) = 0
            SET @EndPattern = @EndPattern + 1
         -- Find end of pattern range
         WHILE PATINDEX(@Pattern, SUBSTRING(@InputString, @Next, @EndPattern)) > 0
               AND @Len >= (@Next + @EndPattern - 1)
            SET @EndPattern = @EndPattern + 1

         --Either at the end of the pattern or @Next + @EndPattern = @Len
         SET @Result = @Result + @ReplaceText
         SET @Next = @Next + @EndPattern - 1
      END
   END
   RETURN(@Result)
END
