complex SQL string parsing

Question

I have the following text field in a SQL Server table:

1!1,3!0,23!0,288!0,340!0,521!0,24!0,38!0,26!0,27!0,281!0,19!0,470!0,568!0,601!0,2!1,251!0,7!2,140!0,285!0,11!2,33!0

1. I would like to retrieve only the part before the exclamation mark (!). So for 1!1 I only want 1, for 3!0 I only want 3, for 23!0 I only want 23.
2. I would also like to retrieve only the part after the exclamation mark (!). So for 1!1 I only want 1, for 3!0 I only want 0, for 23!0 I only want 0.

Both point 1 and point 2 should be inserted into separate columns of a SQL Server table.

You shouldn't be storing delimited values in a single column in the first place.
– user330315, Commented Jan 10, 2013 at 15:14
Is that entire string a single record, or is 1!1 a record, 3!0 another record, and so on?
– EmmyS, Commented Jan 10, 2013 at 15:15
I have a question: Do people also use the wrong end of the hammer to hit the nails and wonder why it is inefficient? Or is it just the DB topic that brings out this phenomenon?
– ppeterka, Commented Jan 10, 2013 at 15:17

BStateham · Accepted Answer · 2013-01-10 15:46:48Z

I LOVE SQL Server's XML capabilities. They are a great way to parse data. Try this one out:

--Load the original string
DECLARE @string nvarchar(max) = '1!2,3!4,5!6,7!8,9!10';

--Turn it into XML
SET @string = REPLACE(@string,',','</SecondNumber></Pair><Pair><FirstNumber>') + '</SecondNumber></Pair>';
SET @string = '<Pair><FirstNumber>' + REPLACE(@string,'!','</FirstNumber><SecondNumber>');

--Show the new version of the string
SELECT @string AS XmlIfiedString;

--Load it into an XML variable
DECLARE @xml XML = @string;

--Now, First and Second Number from each pair...
SELECT
  Pairs.Pair.value('FirstNumber[1]','nvarchar(1024)') AS FirstNumber,
  Pairs.Pair.value('SecondNumber[1]','nvarchar(1024)') AS SecondNumber
FROM @xml.nodes('//*:Pair') Pairs(Pair);

The above query turned the string into XML like this:

<Pair><FirstNumber>1</FirstNumber><SecondNumber>2</SecondNumber></Pair> ...

Then parsed it to return a result like:

FirstNumber | SecondNumber
----------- | ------------
          1 |            2
          3 |            4
          5 |            6
          7 |            8
          9 |           10
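
Since the question asks for the two parts to end up in separate columns of a table, the final SELECT can feed an INSERT directly. The following is only a minimal sketch: the target table dbo.ParsedPairs and its column types are assumptions (they do not appear in the question), and it assumes both parts are always integers.

```sql
-- Hypothetical target table; the name and column types are assumptions.
CREATE TABLE dbo.ParsedPairs
(
    FirstNumber  int NOT NULL,
    SecondNumber int NOT NULL
);

DECLARE @string nvarchar(max) = N'1!1,3!0,23!0,288!0';

-- Same XML-ification as above, written as a single expression.
DECLARE @xml xml =
    N'<Pair><FirstNumber>'
    + REPLACE(REPLACE(@string, N',', N'</SecondNumber></Pair><Pair><FirstNumber>'),
              N'!', N'</FirstNumber><SecondNumber>')
    + N'</SecondNumber></Pair>';

-- Shred the XML and insert one row per pair.
INSERT INTO dbo.ParsedPairs (FirstNumber, SecondNumber)
SELECT
    Pairs.Pair.value('(FirstNumber/text())[1]',  'int'),
    Pairs.Pair.value('(SecondNumber/text())[1]', 'int')
FROM @xml.nodes('/Pair') AS Pairs(Pair);
```

Note that this approach relies on the data containing no XML special characters such as & or <. That holds for the digits-only input shown here, but such characters would make the conversion to XML fail.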

MarkD · Accepted Answer · 2013-01-10 15:36:41Z

I completely agree with the guys complaining about this sort of data. The fact, however, is that we often have no control over the format of our sources.

Here's my approach...

First you need a tokeniser. This one is very efficient (probably the fastest non-CLR splitter). It comes from http://www.sqlservercentral.com/articles/Tally+Table/72993/

CREATE FUNCTION [dbo].[DelimitedSplit8K]
--===== Define I/O parameters
        (@pString VARCHAR(8000), @pDelimiter CHAR(1))
--WARNING!!! DO NOT USE MAX DATA-TYPES HERE!  IT WILL KILL PERFORMANCE!
RETURNS TABLE WITH SCHEMABINDING AS
 RETURN
--===== "Inline" CTE Driven "Tally Table" produces values from 1 up to 10,000...
     -- enough to cover VARCHAR(8000)
  WITH E1(N) AS (
                 SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
                 SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
                 SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
                ),                          --10E+1 or 10 rows
       E2(N) AS (SELECT 1 FROM E1 a, E1 b), --10E+2 or 100 rows
       E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10E+4 or 10,000 rows max
 cteTally(N) AS (--==== This provides the "base" CTE and limits the number of rows right up front
                     -- for both a performance gain and prevention of accidental "overruns"
                 SELECT TOP (ISNULL(DATALENGTH(@pString),0)) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E4
                ),
cteStart(N1) AS (--==== This returns N+1 (starting position of each "element" just once for each delimiter)
                 SELECT 1 UNION ALL
                 SELECT t.N+1 FROM cteTally t WHERE SUBSTRING(@pString,t.N,1) = @pDelimiter
                ),
cteLen(N1,L1) AS(--==== Return start and length (for use in substring)
                 SELECT s.N1,
                        ISNULL(NULLIF(CHARINDEX(@pDelimiter,@pString,s.N1),0)-s.N1,8000)
                   FROM cteStart s
                )
--===== Do the actual split. The ISNULL/NULLIF combo handles the length for the final element when no delimiter is found.
 SELECT ItemNumber = ROW_NUMBER() OVER(ORDER BY l.N1),
        Item       = SUBSTRING(@pString, l.N1, l.L1)
   FROM cteLen l
;
GO

Then you consume it like so...

DECLARE @Wtf VARCHAR(1000) = '1!1,3!0,23!0,288!0,340!0,521!0,24!0,38!0,26!0,27!0,281!0,19!0,470!0,568!0,601!0,2!1,251!0,7!2,140!0,285!0,11!2,33!0'

SELECT   LEFT(Item, CHARINDEX('!', Item)-1)            AS FirstNumber   -- part before the '!'
        ,RIGHT(Item, CHARINDEX('!', REVERSE(Item))-1) AS SecondNumber  -- part after the '!'
FROM [dbo].[DelimitedSplit8K](@Wtf, ',')

The function posted and the parsing logic can, of course, be integrated into a single function.
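
One way that integration might look is an inline table-valued function wrapping the splitter. This is a sketch, not part of the linked article: the name dbo.SplitPairs and its output column names are hypothetical, and it assumes each item contains exactly one '!'. It uses SUBSTRING for the second part instead of the RIGHT/REVERSE expression, and NULLIF so that an item without a '!' yields NULLs rather than an invalid-length error.

```sql
-- Hypothetical wrapper; the name dbo.SplitPairs is an assumption.
CREATE FUNCTION dbo.SplitPairs (@pString VARCHAR(8000))
RETURNS TABLE AS
RETURN
    SELECT s.ItemNumber,
           -- NULLIF turns "not found" (0) into NULL, so LEFT/SUBSTRING return NULL
           FirstPart  = LEFT(s.Item, NULLIF(CHARINDEX('!', s.Item), 0) - 1),
           SecondPart = SUBSTRING(s.Item, NULLIF(CHARINDEX('!', s.Item), 0) + 1, 8000)
      FROM dbo.DelimitedSplit8K(@pString, ',') AS s;
GO

-- Usage: one row per pair, in the original order.
SELECT FirstPart, SecondPart
  FROM dbo.SplitPairs('1!1,3!0,23!0')
 ORDER BY ItemNumber;
```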

EricZ · Accepted Answer · 2013-01-10 15:42:34Z

I agree that normalizing the data is the best approach. However, here is an XML solution to parse the data:

DECLARE @str VARCHAR(1000) = '1!1,3!0,23!0,288!0,340!0,521!0,24!0,38!0,26!0,27!0,281!0,19!0,470!0,568!0,601!0,2!1,251!0,7!2,140!0,285!0,11!2,33!0'
    ,@xml XML

SET @xml  = CAST('<row><col>' + REPLACE(REPLACE(@str,'!','</col><col>'),',','</col></row><row><col>') + '</col></row>' AS XML)

SELECT  
     line.col.value('col[1]', 'varchar(1000)') AS col1
    ,line.col.value('col[2]', 'varchar(1000)') AS col2
FROM    @xml.nodes('/row') AS line(col)
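
The question describes the string as a field in a table rather than a variable. A possible way to apply the same XML conversion to each row is CROSS APPLY. In this sketch the table variable @Source and its columns Id and PairList are assumptions made for illustration only.

```sql
-- Hypothetical source data; the table and column names are assumptions.
DECLARE @Source TABLE (Id int, PairList varchar(1000));
INSERT INTO @Source (Id, PairList)
VALUES (1, '1!1,3!0,23!0'),
       (2, '7!2,140!0');

SELECT  s.Id
       ,line.col.value('col[1]', 'varchar(1000)') AS col1
       ,line.col.value('col[2]', 'varchar(1000)') AS col2
FROM    @Source AS s
CROSS APPLY (
            -- Build the same <row><col>..</col><col>..</col></row> XML for each source row
            SELECT CAST('<row><col>'
                        + REPLACE(REPLACE(s.PairList, '!', '</col><col>'), ',', '</col></row><row><col>')
                        + '</col></row>' AS XML) AS x
            ) AS conv
CROSS APPLY conv.x.nodes('/row') AS line(col);
```

Each source row produces one output row per pair, with Id identifying which record the pair came from.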
