
I need to extract text that is surrounded by ***[some text] strings, as in the following example:

some text
some text
***[some text]
THIS SHOULD BE EXTRACTED
***[some text]
some text
some text
some text
some text
some text
***[some text]
THIS SHOULD BE EXTRACTED TOO
***[some text]
some text

The output should be:

THIS SHOULD BE EXTRACTED
THIS SHOULD BE EXTRACTED TOO

I tried PATINDEX, as shown here, but couldn't find a way to extract the string.

PATINDEX('%[*][*][*][[]%]%%[*][*][*][[]%]%',@Text)

I am looking forward to hearing any suggestions.
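In the example, the same marker opens and closes each section. One way to see the problem is to split the text on the marker and keep every second piece. Below is a minimal Python sketch of that idea, not T-SQL. It assumes the marker is exactly ***[some text]; the names DELIM and extract_between_identical are hypothetical.

```python
# Hypothetical marker: the question uses the same string to open and close a section.
DELIM = "***[some text]"


def extract_between_identical(text: str, delim: str = DELIM) -> list[str]:
    """Return the text between consecutive pairs of an identical delimiter.

    Splitting on the delimiter yields [before, inside, between, inside, ...],
    so the odd-indexed pieces are the wanted sections. An unmatched trailing
    opener would also yield a piece; this sketch does not check for that.
    """
    pieces = text.split(delim)
    return [p.strip() for p in pieces[1::2]]


sample = """some text
some text
***[some text]
THIS SHOULD BE EXTRACTED
***[some text]
some text
***[some text]
THIS SHOULD BE EXTRACTED TOO
***[some text]
some text"""

print(extract_between_identical(sample))
```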

  • Does the text you show represent a field, a row or a row set? Commented Apr 27, 2011 at 12:18
  • @Cos Callis It represents one NVARCHAR field. Commented Apr 27, 2011 at 12:23
  • Got a CR LF on the end of those lines? Commented Apr 27, 2011 at 12:33
  • @DKnight There's a char(10) character, but any example that works for single-line text is welcome. Commented Apr 27, 2011 at 12:50
  • And is the start section delimiter really exactly the same as the end section delimiter? Commented Apr 27, 2011 at 12:55

5 Answers


For the somewhat easier case raised in the comments, where the start and end delimiters differ, you could do:

;WITH T(C) AS
(
 SELECT '
    some text
    some text
    ***[some text 1]
    THIS SHOULD BE EXTRACTED
    ***[some text 2]
    some text
    some text
    some text
    some text
    some text
    ***[some text 1]
    THIS SHOULD BE EXTRACTED TOO
    ***[some text 2]
    some text'
)
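-- Wrap the text in <a> elements: each end marker starts a keep="false" segment,
-- and each start marker starts a keep="true" segment
SELECT col.value('.','varchar(max)')
FROM T
CROSS APPLY (SELECT CAST('<a keep="false">' + 
                        REPLACE(
                            REPLACE(C,'***[some text 2]','</a><a keep="false">'),
                        '***[some text 1]','</a><a keep="true">') + 
                    '</a>' AS xml) as xcol) x
-- Shred the XML and return only the segments that followed a start marker
CROSS APPLY xcol.nodes('/a[@keep="true"]') tab(col)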
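The same element-splitting idea can be tried outside SQL Server. Below is a rough Python sketch that uses xml.etree.ElementTree in place of the xml type's nodes() method. The marker strings come from the answer's sample. The function name and the extra root element <r> are assumptions of this sketch: unlike SQL Server's xml type, ElementTree needs a single root. The sketch also escapes &, < and >, and strips surrounding whitespace from each result.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

START = "***[some text 1]"  # opening marker from the answer's sample
END = "***[some text 2]"    # closing marker from the answer's sample


def extract_via_xml(text: str) -> list[str]:
    """Python sketch of the REPLACE + CAST AS xml + nodes() trick."""
    # Escape &, < and > so arbitrary text cannot break the XML.
    body = escape(text)
    # Each marker closes the current <a> and opens a new one, as in the T-SQL REPLACE calls.
    body = body.replace(END, '</a><a keep="false">')
    body = body.replace(START, '</a><a keep="true">')
    # ElementTree needs a single root element, so wrap everything in a hypothetical <r>.
    root = ET.fromstring('<r><a keep="false">' + body + "</a></r>")
    # Keep only the segments that followed an opening marker.
    return [(a.text or "").strip() for a in root.findall("a[@keep='true']")]


sample = """some text
***[some text 1]
THIS SHOULD BE EXTRACTED
***[some text 2]
some text
***[some text 1]
THIS SHOULD BE EXTRACTED TOO
***[some text 2]
some text"""

print(extract_via_xml(sample))
```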

2 Comments

I tested your solution against mine, which is above. Yours takes 98% of the batch; mine takes only 2%. I think that is because of the XML.
Yes, the XML reader functions definitely don't have great performance. Additionally, my answer would need some more REPLACE operations if the text might contain characters such as <, in order to replace them with the corresponding XML entities.
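Here is why the extra REPLACE operations matter. A bare < or & makes the string ill-formed XML, so the CAST to xml fails. The small Python check below applies the same rule, using xml.sax.saxutils.escape as a stand-in for chained REPLACE calls. The & has to be replaced first, so that the entities produced for < and > are not escaped a second time.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

raw = "a < b & c"

# Unescaped, a bare < or & is not well-formed XML, so parsing fails
# (the analogue of CAST(... AS xml) raising an error in SQL Server).
try:
    ET.fromstring("<a>" + raw + "</a>")
    parsed_unescaped = True
except ET.ParseError:
    parsed_unescaped = False

# escape() replaces & before < and >, so the new entities are not escaped twice.
safe = escape(raw)
root = ET.fromstring("<a>" + safe + "</a>")

print(parsed_unescaped, safe, root.text)
```

In T-SQL this would mean wrapping C as REPLACE(REPLACE(REPLACE(C,'&','&amp;'),'<','&lt;'),'>','&gt;') before the marker replacements.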

This isn't a regex solution, and I'm still a SQL novice, so it may not be optimal. You should still be able to parse the text with a WHILE loop. Use CHARINDEX to find the ***, then use that as the starting point for a CHARINDEX to the LF. Use the LF position as the starting point for a SUBSTRING whose end point is the CHARINDEX of the next ***. Concatenate the substring to your output, move past the ending *** and loop to find the next one.

I'll play with it some and see if I can add an example.
EDIT: This probably needs more error checking.

declare @inText nvarchar(2000) = 'some text 
some text 
***[some text] 
THIS SHOULD BE EXTRACTED 
***[some text] 
some text 
some text 
some text 
some text 
some text 
***[some text] 
THIS SHOULD BE EXTRACTED TOO 
***[some text] 
some text '

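declare @delim1 nvarchar(50) = '***'   -- marker that starts (and ends) a section
declare @delim2 char = char(10)        -- line feed ending the marker line
declare @output nvarchar(1000) = ''
declare @position int
declare @positionEnd int

set @position = CHARINDEX(@delim1,@inText)
while (@position != 0 and @position is not null)
BEGIN
  -- skip to the end of the marker line
  set @position = CHARINDEX(@delim2,@inText,@position)
  if @position = 0 break            -- no line feed after the marker
  -- find the closing marker
  set @positionEnd = CHARINDEX(@delim1,@inText,@position)
  if @positionEnd = 0 break         -- unmatched opening marker
  set @output = @output + SUBSTRING(@inText,@position,@positionEnd-@position)
  -- jump past the closing marker to the next opening one
  set @position = CHARINDEX(@delim1,@inText,@positionEnd+LEN(@delim1))
END
select @output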
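If you want to trace the loop by hand, here is a line-by-line Python port of it. It is a sketch, not a tested T-SQL replacement. Python's str.find is 0-based and returns -1 where CHARINDEX returns 0. The two early exits are this sketch's own error checking for a missing line feed or a missing closing marker. The function name extract_with_loop is hypothetical.

```python
def extract_with_loop(text: str, delim1: str = "***", delim2: str = "\n") -> str:
    """Python port of the WHILE/CHARINDEX loop (str.find is 0-based, -1 = not found)."""
    output = ""
    position = text.find(delim1)  # first opening marker
    while position != -1:
        position = text.find(delim2, position)  # line feed ending the marker line
        if position == -1:
            break  # sketch's own check: no line feed after the marker
        position_end = text.find(delim1, position)  # closing marker
        if position_end == -1:
            break  # sketch's own check: unmatched opening marker
        # Like SUBSTRING(@inText, @position, @positionEnd - @position): keeps the leading line feed.
        output += text[position:position_end]
        # Skip past the closing marker and look for the next opening one.
        position = text.find(delim1, position_end + len(delim1))
    return output


sample = """some text
some text
***[some text]
THIS SHOULD BE EXTRACTED
***[some text]
some text
***[some text]
THIS SHOULD BE EXTRACTED TOO
***[some text]
some text"""

print(extract_with_loop(sample))
```

As in the T-SQL version, each extracted chunk keeps its surrounding line feeds. That keeps the results on separate lines when they are concatenated.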

2 Comments

I think you could probably use this as a base for a UDF and then call it using CROSS APPLY.
@Martin Oooh, more reading material! Thanks for the feedback; it is very much appreciated.

You can find a solution on my blog: http://sql-tricks.blogspot.com/2011/04/extract-strings-with-delimiters.html. It is a self-contained solution that needs no additional modification; only the delimiter sequences have to be declared.

2 Comments

Thanks, your solution is nice, but I get "Invalid length parameter passed to the SUBSTRING function" when I try to change the delimiters to meet my needs. Also, it is a bit slow, or maybe my server is slow; I am not sure.
Please post your delimiters and I will fix it.

I may be wrong, but I don't think there's a clean way to do this directly in SQL. I would use a CLR stored procedure and use regular expressions from C# or your .NET language of choice.

See this article (or this article) for a relevant example using regexes.
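As a rough stand-in for the .NET regex approach, a single pattern can capture the text between two marker lines. This is Python's re module, not CLR code. The pattern assumes markers of the form ***[...] on their own lines, and it tolerates trailing spaces and CR LF line endings. The names SECTION and extract_with_regex are hypothetical.

```python
import re

# Assumed input shape: a marker line "***[...]", the wanted text, then another marker line.
# Trailing spaces and CR LF line endings around the markers are tolerated.
SECTION = re.compile(
    r"\*\*\*\[[^\]]*\][ \t]*\r?\n"   # opening marker and the end of its line
    r"(.*?)"                         # the wanted text (lazy; may span lines with DOTALL)
    r"\r?\n[ \t]*\*\*\*\[[^\]]*\]",  # line break and the closing marker
    re.DOTALL,
)


def extract_with_regex(text: str) -> list[str]:
    return [m.strip() for m in SECTION.findall(text)]


sample = """some text
***[some text]
THIS SHOULD BE EXTRACTED
***[some text]
some text
***[some text]
THIS SHOULD BE EXTRACTED TOO
***[some text]
some text"""

print(extract_with_regex(sample))
```

The lazy .*? together with re.DOTALL lets a section span several lines while still stopping at the first closing marker. A .NET Regex with RegexOptions.Singleline would behave similarly.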

3 Comments

Unfortunately, I cannot use C# or CLR stored procedures; I can only run SELECT statements...
The link to the article msdn.microsoft.com/en-us/magazine/cc163473.aspx is broken :(
@RobertLujo I found another example here at blogs.msdn.microsoft.com/sqlclr/2005/06/29/…

I believe you can use xp_regex_match, as described in http://www.codeproject.com/KB/mcpp/xpregex.aspx?q=use+sql+function+to+parse+text, to parse your NVARCHAR field. I wrote something similar quite a while back.
