I have a table with formulas, and I need to be able to extract ALL of the values between brackets "[" and "]". The values I'm looking for are guaranteed to be between brackets.

An example of the strings is as follows:

if ((DateTime.Parse("[ST 35401900]") < DateTime.Parse("[ST 35401903]")) and [35401900]=0 and [35401903]=3, 1, 0)

The "ST" prefix should be replaced with nothing.

The result should be:
35401900
35401903
35401900
35401903

The column I'm searching through is called "DerivedEval". I have tried the following, but it only returns the first result:

SELECT RTRIM(LTRIM(REPLACE(REPLACE(SUBSTRING(DerivedEval,CHARINDEX('[',DerivedEval)+1,CHARINDEX(']',DerivedEval)-CHARINDEX('[',DerivedEval)-1), 'ST', ''), 'INV','')))

How can I expand upon this to return all results?

This is not the kind of process that works well in SQL. However, there are some "split functions" that take a string and return a table, splitting the string into rows based on a delimiter. Most people use these for comma-separated lists. You should use "]" as the delimiter and then look at the end of each result (after the space) to get your list. Commented Jan 18, 2019 at 18:25

2 Answers

Based on the sample data you posted, this is a super-simple problem if the number you are extracting:

  1. Is eight digits long

  2. Appears immediately before the closing bracket

If that is the case, you just need a copy of NGrams8K and you can solve this in 3 lines of code:

-- your sample data
DECLARE @string VARCHAR(8000) = 'if ((DateTime.Parse("[ST 35401900]") < DateTime.Parse("[ST 35401903]")) and [35401900]=0 and [35401903]=3, 1, 0)';

-- purely set-based solution using NGrams8K
SELECT ng.position, result = SUBSTRING(ng.token,1,8)
FROM   samd.NGrams8k(@string,9) AS ng
WHERE  CHARINDEX(']',ng.token,8) = 9;

Returns:

position   result
---------- --------
26         35401900
60         35401903
78         35401900
95         35401903

I know you don't need to know where these numbers live in the string, but I included the position anyhow to demonstrate how easy it is to get if you need it.
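
For readers who do not have NGrams8K installed, here is a rough sketch of what it does. This is a simplified stand-in and not the published function: the name dbo.NGramsSketch and the way the tally of numbers is built are assumptions made for illustration. It returns every contiguous @N-character substring (token) of the input along with its starting position, which is all the query above relies on.

```sql
-- Simplified stand-in for NGrams8K (hypothetical name; NOT the published function).
-- Assumes @N >= 1 and a non-NULL @string of at most 8000 characters.
CREATE FUNCTION dbo.NGramsSketch
(
  @string VARCHAR(8000),
  @N      INT
)
RETURNS TABLE AS RETURN
WITH
  L1(n) AS (SELECT 1 FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) AS v(n)), -- 10 rows
  L2(n) AS (SELECT 1 FROM L1 AS a CROSS JOIN L1 AS b),                               -- 100 rows
  L4(n) AS (SELECT 1 FROM L2 AS a CROSS JOIN L2 AS b),                               -- 10,000 rows
  tally(n) AS (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM L4)
SELECT position = t.n,                        -- 1-based start of the token
       token    = SUBSTRING(@string, t.n, @N) -- the @N-character substring
FROM   tally AS t
WHERE  t.n <= DATALENGTH(@string) - @N + 1;   -- one row per possible starting position
```

With that in place, the query above can be tried by swapping samd.NGrams8k for dbo.NGramsSketch. The real NGrams8K is tuned for performance and handles edge cases (such as a NULL or non-positive @N) that this sketch ignores.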

UPDATED ON 1/22/2019 (US) based on questions in the comments below

To handle cases where the number you are extracting is not always the same length, you can use my patExtract8K function (which uses NGrams8K):

CREATE FUNCTION samd.patExtract8K
(
  @string  VARCHAR(8000),
  @pattern VARCHAR(50)
)
/*****************************************************************************************
[Description]:
 This can be considered a T-SQL inline table valued function (iTVF) equivalent of 
 Microsoft's mdq.RegexExtract: except:

 1. It includes each matching substring's position in the string

 2. It accepts varchar(8000) instead of nvarchar(4000) for the input string, varchar(50)
    instead of nvarchar(4000) for the pattern

 3. The mask parameter is not required and therefore does not exist.

 4. You have to specify what text we're searching for as an exclusion; e.g. for numeric 
    characters you should search for '[^0-9]' instead of '[0-9]'. 

 5. There is no parameter for naming a "capture group". Using the variable below, both 
    the following queries will return the same result:

     DECLARE @string nvarchar(4000) = N'123 Main Street';

   SELECT item FROM samd.patExtract8K(@string, '[^0-9]');
   SELECT clr.RegexExtract(@string, N'(?<number>(\d+))(?<street>(.*))', N'number', 1);

 Alternatively, you can think of patExtract8K as Chris Morris' PatternSplitCM (found here:
 http://www.sqlservercentral.com/articles/String+Manipulation/94365/) except that it only
 returns the rows where [matched]=0. The key benefit is that it performs substantially
 better because you are only returning the number of rows required instead of returning
 twice as many rows then filtering out half of them.

 The following two sets of queries return the same result:

 DECLARE @string varchar(100) = 'xx123xx555xx999';
 BEGIN
 -- QUERY #1
   -- patExtract8K
   SELECT ps.itemNumber, ps.item 
   FROM samd.patExtract8K(@string, '[^0-9]') ps;

   -- patternSplitCM   
   SELECT itemNumber = row_number() over (order by ps.itemNumber), ps.item 
   FROM dbo.patternSplitCM(@string, '[^0-9]') ps
   WHERE [matched] = 0;

 -- QUERY #2
   SELECT ps.itemNumber, ps.item 
   FROM samd.patExtract8K(@string, '[0-9]') ps;

   SELECT itemNumber = row_number() over (order by itemNumber), item 
   FROM dbo.patternSplitCM(@string, '[0-9]')
   WHERE [matched] = 0;
 END;

[Compatibility]:
 SQL Server 2008+

[Syntax]:
--===== Autonomous
 SELECT pe.ItemNumber, pe.ItemIndex, pe.ItemLength, pe.Item
 FROM samd.patExtract8K(@string,@pattern) pe;

--===== Against a table using APPLY
 SELECT t.someString, pe.ItemIndex, pe.ItemLength, pe.Item
 FROM samd.SomeTable t
 CROSS APPLY samd.patExtract8K(t.someString, @pattern) pe;

[Parameters]:
 @string        = varchar(8000); the input string
 @pattern       = varchar(50); pattern to search for

[Returns]:
 itemNumber = bigint; the instance or ordinal position of the matched substring
 itemIndex  = bigint; the location of the matched substring inside the input string
 itemLength = int; the length of the matched substring
 item       = varchar(8000); the returned text

[Developer Notes]:
 1. Requires NGrams8k 

 2. patExtract8K does not return any rows on NULL or empty strings. Consider using 
    OUTER APPLY or append the function with the code below to force the function to return 
    a row on empty or NULL inputs:

    UNION ALL SELECT 1, 0, NULL, @string WHERE nullif(@string,'') IS NULL;

 3. patExtract8K is not case sensitive; use a case sensitive collation for 
    case-sensitive comparisons

 4. patExtract8K is deterministic. For more about deterministic functions see:
    https://msdn.microsoft.com/en-us/library/ms178091.aspx

 5. patExtract8K performs substantially better with a parallel execution plan, often
    2-3 times faster. For queries that leverage patExtract8K that are not getting a 
    parallel execution plan you should consider performance testing using Trace Flag 8649 
    in Development environments and Adam Machanic's make_parallel in production. 

[Examples]:
--===== (1) Basic: extract all groups of numbers:
  WITH temp(id, txt) as
 (
   SELECT * FROM (values
   (1, 'hello 123 fff 1234567 and today;""o999999999 tester 44444444444444 done'),
   (2, 'syat 123 ff tyui( 1234567 and today 999999999 tester 777777 done'),
   (3, '&**OOOOO=+ + + // ==?76543// and today !!222222\\\tester{}))22222444 done'))t(x,xx)
 )
 SELECT
   [temp.id] = t.id,
   pe.itemNumber,
   pe.itemIndex,
   pe.itemLength,
   pe.item
 FROM        temp AS t
 CROSS APPLY samd.patExtract8K(t.txt, '[^0-9]') AS pe;
-----------------------------------------------------------------------------------------
Revision History:
 Rev 00 - 20170801 - Initial Development - Alan Burstein
 Rev 01 - 20180619 - Complete re-write   - Alan Burstein
*****************************************************************************************/
RETURNS TABLE WITH SCHEMABINDING AS RETURN
SELECT itemNumber = ROW_NUMBER() OVER (ORDER BY f.position),
       itemIndex  = f.position,
       itemLength = itemLen.l,
       item       = SUBSTRING(f.token, 1, itemLen.l)
FROM
(
 SELECT ng.position, SUBSTRING(@string,ng.position,DATALENGTH(@string))
 FROM   samd.NGrams8k(@string, 1) AS ng
 WHERE  PATINDEX(@pattern, ng.token) <  --<< this token does NOT match the pattern
        ABS(SIGN(ng.position-1)-1) +    --<< are you the first row?  OR
        PATINDEX(@pattern,SUBSTRING(@string,ng.position-1,1)) --<< always 0 for 1st row
) AS f(position, token)
CROSS APPLY (VALUES(ISNULL(NULLIF(PATINDEX('%'+@pattern+'%',f.token),0),
  DATALENGTH(@string)+2-f.position)-1)) AS itemLen(l);

Using patExtract8K you can easily specify a range of sizes. For example, let's say the values could be 7-9 digits long. You could do this:

-- your sample data
DECLARE @string VARCHAR(8000) = 'if ((DateTime.Parse("[ST 123456789]") < DateTime.Parse("[ST 35401903]")) and [35401900]=0 and [35401903]=3 and [ST 1234567]=x, 1, 0)';

-- Lower and upper bounds for the length of valid values 
DECLARE @low INT = 7, @high INT = 9

SELECT 
  itemIndex  = s.itemIndex, 
  itemLength = s.itemLength-1, 
  item       = SUBSTRING(s.item,0,s.itemLength)
FROM   samd.patExtract8K(REPLACE(@string,']',CHAR(1)),'[^0-9'+CHAR(1)+']') AS s
WHERE  s.itemLength BETWEEN @low AND @high+1;
--AND    SUBSTRING(s.item,0,s.itemLength) NOT LIKE '%[^0-9]%' <<< If required

Returns

itemIndex   itemLength  item
----------- ----------- ------------
26          9           123456789
61          8           35401903
79          8           35401900
96          8           35401903
116         7           1234567

A couple notes:

  1. I updated the sample data to include values 7-9 digits long

  2. To use this function, either change the schema in the code to dbo (instead of samd) or create a schema named samd.
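
If the values are not guaranteed to be numeric, a more general sketch is to find each opening bracket and read up to the next closing bracket. The query below is only an illustration; it assumes, as elsewhere in this answer, a copy of NGrams8K in the samd schema, and the REPLACE calls that strip 'ST' and 'INV' are carried over from the query in the question.

```sql
-- the original sample data
DECLARE @string VARCHAR(8000) = 'if ((DateTime.Parse("[ST 35401900]") < DateTime.Parse("[ST 35401903]")) and [35401900]=0 and [35401903]=3, 1, 0)';

SELECT openPos = ng.position,
       result  = LTRIM(RTRIM(REPLACE(REPLACE(
                   SUBSTRING(@string, ng.position + 1, c.closePos - ng.position - 1),
                   'ST', ''), 'INV', '')))
FROM   samd.NGrams8k(@string, 1) AS ng             -- one row per character
CROSS APPLY (VALUES (NULLIF(CHARINDEX(']', @string, ng.position + 1), 0))) AS c(closePos)
WHERE  ng.token = '['                              -- keep only opening brackets
AND    c.closePos IS NOT NULL;                     -- skip brackets that are never closed
```

Against the original sample string this should return 35401900, 35401903, 35401900 and 35401903. The NULLIF keeps SUBSTRING from receiving a negative length when an opening bracket has no closing bracket after it.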


2 Comments

Could this be elaborated upon to support values that are 7 digits long? It appears our numbering scheme isn't consistent. An additional example: if ((DateTime.Parse("[ST 401900]") < DateTime.Parse("[ST 401903]")) and [401900]=0 and [401903]=3, 1, 0), where the token returned is: T 401900. Is there any way to improve upon your answer to account for this new information?
@ChrisFischer see my updated answer. Let me know if you have any questions.

Based on Hogan's response, I decided to roll a few functions to accomplish this.

CREATE FUNCTION [dbo].[GetDerivedDataPointsFromFormula]
(   
@DerivedDataPointId INT
,@strFormula VARCHAR(MAX)
)
RETURNS @RtnValue table
(
id int identity(1,1)
,DerivedDataPointId INT
,DataPointId INT
,Formula VARCHAR(MAX)
)
AS
BEGIN

INSERT INTO @RtnValue(DerivedDataPointId, DataPointId, Formula)
SELECT @DerivedDataPointId
, RTRIM(LTRIM(REPLACE(REPLACE(SUBSTRING(data, 0, CHARINDEX(']', data, 0)),'ST',''), 'INV', ''))) AS DataPointIdInvolved
, @strFormula
FROM (
    SELECT DATA
    FROM dbo.split(@strFormula, '[')
) AS data
WHERE LEN(RTRIM(LTRIM(REPLACE(REPLACE(SUBSTRING(data, 0, CHARINDEX(']', data, 0)),'ST',''), 'INV', '')))) > 0

RETURN

END

Split Function defined as:

CREATE FUNCTION [dbo].[Split]
(
@RowData varchar(MAX),
@SplitOn nvarchar(5)
)  
RETURNS @RtnValue table 
(
Id int identity(1,1),
Data nvarchar(1000)
) 
AS  
BEGIN 
Declare @Cnt int
Set @Cnt = 1

While (Charindex(@SplitOn,@RowData)>0)
Begin
    Insert Into @RtnValue (data)
    Select 
        Data = ltrim(rtrim(Substring(@RowData,1,Charindex(@SplitOn,@RowData)-1)))

    Set @RowData = Substring(@RowData,Charindex(@SplitOn,@RowData)+1,len(@RowData))
    Set @Cnt = @Cnt + 1
End

Insert Into @RtnValue (data)
Select Data = ltrim(rtrim(@RowData))

Return
END

Then I was able to get what I needed by using CROSS APPLY:

Select DISTINCT f.DerivedDataPointId
, f.DataPointId
,DerivedEval
from DerivedDataPoint d (readuncommitted)
Cross Apply dbo.GetDerivedDataPointsFromFormula(d.DerivedDataPointId, d.DerivedEval) f

Maybe this will help someone else looking for a similar approach for anything.
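
On SQL Server 2016 or later, the same split-on-'[' idea might be expressed without the loop-based Split function by using the built-in STRING_SPLIT. This is a sketch against a single variable rather than the DerivedDataPoint table, and it assumes the database compatibility level is 130 or higher, which STRING_SPLIT requires.

```sql
-- sample formula from the question
DECLARE @string VARCHAR(8000) = 'if ((DateTime.Parse("[ST 35401900]") < DateTime.Parse("[ST 35401903]")) and [35401900]=0 and [35401903]=3, 1, 0)';

SELECT result = LTRIM(RTRIM(REPLACE(REPLACE(
                  -- keep everything before the first ']' in each piece
                  LEFT(s.value, NULLIF(CHARINDEX(']', s.value), 0) - 1),
                  'ST', ''), 'INV', '')))
FROM   STRING_SPLIT(@string, '[') AS s        -- one row per piece between '[' characters
WHERE  CHARINDEX(']', s.value) > 0;           -- discard pieces with no closing bracket
```

STRING_SPLIT does not guarantee the order of the rows it returns, so an explicit ordering key would be needed if the order of the values matters.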

1 Comment

Multi-statement table-valued functions (mTVFs) perform awfully; the only thing slower is two of them. Loops slow things down too. If the OP is using SQL Server 2016+ they could use STRING_SPLIT, which is very fast; for pre-2016 versions they could use DelimitedSplit8K or DelimitedSplit8K_LEAD. Learn set-based code and you can solve stuff like this with 3 lines of code; note my answer.
