I have a question about SQL Server. I have a database column whose values follow this pattern:

  1. up to 10 digits
  2. then a comma
  3. up to 10 digits
  4. then a semicolon

e.g.

100000161, 100000031; 100000243, 100000021;
100000161, 100000031; 100000243, 100000021;

From each pair, I want to extract the first group of digits (up to 10; item 1) followed by a semicolon (item 4)

(or, in other words, remove everything from each comma up to the next semicolon):

100000161; 100000243; 100000161; 100000243;

Can you please advise me how to do this in SQL Server? I'm not very familiar with regex and therefore have no clue how to approach this.

Thanks,

Alex

  • SQL Server is notorious among the enterprise databases for having fairly lousy regex replace support, which is probably what you would want to be using for this problem. Is there any chance you could scrub this data somewhere else? Commented Sep 1, 2017 at 12:08
  • @TimBiegeleisen No matter how lousy the regex support is, something this simple will never be a problem in any regex engine. Regex is also something you would definitely NOT want to use for this task. Commented Sep 1, 2017 at 12:10
  • @Tomalak SUBSTRING_INDEX is not a SQL Server function, it's a MySQL function, and yes, regex is the sort of thing you would want to use here. Commented Sep 1, 2017 at 12:11
  • @user3898488 Are you trying to return the first field from each pair? You can use STRING_SPLIT to split first by ;, then by ,. It would be better to parse the data before storing it in the database, though. You can't take advantage of indexes if you need to apply functions to a column's values. Commented Sep 1, 2017 at 12:21
  • The sample data you show does not match the pattern that you wrote. Commented Sep 1, 2017 at 12:22
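
The STRING_SPLIT suggestion in the comments above could look roughly like the following. This is a minimal sketch, not a tested solution: it assumes SQL Server 2016 or later (database compatibility level 130+), and the table variable @t, its id column, and the aliases pair and FirstValue are hypothetical names used only for illustration. It splits each value on ; and then takes the text before the first comma, so a second split is not needed.

```sql
-- Hypothetical sample table; id stands in for whatever key the real table has.
DECLARE @t TABLE (id int IDENTITY(1,1), it nvarchar(200));

INSERT INTO @t (it) VALUES
('100000161, 100000031; 100000243, 100000021;'),
('100000161, 100000031; 100000243, 100000021;');

SELECT t.id,
       -- Appending ',' guarantees CHARINDEX finds a comma, so LEFT never gets a negative length.
       LEFT(p.pair, CHARINDEX(',', p.pair + ',') - 1) AS FirstValue
FROM   @t AS t
CROSS APPLY (SELECT LTRIM(s.value) AS pair          -- one row per 'first, second' pair
             FROM   STRING_SPLIT(t.it, ';') AS s) AS p
WHERE  p.pair <> '';                                -- drop the empty piece after the final ';'
```

Note that STRING_SPLIT does not promise to return the pieces in their original order, so this returns one row per first value rather than trying to rebuild an ordered string.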

3 Answers

1

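Try this:

-- Sample data in a table variable
DECLARE @Sql TABLE (SqlCol nvarchar(max));

INSERT INTO @Sql
SELECT '100000161,100000031;100000243,100000021;100000161,100000031;100000243,100000021;';

-- Treat every ';' and ',' as an element boundary, then shred the XML into one row per value.
-- ORDER BY (SELECT NULL) relies on nodes() returning rows in document order, which is not guaranteed.
WITH cte
     AS (SELECT ROW_NUMBER()
                  OVER(
                    ORDER BY (SELECT NULL))         AS Rno,
                Split.a.value('.', 'VARCHAR(1000)') AS Data
         FROM   (SELECT CAST('<S>'
                             + REPLACE(REPLACE(SqlCol, ';', ','), ',', '</S><S>')
                             + '</S>' AS XML) AS Data
                 FROM   @Sql) AS A
                CROSS APPLY Data.nodes('/S') AS Split(a))
-- Keep the odd-numbered values (the first of each pair) and join them with '; '
SELECT STUFF((SELECT '; ' + Data
              FROM   cte
              WHERE  Rno % 2 <> 0
                     AND Data <> ''
              FOR XML PATH ('')), 1, 2, '') AS ExpectedData;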

ExpectedData
-------------
100000161; 100000243; 100000161; 100000243

3 Comments

You don't need all this to extract the first value. Use different inner and outer tags instead of a single <S> and select the one you want.
This looks good. The only issue I noticed when I checked it on real data is that the output of selecting the column from the table is written into one line, while the source data is in different rows.
@user3898488 what do you want? All results in a single row? A pair per input row?
1

I believe this will get you what you are after, as long as that pattern truly holds. If not, it's fairly easy to ensure the data conforms to that pattern and then apply this:

Select Substring(TargetCol, 1, 10) + ';' From TargetTable

3 Comments

OP changed the spec a little, so it would now be SELECT LEFT(TargetCol, CHARINDEX(',', TargetCol) - 1) + ';' WHERE CHARINDEX(',', TargetCol) BETWEEN 1 AND 11;.
This looks pretty good, but where do I have to add the FROM TargetTable? I checked your first command and it runs fine, but I'm failing to merge your two commands.
@user3898488 Oops! I tested with a variable and changed the name without adding in the FROM, so... SELECT LEFT(TargetCol, CHARINDEX(',', TargetCol) - 1) + ';' FROM SomeTable WHERE CHARINDEX(',', TargetCol) BETWEEN 1 AND 11;. But that won't help if you have more than one pair of data in a row.
0

You can take advantage of SQL Server's XML support to convert the input string into an XML value and query it with XQuery and XPath expressions.

For example, the following query replaces each ; with </b><a> and each , with </a><b>, wraps the result in <a>...</a> and keeps only the <a> elements with .query('a'), which turns each string into <a>100000161</a><a>100000243</a><a />. After that, you can select individual <a> nodes with /a[1], /a[2]:

declare @table table (it nvarchar(200))

insert into @table values
('100000161, 100000031; 100000243, 100000021;'),
('100000161, 100000031; 100000243, 100000021;')

select 
    xCol.value('/a[1]','nvarchar(200)') as A1, 
    xCol.value('/a[2]','nvarchar(200)') as A2
from (
    -- build the XML, strip the spaces, and keep only the <a> elements
    select convert(xml, '<a>' 
                        + replace(replace(replace(it,';','</b><a>'),',','</a><b>'),' ','')
                        + '</a>')
                  .query('a') as xCol
    from @table) as tmp 

-------------------------
A1          A2
100000161   100000243
100000161   100000243

value() extracts a single scalar value from an XML value. nodes() returns a rowset of the nodes that match the XPath expression. The following query will return all "keys":

select 
    a.value('.','nvarchar(200)')
from (
    select convert(xml, '<a>' 
                        + replace(replace(replace(it,';','</b><a>'),',','</a><b>'),' ','')
                        + '</a>')
                  .query('a') as xCol
    from @table) as tmp 
    cross apply xCol.nodes('a') as y(a)
where a.value('.','nvarchar(200)')<>''

------------
100000161
100000243
100000161
100000243
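
The nodes() query above returns the keys of all source rows in one flat list. If one result per source row is wanted instead, the keys can be stitched back together per row. The following is only a sketch under assumptions: the id column is a hypothetical row key that is not in the original table, FirstValues is an illustrative alias, and FOR XML PATH concatenation without an ORDER BY does not formally guarantee the order of the values.

```sql
-- Same sample data as above, plus a hypothetical id column to identify each source row.
DECLARE @table TABLE (id int IDENTITY(1,1), it nvarchar(200));

INSERT INTO @table (it) VALUES
('100000161, 100000031; 100000243, 100000021;'),
('100000161, 100000031; 100000243, 100000021;');

SELECT t.id,
       -- Concatenate the non-empty <a> values of this row only, each followed by '; '
       RTRIM((SELECT y.a.value('.', 'nvarchar(200)') + '; '
              FROM   x.xCol.nodes('a') AS y(a)
              WHERE  y.a.value('.', 'nvarchar(200)') <> ''
              FOR XML PATH(''), TYPE).value('.', 'nvarchar(max)')) AS FirstValues
FROM   @table AS t
CROSS APPLY (SELECT CONVERT(xml, '<a>'
                    + REPLACE(REPLACE(REPLACE(t.it, ';', '</b><a>'), ',', '</a><b>'), ' ', '')
                    + '</a>').query('a') AS xCol) AS x;
```

For the sample rows this is intended to produce one string such as 100000161; 100000243; per id, matching the shape of the output asked for in the question.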

With 200K rows of data, though, I'd seriously consider transforming the data when loading it and storing it in individual, indexable columns, or adding a separate, related table. Applying string manipulation functions to a column means that the server can't use an index on that column to speed up queries.

If that's not possible (why?) I'd consider at least adding a separate XML-typed column that would contain the same data in XML form, to allow the creation of an XML index.
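
As a rough illustration of the "separate, related table" idea, the pairs could be stored one per row. Every name here (dbo.SourcePair, SourceRowId, PairNo, FirstValue, SecondValue, and the index and constraint names) is hypothetical, and the sketch assumes the values are purely numeric, as the pattern in the question suggests.

```sql
-- Hypothetical child table: one row per 'first, second' pair of the original string.
CREATE TABLE dbo.SourcePair (
    SourceRowId int    NOT NULL,  -- key of the row the pair was parsed from
    PairNo      int    NOT NULL,  -- position of the pair within the original string
    FirstValue  bigint NOT NULL,  -- up to 10 digits can exceed the int range, hence bigint
    SecondValue bigint NOT NULL,
    CONSTRAINT PK_SourcePair PRIMARY KEY (SourceRowId, PairNo)
);

-- Lets lookups by the first value seek an index instead of parsing strings.
CREATE INDEX IX_SourcePair_FirstValue ON dbo.SourcePair (FirstValue);

-- Example lookup: all source rows that contain a given first value.
SELECT SourceRowId, PairNo
FROM   dbo.SourcePair
WHERE  FirstValue = 100000161;
```

With this layout the original question reduces to selecting FirstValue, with no string manipulation at query time.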
