I have a string that is formatted like this:

C Aleksander Barkov C Nico Hischier UTIL Tyson Jost W Taylor Hall W Evgenii Dadonov W Kyle Palmieri D Kris Letang D Ryan Suter G Casey DeSmith

I need to isolate the names that sit between the position markers. The logic is to extract the content between each standalone capital letter (C, W, D, G), treating the word UTIL as a marker as well.

The final output should be, for example:

'Aleksander Barkov', 
'Nico Hischier', 
'Tyson Jost', 
'Taylor Hall', 
etc

Any ideas on how to do this?

  • What is a 'name'? It may seem frivolous, but names are not standardized by any stretch of the imagination. Commented Oct 15, 2018 at 14:57
  • It's the content between each capitalized letter. Commented Oct 15, 2018 at 15:05
  • Can you give the example output you are expecting? Commented Oct 15, 2018 at 15:06
  • 'Aleksander Barkov', 'Nico Hischier', 'Tyson Jost', 'Taylor Hall', etc... Commented Oct 15, 2018 at 15:08
  • What if someone lists "C" as their first name? I know a number of people who go by their middle names, and will just provide a first initial on forms. For example, Charlie John Smith who goes by "John" might list his name as C John Smith Commented Oct 15, 2018 at 15:48

2 Answers

Just another option if you can't use Alan's solution via the nGram (+1)

First we perform a brute-force replace to create a pipe-delimited string, then we parse that delimited string using a little XML.

Example

Declare @S varchar(max) = 'C Aleksander Barkov C Nico Hischier UTIL Tyson Jost W Taylor Hall W Evgenii Dadonov W Kyle Palmieri D Kris Letang D Ryan Suter G Casey DeSmith'

Select @S = ltrim(replace(' '+@S COLLATE SQL_Latin1_General_CP1_CS_AS,C,'|'))  
 From (Select Top 26 C=' '+char(64+Row_Number() Over (Order By (Select NULL)))+' ' From master..spt_values n1  
       Union All
       Select ' UTIL '   -- Note We add "UTIL" to the list
      )  A 

Select RetSeq = Row_Number() over (Order By (Select null))
      ,RetVal = LTrim(RTrim(B.i.value('(./text())[1]', 'varchar(max)')))
From  (Select x = Cast('<x>' + replace(substring(@S,2,len(@S)),'|','</x><x>')+'</x>' as xml).query('.')) as A 
Cross Apply x.nodes('x') AS B(i)

Returns

RetSeq  RetVal
1       Aleksander Barkov
2       Nico Hischier
3       Tyson Jost
4       Taylor Hall
5       Evgenii Dadonov
6       Kyle Palmieri
7       Kris Letang
8       Ryan Suter
9       Casey DeSmith

1 Comment

Great approach!

So grab a copy of NGrams8K. Then you can do this:

-- sample data
DECLARE @string VARCHAR(8000) = 
'C Aleksander Barkov C Nico Hischier UTIL Tyson Jost W Taylor Hall W Evgenii Dadonov W Kyle Palmieri D Kris Letang D Ryan Suter G Casey DeSmith';

-- my solution works except for cases where the upper-case word is more than one char. You'll need to iron that out
SET @string = REPLACE(@string, 'UTIL', 'U');

-- solution
SELECT [name] = SUBSTRING(@string, d.pos+2, d.nextPos-d.pos-3)
FROM 
(
  SELECT 
    pos     = ng.position,
    nextPos = LEAD(ng.position,1,8000) OVER (ORDER BY ng.position)
  FROM   samd.ngrams8K(@string, 2) AS ng
  WHERE  ng.token COLLATE Latin1_General_BIN LIKE '[A-Z] '
) AS d;

Returns:

name
------------------
Aleksander Barkov
Nico Hischier
Tyson Jost
Taylor Hall
Evgenii Dadonov
Kyle Palmieri
Kris Letang
Ryan Suter
Casey DeSmith

Note that if you have all-caps words longer than one character, you will have to update the logic.

Update (Based on question in the comments):

For a string formatted like this (note I dropped the final delimiter): |Eri|Staal|Nico Hischier|Mitchell Marner|Taylor Hall|Kyle Palmieri|Jason Zucker|Ryan Suter|Will Butcher|Keith Kinkaid

You could refactor my query to look like this:

DECLARE 
  @string VARCHAR(8000) = '|Eri|Staal|Nico Hischier|Mitchell Marner|Taylor Hall|Kyle Palmieri|Jason Zucker|Ryan Suter|Will Butcher|Keith Kinkaid',
  @delimiter CHAR(1)    = '|';

SELECT 
  sortKey = ng.position, 
  [name]  = SUBSTRING
            (
              @string,
              ng.position+1,
              LEAD(ng.position,1,8000) OVER (ORDER BY ng.position)-ng.position-1
              --ISNULL(NULLIF(CHARINDEX(@delimiter,@string,ng.position+1),0),8000)-ng.position-1
            )
FROM  samd.NGrams8K(@string, 1) AS ng
WHERE token = @delimiter;

Note that the logic above uses LEAD, which requires SQL Server 2012+. If you are on 2008, you would uncomment the line below it and remove the line that uses LEAD. Note, too, that this solution is a scaled-down version of DelimitedSplit8K (the 2008 solution) and DelimitedSplit8k_LEAD (the version that leverages LEAD).
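
For anyone without NGrams8K installed, here is a rough sketch of the same split that should run on SQL Server 2008+. It uses the CHARINDEX expression from the commented-out line instead of LEAD, and swaps NGrams8K for an inline tally CTE. The CTE names (E1, E4, tally) are my own for illustration and are not part of NGrams8K or DelimitedSplit8K.

```sql
DECLARE
  @string    VARCHAR(8000) = '|Eri|Staal|Nico Hischier|Mitchell Marner|Taylor Hall|Kyle Palmieri|Jason Zucker|Ryan Suter|Will Butcher|Keith Kinkaid',
  @delimiter CHAR(1)       = '|';

WITH
  -- 10 rows, cross joined four times = up to 10,000 rows (enough for VARCHAR(8000))
  E1(n)      AS (SELECT 1 FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) AS v(n)),
  E4(n)      AS (SELECT 1 FROM E1 AS a CROSS JOIN E1 AS b CROSS JOIN E1 AS c CROSS JOIN E1 AS d),
  -- one row per character position in @string (stand-in for NGrams8K(@string, 1))
  tally(pos) AS (SELECT TOP (LEN(@string)) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E4)
SELECT
  sortKey = t.pos,
  [name]  = SUBSTRING
            (
              @string,
              t.pos + 1,
              -- distance to the next delimiter; 8000 when there is none (last item)
              ISNULL(NULLIF(CHARINDEX(@delimiter, @string, t.pos + 1), 0), 8000) - t.pos - 1
            )
FROM  tally AS t
WHERE SUBSTRING(@string, t.pos, 1) = @delimiter   -- keep only delimiter positions
ORDER BY t.pos;
```

This keeps the "each item starts right after a delimiter" logic unchanged; only the source of the character positions and the way the next delimiter is found differ.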

All this said, if you have control over the format, why not store the records in 3NF?
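
As a rough illustration of what a normalized layout could look like, the sketch below stores one row per roster slot instead of one long string. The table name, column names and the LineupId value are all hypothetical, and a temp table is used only so the example cleans up after itself; adapt them to your own schema.

```sql
-- Hypothetical table: one row per player slot in a lineup
CREATE TABLE #LineupPlayer
(
  LineupId   INT          NOT NULL,  -- which lineup the row belongs to (assumed key)
  SlotNo     TINYINT      NOT NULL,  -- order of the slot within the lineup
  Position   VARCHAR(4)   NOT NULL,  -- C, W, D, G or UTIL
  PlayerName VARCHAR(100) NOT NULL,
  PRIMARY KEY (LineupId, SlotNo)
);

-- The sample string from the question, stored as rows
INSERT INTO #LineupPlayer (LineupId, SlotNo, Position, PlayerName)
VALUES (1, 1, 'C',    'Aleksander Barkov'),
       (1, 2, 'C',    'Nico Hischier'),
       (1, 3, 'UTIL', 'Tyson Jost'),
       (1, 4, 'W',    'Taylor Hall'),
       (1, 5, 'W',    'Evgenii Dadonov'),
       (1, 6, 'W',    'Kyle Palmieri'),
       (1, 7, 'D',    'Kris Letang'),
       (1, 8, 'D',    'Ryan Suter'),
       (1, 9, 'G',    'Casey DeSmith');

-- Getting the names back no longer needs any string parsing
SELECT PlayerName
FROM   #LineupPlayer
WHERE  LineupId = 1
ORDER BY SlotNo;

DROP TABLE #LineupPlayer;
```

With the data in this shape, the position marker and the name are separate columns, so a name such as "C John Smith" can no longer be confused with a position code.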

4 Comments

I hope no one named "Resistance Is Futile" turns up in the list. SET @string = REPLACE(@string, 'UTIL', 'U'); makes quick work of their name.
What if I changed the formatting to something more uniform? Like this: |Eri|Staal|Nico Hischier|Mitchell Marner|Taylor Hall|Kyle Palmieri|Jason Zucker|Ryan Suter|Will Butcher|Keith Kinkaid|
@AlanBurstein can you elaborate on Third Normal Form and how it applies here?
@AlanBurstein I would very much like to break this long string up into 9 separate values that I could store on an entirely different table. I'm just not sure how to get there.
