0

I have a field in a table which consists of a string of values separated by semi-colons, e.g. apple; banana; orange; pear

I have been trying to build a SELECT statement that returns the second group counting from the right, i.e. the value between the first and second semi-colons counting from the right (and, for now, null if there is only 1 semi-colon).

For example, orange.

I have been testing and trying ChatGPT, but the closest I have gotten is the expression below, which returns everything to the right of the second semi-colon counting from the right.

For example: orange; pear

I just cannot figure out what I should do to not show any value to the right of the first semi-colon from the right. Or could it be an issue with my dataset? I do sometimes see commas used instead of semi-colons, but that could be another story.

LTRIM(RTRIM(
           CASE
               -- Ensure there are at least two semicolons
               WHEN LEN(@string) - LEN(REPLACE(@string, ';', '')) >= 2
               THEN SUBSTRING(
                   @string,
                   LEN(@string) - CHARINDEX(';', REVERSE(@string), CHARINDEX(';', REVERSE(@string)) + 1) + 2,
                   CHARINDEX(';', REVERSE(@string), CHARINDEX(';', REVERSE(@string)) + 1) - 1
               )
               ELSE ''
           END
       )) AS SecondGroupFromRight

Would really appreciate some help. Many thanks in advance

3
  • 2
    Never, ever store data as separated items, it will only cause you lots of trouble. (Commas or semi-colons don't matter, same problems will appear.) Commented Feb 18 at 9:43
  • 1
    Which dbms are you using? (Product specific functions used above.) Commented Feb 18 at 9:44
  • At this stage I am reviewing the dataset in Databricks. Haven't quite decided on the final location of the hierarchy build, e.g. Tier 1 Pear, Tier 2 Orange, .. If it does not fit with the rest of the dataset, it may even be developed as a PowerBi measure. Commented Feb 19 at 23:42

4 Answers

0

This answer is for SQL Server.
Another way is to split the values into separate rows and then retrieve the 2nd last row.
We can use row_number() to give each separate row a sequential number.

declare @split nchar(1) = ';'
declare @val nvarchar(2000) = 'apple; banana; orange; pear'

select t.value
from   ( select trim(value) as value,
                ROW_NUMBER() over(order by (select 1)) as rn,
                (select count(value) from string_split(@val, @split)) as total
                from   string_split(@val, @split)
       ) t
where t.total > 2
and   t.rn = t.total - 1

See this dbFiddle

  • trim(value) to get rid of the space in front of the values
  • order by (select 1) in the row_number to avoid sorting in row_number()
  • select count(value)... to retrieve the total number of values
  • now you can simply say t.rn = t.total - 1 to retrieve the 2nd last row
  • where t.total > 2 so you have no result when there are fewer than 3 values
values                                                         result
declare @val nvarchar(2000) = 'apple; banana; orange; pear'    orange
declare @val nvarchar(2000) = 'apple; banana; orange'          banana
declare @val nvarchar(2000) = 'apple; banana'                  (no result)
declare @val nvarchar(2000) = 'apple'                          (no result)
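
One caveat on the query above: without the enable_ordinal argument, STRING_SPLIT does not document the order of its output rows, so numbering them with order by (select 1) tends to work in practice but is not guaranteed. A minimal sketch of the same idea, assuming SQL Server 2022 or Azure SQL Database, where the third argument (enable_ordinal) is available and adds an ordinal column:

```sql
-- Assumption: SQL Server 2022+ / Azure SQL Database, where STRING_SPLIT
-- accepts enable_ordinal = 1 and returns an extra "ordinal" column.
declare @split nchar(1) = ';'
declare @val nvarchar(2000) = 'apple; banana; orange; pear'

select trim(t.value) as value
from   ( select value,
                ordinal,                   -- 1-based position of the part in @val
                count(*) over () as total  -- number of parts
         from   string_split(@val, @split, 1)
       ) t
where  t.total > 2                -- same rule as above: at least 3 values
and    t.ordinal = t.total - 1    -- 2nd last part
```

For the sample value this picks the part with ordinal 3 out of 4, i.e. orange.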

And while this is all nice, how do you use this when @val is a column in a table?

Well, like this:

declare @split nchar(1) = ';'
  
select test.id,
       ( select t.value
         from   ( select trim(value) as value,
                         ROW_NUMBER() over(order by (select 1)) as rn,
                         (select count(value) from string_split(test.fruit, @split)) as total
                         from   string_split(test.fruit, @split)
                ) t
          where t.total > 2
          and   t.rn = t.total - 1
       ) value,
       test.othercolumn
from   test

See this dbFiddle for a working example

result could be

id  value   othercolumn
1   orange  hello world
2   grapes  how are you

1 Comment

Nice one! Thank you so much for the explanation. It's helped me lots in understanding what is going on. I've managed to get charindex working in Databricks, so am happy with it for now. Will definitely keep this for future reference, or when I am stuck next =D
0

Try this:

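--declare @val nvarchar(2000) = 'apple; banana; orange; pear'
declare @val nvarchar(2000) = ';apple'
drop table if exists #x

-- split the reversed string, so the last group should get ID 1, the 2nd last ID 2, ...
-- (this relies on string_split returning the parts in their original order)
select IDENTITY(INT,1,1) AS ID, *
into #x
from string_split(reverse(@val), ';')

-- reverse the 2nd group back and trim it; an empty group becomes NULL
select nullif(trim(reverse(value)), '') as [2ndVal] from #x where id = 2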

1 Comment

Oh wow! Simple and effective! Tried this in SQL and it works like a charm! Haven't taken this approach into Databricks as I have gotten charindex working in the meantime, woohoo!
0

Since the OP uses the charindex function, this is perhaps SQL Server.
It is unknown whether the enable_ordinal option is available.
So, with charindex and patindex, the expression is

  case when patindex('%;%;%', reverse(string))>0 then
      trim(reverse(substring(reverse(string),patindex('%;%;%', reverse(string))+1
              ,charindex(';', reverse(string),patindex('%;%;%', reverse(string))+1)
                -patindex('%;%;%', reverse(string))-1
             )))
  else ''
  end s2r
  1. Check whether a second part from the right exists (using the reversed string):
patindex('%;%;%', reverse(string))>0
  2. Extract the part from patindex('%;%;%' ...) to the next charindex(';'...)
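
To make the two steps concrete, here is a small trace of the intermediate positions for the sample string. It is only a sketch; the variable @s is a hypothetical stand-in for the string column.

```sql
-- @s is a hypothetical stand-in for the string column
declare @s varchar(100) = 'apple; banana; orange; pear'

select reverse(@s) as rev,                                  -- 'raep ;egnaro ;ananab ;elppa'
       patindex('%;%;%', reverse(@s)) as p1,                -- 6: first ';' that has another ';' after it
       charindex(';', reverse(@s),
                 patindex('%;%;%', reverse(@s)) + 1) as p2  -- 14: the next ';'

-- substring(rev, p1 + 1, p2 - p1 - 1) = substring(rev, 7, 7) = 'egnaro '
-- reverse gives ' orange', and trim gives 'orange'
```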

See example

id  string
1   apple; banana; orange; pear
2   apple; banana
3   apple
select *
  ,case when patindex('%;%;%', reverse(string))>0 then
      trim(reverse(substring(reverse(string),patindex('%;%;%', reverse(string))+1
              ,charindex(';', reverse(string),patindex('%;%;%', reverse(string))+1)
                -patindex('%;%;%', reverse(string))-1
             )))
   else ''
   end s2r
from test
id  string                       s2r
1   apple; banana; orange; pear  orange
2   apple; banana
3   apple

fiddle

1 Comment

Thank you ValNik. This does not return the required behavior, as for id 2 I would expect s2r to be apple and for id 3 to be blank. I forgot to mention that I am in Databricks SQL too and am unable to use patindex. I was able to use Pandav's code below. Thank you very much all the same!
-1
SELECT 
    CASE 
        -- Ensure there are at least two semicolons
        WHEN LEN(@string) - LEN(REPLACE(@string, ';', '')) >= 2 
        THEN 
            -- Extract the part between the second-last and last semicolon
            SUBSTRING(
                @string,
                LEN(@string) - CHARINDEX(';', REVERSE(@string), CHARINDEX(';', REVERSE(@string)) + 1) + 2,
                CHARINDEX(';', REVERSE(@string), CHARINDEX(';', REVERSE(@string)) + 1) 
                - CHARINDEX(';', REVERSE(@string)) - 1
            )
        ELSE NULL 
    END AS SecondGroupFromRight
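
Since the answer comes without an explanation, here is a hedged trace of its arithmetic for the sample string from the question. The DECLARE and the column aliases are added only for illustration.

```sql
DECLARE @string NVARCHAR(2000) = 'apple; banana; orange; pear';

SELECT LEN(@string) AS total_len,                                     -- 27
       CHARINDEX(';', REVERSE(@string)) AS first_sc,                  -- 6  (last ';' in the original)
       CHARINDEX(';', REVERSE(@string),
                 CHARINDEX(';', REVERSE(@string)) + 1) AS second_sc;  -- 14 (second-last ';')

-- start  = total_len - second_sc + 2 = 27 - 14 + 2 = 15 (character right after the second-last ';')
-- length = second_sc - first_sc - 1  = 14 - 6 - 1  = 7
-- SUBSTRING(@string, 15, 7) = ' orange'
```

The expression in the question used second_sc - 1 = 13 as the length, which is why it ran on to the end of the string and returned ' orange; pear'. Wrapping the result in LTRIM(RTRIM(...)), as the question does, removes the leading space.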

4 Comments

There is a code formatting option in the editor, please use it
A good answer includes an explanation
Your answer could be improved with additional supporting information. Please edit to add further details, such as citations or documentation, so that others can confirm that your answer is correct. You can find more information on how to write good answers in the help center.
Thank you very much Pandav. This is so very close! The string does not always have 4 levels; sometimes there could be more and sometimes fewer. E.g. we may only have pear, in which case the 2nd group from the right will be null (in this case there will only be 1 separator ;). Side note: the script works in Databricks SQL, which is awesome!
