0

I am looking for solution to get a character based string extracted from another string. I need only first 4 "characters only" from another string. The restriction here is that "another" string may contain spaces, special characters, numbers etc and may be less than 4 characters.

For example - I should get

  1. "NAGP" if source string is "Nagpur District"

  2. "ILLF" if source string is "Ill Fated"

  3. "RAJU" if source string is "RA123 *JU23"

  4. "MAC" if source string is "MAC"

Any help is greatly appreciated.

Thanks for sharing your time and wisdom.

2
  • Have you attempted to solve this yourself in any way yet? Commented Oct 22, 2014 at 11:07
  • Just use a solution like this to remove non-alpha characters: stackoverflow.com/questions/1007697/…, and then take the first 4 characters. Commented Oct 22, 2014 at 11:10

3 Answers 3

1

You can use the answer in the question and add substring method to get your value of desired length How to strip all non-alphabetic characters from string in SQL Server?

i.e.

Create Function [dbo].[RemoveNonAlphaCharacters](@Temp VarChar(1000))
Returns VarChar(1000)
AS
Begin

    Declare @KeepValues as varchar(50)
    Set @KeepValues = '%[^a-z]%'
    While PatIndex(@KeepValues, @Temp) > 0
        Set @Temp = Stuff(@Temp, PatIndex(@KeepValues, @Temp), 1, '')

    Return @Temp
End

use it like

Select SUBSTRING(dbo.RemoveNonAlphaCharacters('abc1234def5678ghi90jkl'), 1, 4);

Here SUBSTRING is used to get string of length 4 from the returned value.

Sign up to request clarification or add additional context in comments.

1 Comment

Thanks Satyajit. Your solution worked. Thanks for your great help.
1
^([a-zA-Z])[^a-zA-Z\n]*([a-zA-Z])?[^a-zA-Z\n]*([a-zA-Z])?[^a-zA-Z\n]*([a-zA-Z])?

You can try this.Grab the captures or groups.See demo.

http://regex101.com/r/rQ6mK9/42

4 Comments

And how do you use that from T-SQL? ;)
@LucasTrzesniewski groups are not avialable in T-SQL? :(
AFAIK you need to define a CLR function to get any regex support at all in T-SQL (and I don't call the LIKE patterns as regexes)...
Thanks vks. Your solution seems to be a simple one, but i am not able to use that with SQL server as i am not that much expert.
0

A bit late to the party here, but as a general rule I despise all functions with BEGIN .. END, they almost never perform well, and since this covers all scalar functions (until Microsoft implement inline scalar expressions), as such whenever I see one I look for an alternative that offers similar reusability. In this case the query can be converted to an inline table valued function:

CREATE FUNCTION dbo.RemoveNonAlphaCharactersTVF (@String NVARCHAR(1000), @Length INT)
RETURNS TABLE
AS
RETURN
(   WITH E1 (N) AS 
    (   SELECT 1 
        FROM (VALUES (1), (1), (1), (1), (1), (1), (1), (1), (1), (1)) n (N)
    ), 
    E2 (N) AS (SELECT 1 FROM E1 CROSS JOIN E1 AS E2), 
    N (Number) AS (SELECT TOP (LEN(@String)) ROW_NUMBER() OVER(ORDER BY E1.N) FROM E2 CROSS JOIN E1)
    SELECT  Result = (  SELECT  TOP (ISNULL(@Length, 1000)) SUBSTRING(@String, n.Number, 1)
                        FROM    N
                        WHERE   SUBSTRING(@String, n.Number, 1) LIKE '[a-Z]'
                        ORDER BY Number
                        FOR XML PATH('')
                    )
);

All this does is use a list of numbers to expand the string out into columns, e.g. RA123 *JU23T becomes:

Letter
------
R
A
1
2
3

*
J
U
2
3
T

The rows that are not alphanumeric are then removed by the where clause:

WHERE   SUBSTRING(@String, n.Number, 1) LIKE '[a-Z]'

Leaving

Letter
------
R
A
J
U
T

The @Length parameter then limits the characters (in your case this would be 4), then the string is rebuilt using XML concatenation. I would usually use FOR XML PATH(''), TYPE).value('.', 'NVARCHAR(MAX)') for xml concatenation to allow for xml characters, but since I know there are none I haven't bothered as it is additional overhead.

Running some tests on this with a sample table of 1,000,000 rows:

CREATE TABLE dbo.T (String NVARCHAR(1000));
INSERT T (String)
SELECT TOP 1000000 t.String
FROM    (VALUES ('Nagpur District'), ('Ill Fated'), ('RA123 *JU23'), ('MAC')) t (String)
        CROSS JOIN sys.all_objects a
        CROSS JOIN sys.all_objects B
ORDER BY a.object_id;

Then comparing the scalar and the inline udfs (called as follows):

SELECT  COUNT(SUBSTRING(dbo.RemoveNonAlphaCharacters(t.String), 1, 4))
FROM    T;

SELECT  COUNT(tvf.Result)
FROM    T
        CROSS APPLY dbo.RemoveNonAlphaCharactersTVF (t.String, 4) AS tvf;

Over 15 test runs (probably not enough for an accurate figure, but enough to paint the picture) the average execution time for the scalar UDF was 11.824s, and for the inline TVF was 1.658, so approximately 85% faster.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.