get character only string from another string in sql server

Question

I am looking for solution to get a character based string extracted from another string. I need only first 4 "characters only" from another string. The restriction here is that "another" string may contain spaces, special characters, numbers etc and may be less than 4 characters.

For example - I should get

"NAGP" if source string is "Nagpur District"
"ILLF" if source string is "Ill Fated"
"RAJU" if source string is "RA123 *JU23"
"MAC" if source string is "MAC"

Any help is greatly appreciated.

Thanks for sharing your time and wisdom.

Just use a solution like this to remove non-alpha characters: stackoverflow.com/questions/1007697/…, and then take the first 4 characters. — Tanner
– Tanner, Commented Oct 22, 2014 at 11:10

Community · Accepted Answer · 2017-05-23 10:26:00Z

1

You can use the answer in the question and add substring method to get your value of desired length How to strip all non-alphabetic characters from string in SQL Server?

i.e.

Create Function [dbo].[RemoveNonAlphaCharacters](@Temp VarChar(1000))
Returns VarChar(1000)
AS
Begin

    Declare @KeepValues as varchar(50)
    Set @KeepValues = '%[^a-z]%'
    While PatIndex(@KeepValues, @Temp) > 0
        Set @Temp = Stuff(@Temp, PatIndex(@KeepValues, @Temp), 1, '')

    Return @Temp
End

use it like

Select SUBSTRING(dbo.RemoveNonAlphaCharacters('abc1234def5678ghi90jkl'), 1, 4);

Here SUBSTRING is used to get string of length 4 from the returned value.

edited May 23, 2017 at 10:26

CommunityBot

11 silver badge

answered Oct 22, 2014 at 11:20

Satyajit

2,2102 gold badges20 silver badges28 bronze badges

Sign up to request clarification or add additional context in comments.

1 Comment

IrfanRaza Over a year ago

Thanks Satyajit. Your solution worked. Thanks for your great help.

vks · Accepted Answer · 2014-10-22 11:12:16Z

1

^([a-zA-Z])[^a-zA-Z\n]*([a-zA-Z])?[^a-zA-Z\n]*([a-zA-Z])?[^a-zA-Z\n]*([a-zA-Z])?

You can try this.Grab the captures or groups.See demo.

http://regex101.com/r/rQ6mK9/42

answered Oct 22, 2014 at 11:12

vks

68.1k11 gold badges96 silver badges132 bronze badges

4 Comments

Lucas Trzesniewski Over a year ago

And how do you use that from T-SQL? ;)

vks Over a year ago

@LucasTrzesniewski groups are not avialable in T-SQL? :(

Lucas Trzesniewski Over a year ago

AFAIK you need to define a CLR function to get any regex support at all in T-SQL (and I don't call the LIKE patterns as regexes)...

IrfanRaza Over a year ago

Thanks vks. Your solution seems to be a simple one, but i am not able to use that with SQL server as i am not that much expert.

GarethD · Accepted Answer · 2014-10-22 12:07:29Z

A bit late to the party here, but as a general rule I despise all functions with BEGIN .. END, they almost never perform well, and since this covers all scalar functions (until Microsoft implement inline scalar expressions), as such whenever I see one I look for an alternative that offers similar reusability. In this case the query can be converted to an inline table valued function:

CREATE FUNCTION dbo.RemoveNonAlphaCharactersTVF (@String NVARCHAR(1000), @Length INT)
RETURNS TABLE
AS
RETURN
(   WITH E1 (N) AS 
    (   SELECT 1 
        FROM (VALUES (1), (1), (1), (1), (1), (1), (1), (1), (1), (1)) n (N)
    ), 
    E2 (N) AS (SELECT 1 FROM E1 CROSS JOIN E1 AS E2), 
    N (Number) AS (SELECT TOP (LEN(@String)) ROW_NUMBER() OVER(ORDER BY E1.N) FROM E2 CROSS JOIN E1)
    SELECT  Result = (  SELECT  TOP (ISNULL(@Length, 1000)) SUBSTRING(@String, n.Number, 1)
                        FROM    N
                        WHERE   SUBSTRING(@String, n.Number, 1) LIKE '[a-Z]'
                        ORDER BY Number
                        FOR XML PATH('')
                    )
);

All this does is use a list of numbers to expand the string out into columns, e.g. RA123 *JU23T becomes:

Letter
------
R
A
1
2
3

*
J
U
2
3
T

The rows that are not alphanumeric are then removed by the where clause:

WHERE   SUBSTRING(@String, n.Number, 1) LIKE '[a-Z]'

Leaving

Letter
------
R
A
J
U
T

The @Length parameter then limits the characters (in your case this would be 4), then the string is rebuilt using XML concatenation. I would usually use FOR XML PATH(''), TYPE).value('.', 'NVARCHAR(MAX)') for xml concatenation to allow for xml characters, but since I know there are none I haven't bothered as it is additional overhead.

Running some tests on this with a sample table of 1,000,000 rows:

CREATE TABLE dbo.T (String NVARCHAR(1000));
INSERT T (String)
SELECT TOP 1000000 t.String
FROM    (VALUES ('Nagpur District'), ('Ill Fated'), ('RA123 *JU23'), ('MAC')) t (String)
        CROSS JOIN sys.all_objects a
        CROSS JOIN sys.all_objects B
ORDER BY a.object_id;

Then comparing the scalar and the inline udfs (called as follows):

SELECT  COUNT(SUBSTRING(dbo.RemoveNonAlphaCharacters(t.String), 1, 4))
FROM    T;

SELECT  COUNT(tvf.Result)
FROM    T
        CROSS APPLY dbo.RemoveNonAlphaCharactersTVF (t.String, 4) AS tvf;

Over 15 test runs (probably not enough for an accurate figure, but enough to paint the picture) the average execution time for the scalar UDF was 11.824s, and for the inline TVF was 1.658, so approximately 85% faster.

Collectives™ on Stack Overflow

get character only string from another string in sql server

3 Answers 3

1 Comment

4 Comments

Comments

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

3 Answers 3

1 Comment

4 Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Linked

Related