0

I have a text column with data like the following:

RawDataColumn
THANK 1000 1500 1740 1  YOU 1000 1740 1820 1  ABC 1000 1820 1960 1  XYZABC 1000 1960 2240 1  DFGS 1000 2240 2380 1  THINK 1000 2380 2480 1

I want to parse the text column into multiple columns, as below:

Word    A     B     C     D
THANK   1000  1500  1740  1
YOU     1000  1740  1820  1
ABC     1000  1820  1960  1
XYZABC  1000  1960  2240  1
DFGS    1000  2240  2380  1
THINK   1000  2380  2480  1

SQL Server Version : SQL Server 2016

5
  • SQL Server is not the best tool to be using for this task. Commented Apr 8, 2020 at 10:18
  • What would be the best way to do it with better performance? @TimBiegeleisen Commented Apr 8, 2020 at 10:41
  • Try using a scripting language such as Python or Perl. Then, re-import the data when you already have separate well defined rows. Commented Apr 8, 2020 at 10:43
  • If [word] is always alpha and A-D always numeric, you can create a UDF with SUBSTRING and PATINDEX. Commented Apr 8, 2020 at 11:48
  • @jigga - Yes, word is always alpha and the A-D columns are numeric. I couldn't work out the logic for handling it with SUBSTRING and PATINDEX. Commented Apr 8, 2020 at 12:07

3 Answers

1

SQL Server is not the best place to handle this kind of text scrubbing. Here is a Python script that generates a text file with one record per line:

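import re

inp = "THANK 1000 1500 1740 1  YOU 1000 1740 1820 1  ABC 1000 1820 1960 1  XYZABC 1000 1960 2240 1  DFGS 1000 2240 2380 1  THINK 1000 2380 2480 1"
# Each record is a word followed by four space-separated integers.
lines = re.findall(r'\S+ \d+ \d+ \d+ \d+', inp)
# Write one record per line; the with block closes the file automatically.
with open('output.txt', 'w') as f:
    for line in lines:
        f.write(line + '\n')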

The output file output.txt should now contain one record per line, with the columns separated by single spaces. You can take a similar approach in almost any other language and then import the result into SQL Server.
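
As a variation on the script above, the records could also be parsed into typed fields and written as CSV with a header row, which may be easier to import. This is only a sketch: it assumes the word is purely alphabetic and the four values are integers (as in the sample data), and the names `parse_records` and `write_csv` are made up for illustration.

```python
import csv
import io
import re

# One record = an alphabetic word followed by four integers (assumed from the sample data).
RECORD = re.compile(r'([A-Za-z]+) (\d+) (\d+) (\d+) (\d+)')

def parse_records(raw):
    """Return a list of (word, a, b, c, d) tuples found in one raw string."""
    return [(w, int(a), int(b), int(c), int(d))
            for w, a, b, c, d in RECORD.findall(raw)]

def write_csv(rows, out):
    """Write parsed rows to a file-like object as CSV with a header line."""
    writer = csv.writer(out, lineterminator='\n')
    writer.writerow(['Word', 'A', 'B', 'C', 'D'])
    writer.writerows(rows)

raw = "THANK 1000 1500 1740 1  YOU 1000 1740 1820 1  ABC 1000 1820 1960 1"
buf = io.StringIO()          # stands in for a real file opened with open(..., 'w', newline='')
write_csv(parse_records(raw), buf)
print(buf.getvalue())
```

Writing to a file-like object keeps the parsing separate from where the output ends up, so the same function can target a real file or an in-memory buffer.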

1 Comment

I tried using PowerShell to create files and import them back into SQL Server, but with 120,000 files (each column value as one file) performance suffers: the import takes 15+ hours.
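
If the cost comes mostly from importing many small files, one option might be to write every record into a single combined file and import that once. The sketch below assumes each raw value can be paired with some identifier; `SourceId`, `combine`, and the sample ids are hypothetical, and whether this actually helps would need to be measured.

```python
import csv
import io
import re

# A record is any non-space token followed by four integers.
RECORD = re.compile(r'(\S+) (\d+) (\d+) (\d+) (\d+)')

def combine(sources, out):
    """Write every record from every (source_id, raw_text) pair to one CSV stream.

    Returns the number of data rows written.
    """
    writer = csv.writer(out, lineterminator='\n')
    writer.writerow(['SourceId', 'Word', 'A', 'B', 'C', 'D'])
    count = 0
    for source_id, raw in sources:
        for fields in RECORD.findall(raw):
            writer.writerow([source_id, *fields])
            count += 1
    return count

# Hypothetical sample: two raw values keyed by a made-up id.
sources = [
    (1, "THANK 1000 1500 1740 1  YOU 1000 1740 1820 1"),
    (2, "ABC 1000 1820 1960 1"),
]
buf = io.StringIO()          # stands in for one real output file
n = combine(sources, buf)
print(n, "rows written")
```
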
1

Regarding my comment, here is one way to do this (not my best work :D):

CREATE FUNCTION dbo.Split
(
    @string nvarchar(max)
)
RETURNS @result TABLE (Word nvarchar(max), A int, B int, C int, D int)
AS
 BEGIN

    DECLARE @sub nvarchar(max)
    DECLARE @Word nvarchar(max)
    DECLARE @A int
    DECLARE @B int
    DECLARE @C int
    DECLARE @D int

    IF @string IS NULL 
     BEGIN
        INSERT INTO @result VALUES(NULL, NULL, NULL, NULL, NULL)
     END

    ELSE
     BEGIN
        WHILE LEN(@string) > 0
         BEGIN
            -- Cut off the next record: everything before the space that precedes the next word.
            IF @string LIKE '% [A-Z]%'
             BEGIN
                SET @sub = SUBSTRING(@string, 0, PATINDEX('% [A-Z]%',  @string))
             END
            ELSE
             BEGIN
                -- No further word found, so the rest of the string is the last record.
                SET @sub = @string
             END

            -- Remove the record from the input, then take the word before the first space.
            SET @string = LTRIM(RTRIM(RIGHT(@string, LEN(@string) - LEN(@sub))))
            SET @Word = LEFT(@sub, CHARINDEX(' ', @sub) - 1)

            -- Peel off A, B, C and D in turn; each string value is implicitly converted to int.
            SET @sub = SUBSTRING(@sub, CHARINDEX(' ', @sub) + 1, LEN(@sub))
            SET @A = LEFT(@sub, CHARINDEX(' ', @sub))

            SET @sub = SUBSTRING(@sub, CHARINDEX(' ', @sub) + 1, LEN(@sub))
            SET @B = LEFT(@sub, CHARINDEX(' ', @sub))

            SET @sub = SUBSTRING(@sub, CHARINDEX(' ', @sub) + 1, LEN(@sub))
            SET @C = LEFT(@sub, CHARINDEX(' ', @sub))

            SET @D = SUBSTRING(@sub, CHARINDEX(' ', @sub) + 1, LEN(@sub))

            INSERT INTO @result VALUES(@Word, @A, @B, @C, @D)
         END
     END
    RETURN  
 END
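
To make the loop in the function above easier to follow, here is a rough Python trace of the same idea: find the space that precedes the next uppercase letter, cut off everything before it as one record, and split that record into its five fields. It is an illustration, not a translation; `split_like_udf` is a made-up name, and the Python pattern is case-sensitive, whereas the T-SQL `[A-Z]` range may behave differently depending on the collation.

```python
import re

def split_like_udf(s):
    """Mimic the WHILE loop: cut at each space that precedes an uppercase letter."""
    rows = []
    s = s.strip()
    while s:
        # Plays the role of PATINDEX('% [A-Z]%', @string).
        m = re.search(r' [A-Z]', s)
        # Everything before that space is one record; otherwise take the rest.
        sub = s[:m.start()] if m else s
        # Drop the record (and surrounding spaces) from the remaining input.
        s = s[len(sub):].strip()
        # Raises ValueError if the record does not have exactly five fields.
        word, a, b, c, d = sub.split()
        rows.append((word, int(a), int(b), int(c), int(d)))
    return rows

sample = "THANK 1000 1500 1740 1  YOU 1000 1740 1820 1  XYZABC 1000 1960 2240 1"
for row in split_like_udf(sample):
    print(row)
```
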


0
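-- Sample data
CREATE TABLE test (RawDataColumn varchar(2000))
INSERT INTO test VALUES ('THANK 1000 1500 1740 1  YOU 1000 1740 1820 1  ABC 1000 1820 1960 1  XYZABC 1000 1960 2240 1  DFGS 1000 2240 2380 1  THINK 1000 2380 2480 1')

-- Step 1: records are separated by two spaces; turn that separator into '|'
-- and split the column into one row per record.
;WITH mycte AS (
    SELECT value AS val1
    FROM test
    CROSS APPLY STRING_SPLIT(REPLACE(RawDataColumn, '  ', '|'), '|')
)
-- Step 2: split each record on single spaces, number the pieces, and pivot them into columns.
-- Note: STRING_SPLIT does not guarantee output order, so rn relies on the pieces
-- coming back in their original order.
SELECT MAX(CASE WHEN rn = 1 THEN value END) AS word
     , MAX(CASE WHEN rn = 2 THEN value END) AS A
     , MAX(CASE WHEN rn = 3 THEN value END) AS B
     , MAX(CASE WHEN rn = 4 THEN value END) AS C
     , MAX(CASE WHEN rn = 5 THEN value END) AS D
FROM mycte s
CROSS APPLY (
    SELECT ss.[value], ROW_NUMBER() OVER (PARTITION BY s.val1 ORDER BY s.val1) AS rn
    FROM STRING_SPLIT(val1, ' ') AS ss
) AS d
GROUP BY s.val1

DROP TABLE test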
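
The two-stage idea in the query above (split on the double-space record separator, then split each record on single spaces) can be sketched in Python as follows. This assumes records are always separated by exactly two spaces and fields by one; `two_stage_split` is a made-up name. Unlike `STRING_SPLIT`, Python's `str.split` returns the pieces in their original order, so no row numbering is needed.

```python
def two_stage_split(raw):
    """Split on the two-space record separator, then on single spaces."""
    rows = []
    for record in raw.split('  '):
        record = record.strip()
        if not record:
            continue  # skip empty pieces, e.g. from trailing separators
        # Raises ValueError if a record does not have exactly five fields.
        word, a, b, c, d = record.split(' ')
        rows.append((word, int(a), int(b), int(c), int(d)))
    return rows

raw = "THANK 1000 1500 1740 1  YOU 1000 1740 1820 1  ABC 1000 1820 1960 1"
for row in two_stage_split(raw):
    print(row)
```
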
