MS SQL to SQLite syntax

Question

First of all, I'm trying to achieve the following:

I first used MS SQL to figure out how to align the tag IDs with their tag names.

Here is the schema if you would like to recreate the unnormalised table.

CREATE TABLE unnormalized (
  vendor_tag varchar(200),
  vendor_tag_name varchar(200),
  vendor_id int
);

INSERT INTO unnormalized
VALUES
('5,8,30,24','Burgers,Desserts,Fries,Salads',1),
('5','Burgers',2),
('8,42','Desserts,Mexican',3),
('1,5,30,16','American,Burgers,Fries,Sandwiches',4),
('1,5,30,16','American,Burgers,Fries,Sandwiches',5);

Here is the MS SQL query that produces the normalised table:

SELECT DISTINCT
    CAST(tag_id AS INT) AS tag_id,
    tag_name
FROM unnormalized
CROSS APPLY 
(
    (SELECT 
        value as tag_id,
        ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS rn
     FROM STRING_SPLIT(vendor_tag,',') 
    ) a1
    INNER JOIN 
    (SELECT 
        value as tag_name,
        ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS rn
     FROM STRING_SPLIT(vendor_tag_name,',') 
     ) a2
    ON a1.rn = a2.rn
) 
ORDER BY tag_id

Now I'm trying to rewrite this query for SQLite. However, there are a few differences: SQLite has neither CROSS APPLY nor STRING_SPLIT. From what I've found, CROSS APPLY may be roughly similar to CROSS JOIN in SQLite, and something like the following could be used to split the string at the first comma it finds:

WITH split(vendor_id, vendor_tag, str) AS (
    SELECT vendor_id, '', vendor_tag||',' FROM unnormalized
    UNION ALL SELECT vendor_id,
    substr(str, 0, instr(str, ',')),
    substr(str, instr(str, ',')+1)
    FROM split 
    WHERE str
) 

SELECT vendor_id, vendor_tag
FROM split 
WHERE vendor_tag
ORDER BY vendor_id;

Fix your data model! Don't store multiple values in a single column! Don't store numbers as strings!
– Gordon Linoff, Commented Jan 4, 2021 at 15:53
This table isn't just "unnormalized"; it fails the most basic design rule: each cell should contain an atomic value. If you used a proper schema you wouldn't have any problems. Even in databases that have arrays, there's no relation between the elements of different arrays.
– Panagiotis Kanavos, Commented Jan 4, 2021 at 15:57
As for SQLite, you gain nothing at all by splitting the values in SQL. SQLite is an embedded database, which means the engine is hosted and run by your application, using your application's RAM. It's a lot faster to split the strings in your client application's language than to try to do the same in SQLite.
– Panagiotis Kanavos, Commented Jan 4, 2021 at 16:01
@Panagiotis Kanavos The table is data from a CSV file that I found on Kaggle; I was doing some exercises to load the data into a database.
– Q.T, Commented Jan 4, 2021 at 16:06
Why did you use such a schema? What problem were you trying to fix? Not speed or scalability: this schema is extremely slow and doesn't scale at all. Each query has to scan the entire table and can't use any indexes. Space? This is probably using more space than a proper table with integer IDs, even for a small number of vendors. If you have lots of vendors, you can use table compression in SQL Server.
– Panagiotis Kanavos, Commented Jan 4, 2021 at 16:07
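
The client-side splitting suggested in the comments above could look roughly like the following. This is only a minimal sketch in Python, not something posted in the thread: the `rows` list stands in for rows read from the CSV file or from the `unnormalized` table, `extract_tags` is a hypothetical helper name, and it assumes every `vendor_tag` list has the same length and order as its `vendor_tag_name` list.

```python
# Sample rows copied from the question's INSERT statement:
# (vendor_tag, vendor_tag_name, vendor_id)
rows = [
    ("5,8,30,24", "Burgers,Desserts,Fries,Salads", 1),
    ("5", "Burgers", 2),
    ("8,42", "Desserts,Mexican", 3),
    ("1,5,30,16", "American,Burgers,Fries,Sandwiches", 4),
    ("1,5,30,16", "American,Burgers,Fries,Sandwiches", 5),
]


def extract_tags(rows):
    """Pair each tag ID with the name at the same position and deduplicate."""
    tags = {}
    for tag_ids, tag_names, _vendor_id in rows:
        ids = tag_ids.split(",")
        names = tag_names.split(",")
        # Assumption: both lists have the same length and matching order.
        for tag_id, tag_name in zip(ids, names):
            tags[int(tag_id)] = tag_name
    # Sort by numeric tag ID so 16 comes after 8, not before 5.
    return sorted(tags.items())


tags = extract_tags(rows)
for tag_id, tag_name in tags:
    print(tag_id, tag_name)
```

The resulting `(tag_id, tag_name)` pairs could then be written to a proper `tag` table with a parameterised INSERT.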

forpas · Accepted Answer · 2021-01-04 16:42:07Z

2

In SQLite you can do it with a recursive CTE:

WITH cte AS (
  -- Anchor: keep both full strings and cut off the first item of each
  SELECT
    vendor_tag,
    vendor_tag_name,
    SUBSTR(vendor_tag, 1, INSTR(vendor_tag || ',', ',') - 1) col1,
    SUBSTR(vendor_tag_name, 1, INSTR(vendor_tag_name || ',', ',') - 1) col2
  FROM unnormalized
  UNION ALL
  -- Recursive step: drop the item just extracted (plus its comma) from each
  -- string, then cut off the first item of what remains
  SELECT
    SUBSTR(vendor_tag, LENGTH(col1) + 2),
    SUBSTR(vendor_tag_name, LENGTH(col2) + 2),
    SUBSTR(SUBSTR(vendor_tag, LENGTH(col1) + 2), 1, INSTR(SUBSTR(vendor_tag, LENGTH(col1) + 2) || ',', ',') - 1),
    SUBSTR(SUBSTR(vendor_tag_name, LENGTH(col2) + 2), 1, INSTR(SUBSTR(vendor_tag_name, LENGTH(col2) + 2) || ',', ',') - 1)
  FROM cte
  -- Continue only while both remaining strings are non-empty
  WHERE LENGTH(vendor_tag) AND LENGTH(vendor_tag_name)
)
SELECT DISTINCT col1 vendor_tag, col2 vendor_tag_name
FROM cte
-- Keep only complete, non-empty items
WHERE NOT (INSTR(col1, ',') OR INSTR(col2, ',')) AND (LENGTH(col1) AND LENGTH(col2))
-- Adding 0 makes the sort numeric rather than textual
ORDER BY vendor_tag + 0

See the demo.
Results:

> vendor_tag | vendor_tag_name
> :--------- | :--------------
> 1          | American       
> 5          | Burgers        
> 8          | Desserts       
> 16         | Sandwiches     
> 24         | Salads         
> 30         | Fries          
> 42         | Mexican
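
As a possible alternative that is not part of the answer above, SQLite's `json_each` table-valued function can play a role similar to STRING_SPLIT plus ROW_NUMBER, because it returns both the position (`key`) and the `value` of each array element. The sketch below runs through Python's built-in sqlite3 module and rests on several assumptions: the SQLite build includes the JSON functions, the tag IDs are plain integers, and the tag names contain no double quotes, backslashes, or commas.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE unnormalized (
  vendor_tag TEXT,
  vendor_tag_name TEXT,
  vendor_id INTEGER
);
INSERT INTO unnormalized VALUES
('5,8,30,24','Burgers,Desserts,Fries,Salads',1),
('5','Burgers',2),
('8,42','Desserts,Mexican',3),
('1,5,30,16','American,Burgers,Fries,Sandwiches',4),
('1,5,30,16','American,Burgers,Fries,Sandwiches',5);
""")

# '5,8,30,24' becomes the JSON array [5,8,30,24];
# 'Burgers,Fries' becomes the JSON array ["Burgers","Fries"].
# json_each may reference columns of the table to its left, much like
# CROSS APPLY, and its key column is the zero-based array position.
query = """
SELECT DISTINCT CAST(t.value AS INTEGER) AS tag_id, n.value AS tag_name
FROM unnormalized AS u,
     json_each('[' || u.vendor_tag || ']') AS t,
     json_each('["' || REPLACE(u.vendor_tag_name, ',', '","') || '"]') AS n
WHERE n.key = t.key
ORDER BY tag_id
"""

pairs = conn.execute(query).fetchall()
for tag_id, tag_name in pairs:
    print(tag_id, tag_name)
```

Matching on `key` pairs the first ID with the first name, the second with the second, and so on, without relying on the order in which split rows happen to be returned.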

8 Comments

Q.T Over a year ago

Wow, this works, thank you! I haven't looked much into recursive CTEs yet, and I'm struggling a little to read the code, but I'll study it further.

Q.T Over a year ago

SUBSTR(vendor_tag, LENGTH(col1) + 2),
SUBSTR(vendor_tag_name, LENGTH(col2) + 2),
SUBSTR(SUBSTR(vendor_tag, LENGTH(col1) + 2), 1, INSTR(SUBSTR(vendor_tag, LENGTH(col1) + 2) || ',', ',') - 1),
SUBSTR(SUBSTR(vendor_tag_name, LENGTH(col2) + 2), 1, INSTR(SUBSTR(vendor_tag_name, LENGTH(col2) + 2) || ',', ',') - 1)

Could you give a little more insight in this part of the code?

forpas Over a year ago

@Q.T A recursive CTE is effectively a loop (something like a while loop in other programming languages). In every iteration I take the part of the string up to the first comma, and in the next iteration I take the part of the remaining string up to the next comma, and so on. This is why I use string functions like INSTR() and SUBSTR().
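
The loop described above is easier to follow on a single string. The sketch below is not the answer's query. It is a stripped-down recursive CTE, run through Python's built-in sqlite3 module, that shows what each iteration cuts off and what is left over. The column names `step`, `item`, and `rest` are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# step: iteration counter (0 is the starting row)
# item: the piece cut off in this iteration
# rest: the part of the string that has not been consumed yet
query = """
WITH RECURSIVE split(step, item, rest) AS (
  -- Starting row: nothing extracted yet; a trailing comma is appended
  -- so that the last item is also followed by a comma
  SELECT 0, NULL, '5,8,30,24' || ','
  UNION ALL
  -- One iteration: take everything before the first comma as the item
  -- and keep everything after that comma for the next iteration
  SELECT step + 1,
         SUBSTR(rest, 1, INSTR(rest, ',') - 1),
         SUBSTR(rest, INSTR(rest, ',') + 1)
  FROM split
  WHERE rest <> ''  -- loop condition: stop when nothing is left
)
SELECT step, item, rest FROM split ORDER BY step
"""

trace = conn.execute(query).fetchall()
for step, item, rest in trace:
    print(step, repr(item), repr(rest))
```

Each printed row corresponds to one pass of the loop; the query in the answer does the same thing for two strings at once so that the IDs and names stay aligned.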

Q.T Over a year ago

Hi, another question: how would I use that code to insert the results into this table? CREATE TABLE IF NOT EXISTS tag ( tag_id INTEGER PRIMARY KEY, tag_name TEXT );

forpas Over a year ago

@Q.T check this: dbfiddle.uk/…
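
The contents of the linked fiddle are not reproduced here. As one possible way to do it, a recursive CTE can feed an INSERT ... SELECT directly. The sketch below uses Python's built-in sqlite3 module and a simplified splitting CTE rather than the exact query from the answer; the CTE column names `ids` and `names` are made up, and it assumes each tag ID always maps to the same name (otherwise the PRIMARY KEY on `tag_id` would reject the duplicate).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE unnormalized (
  vendor_tag TEXT,
  vendor_tag_name TEXT,
  vendor_id INTEGER
);
INSERT INTO unnormalized VALUES
('5,8,30,24','Burgers,Desserts,Fries,Salads',1),
('5','Burgers',2),
('8,42','Desserts,Mexican',3),
('1,5,30,16','American,Burgers,Fries,Sandwiches',4),
('1,5,30,16','American,Burgers,Fries,Sandwiches',5);

CREATE TABLE IF NOT EXISTS tag (
  tag_id INTEGER PRIMARY KEY,
  tag_name TEXT
);
""")

# ids / names hold the not-yet-consumed remainder of each string.
conn.execute("""
WITH RECURSIVE split(tag_id, tag_name, ids, names) AS (
  SELECT NULL, NULL, vendor_tag || ',', vendor_tag_name || ','
  FROM unnormalized
  UNION ALL
  SELECT SUBSTR(ids, 1, INSTR(ids, ',') - 1),
         SUBSTR(names, 1, INSTR(names, ',') - 1),
         SUBSTR(ids, INSTR(ids, ',') + 1),
         SUBSTR(names, INSTR(names, ',') + 1)
  FROM split
  WHERE ids <> '' AND names <> ''
)
INSERT INTO tag (tag_id, tag_name)
SELECT DISTINCT CAST(tag_id AS INTEGER), tag_name
FROM split
WHERE tag_id IS NOT NULL  -- skip the starting rows, which carry no item
""")
conn.commit()

tag_rows = conn.execute(
    "SELECT tag_id, tag_name FROM tag ORDER BY tag_id"
).fetchall()
for tag_id, tag_name in tag_rows:
    print(tag_id, tag_name)
```

If the source data might map one ID to several names, INSERT OR IGNORE would keep the first pair seen instead of failing.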
