First of all, I'm trying to achieve the following: [image of the desired result; not included here]

I first used MS SQL to figure out how to align the tag IDs with their tag names.

Here is the schema if you would like to recreate the unnormalised table.

CREATE TABLE unnormalized(
  vendor_tag varchar(200),
  vendor_tag_name varchar(200),
  vendor_id int
);

INSERT INTO unnormalized
VALUES
('5,8,30,24','Burgers,Desserts,Fries,Salads',1),
('5','Burgers',2),
('8,42','Desserts,Mexican',3),
('1,5,30,16','American,Burgers,Fries,Sandwiches',4),
('1,5,30,16','American,Burgers,Fries,Sandwiches',5);

Here is the MS SQL query that produces the normalised table:

SELECT DISTINCT
    CAST(tag_id AS INT) AS tag_id,
    tag_name
FROM unnormalized 
CROSS APPLY 
(
    (SELECT 
        value as tag_id,
        ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS rn
     FROM STRING_SPLIT(vendor_tag,',') 
    ) a1
    INNER JOIN 
    (SELECT 
        value as tag_name,
        ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS rn
     FROM STRING_SPLIT(vendor_tag_name,',') 
     ) a2
    ON a1.rn = a2.rn
) 
ORDER BY tag_id

Now I'm trying to rewrite this query for SQLite. However, there are a few differences: SQLite has neither CROSS APPLY nor STRING_SPLIT. From what I've found, CROSS APPLY may be roughly similar to CROSS JOIN in SQLite, and something like the following might work to split off the part of the string before the first comma it finds:

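WITH split(vendor_id, vendor_tag, str) AS (
    -- Seed row: no tag extracted yet; the trailing comma makes every tag end with one
    SELECT vendor_id, '', vendor_tag || ',' FROM unnormalized
    UNION ALL
    -- Each step takes the text before the first comma and keeps the remainder
    SELECT vendor_id,
           substr(str, 1, instr(str, ',') - 1),
           substr(str, instr(str, ',') + 1)
    FROM split
    WHERE str <> ''
)
SELECT vendor_id, vendor_tag
FROM split
WHERE vendor_tag <> ''
ORDER BY vendor_id;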
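
For anyone who wants to try this outside a SQL console, here is a minimal sketch that runs the same splitting idea from Python's built-in sqlite3 module. It is an illustration only: the `pos` counter, the `tag` and `rest` column names, and the use of Python are assumptions of this sketch, not part of the original attempt. The counter records each token's position, which is roughly the role ROW_NUMBER() plays in the MS SQL version.

```python
import sqlite3

# In-memory database holding the sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE unnormalized(
  vendor_tag varchar(200),
  vendor_tag_name varchar(200),
  vendor_id int
);
INSERT INTO unnormalized VALUES
('5,8,30,24','Burgers,Desserts,Fries,Salads',1),
('5','Burgers',2),
('8,42','Desserts,Mexican',3),
('1,5,30,16','American,Burgers,Fries,Sandwiches',4),
('1,5,30,16','American,Burgers,Fries,Sandwiches',5);
""")

# Same splitting idea as the CTE above, plus a position counter
# (pos is a name made up for this sketch). The recursion stops once
# nothing is left to split.
SPLIT_SQL = """
WITH split(vendor_id, pos, tag, rest) AS (
    SELECT vendor_id, 0, '', vendor_tag || ',' FROM unnormalized
    UNION ALL
    SELECT vendor_id,
           pos + 1,
           substr(rest, 1, instr(rest, ',') - 1),
           substr(rest, instr(rest, ',') + 1)
    FROM split
    WHERE rest <> ''
)
SELECT vendor_id, pos, tag
FROM split
WHERE pos > 0
ORDER BY vendor_id, pos;
"""

rows = conn.execute(SPLIT_SQL).fetchall()
for row in rows:
    print(row)  # first row printed: (1, 1, '5')
```

Running the same CTE over vendor_tag_name and joining the two results on vendor_id and pos would pair each ID with its name, although I have not checked how that performs on a large table.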
  • Fix your data model! Don't store multiple values in a single column! Don't store numbers as strings! Commented Jan 4, 2021 at 15:53
  • This table isn't just "unnormalized"; it fails the most basic design rule: each cell should contain an atomic value. If you used a proper schema you wouldn't have any problems. Even in databases that have arrays, there's no relation between the elements of different arrays. Commented Jan 4, 2021 at 15:57
  • As for SQLite, you gain nothing at all by splitting the values in SQL. SQLite is an embedded database, which means the engine is hosted and run by your application, using your application's RAM. It's a lot faster to split the strings in your client application's language than to try to do the same in SQLite. Commented Jan 4, 2021 at 16:01
  • @Panagiotis Kanavos The table is data from a CSV file that I found on Kaggle; I was doing some exercises to populate the data into a database. Commented Jan 4, 2021 at 16:06
  • Why did you use such a schema? What problem were you trying to fix? Not speed or scalability: this schema is extremely slow and doesn't scale at all. Each query has to scan the entire table and can't use any indexes. Space? This is probably using more space than a proper table with integer IDs, even for a small number of vendors. If you have lots of vendors, you can use table compression in SQL Server. Commented Jan 4, 2021 at 16:07
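
A minimal sketch of the client-side splitting suggested in the comments, assuming Python and its built-in sqlite3 module (the thread does not name a client language, so that choice and the `tags` dictionary are assumptions of this sketch). It pairs the n-th ID with the n-th name of each row and makes no claim about relative speed.

```python
import sqlite3

# In-memory database holding the sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE unnormalized(
  vendor_tag varchar(200),
  vendor_tag_name varchar(200),
  vendor_id int
);
INSERT INTO unnormalized VALUES
('5,8,30,24','Burgers,Desserts,Fries,Salads',1),
('5','Burgers',2),
('8,42','Desserts,Mexican',3),
('1,5,30,16','American,Burgers,Fries,Sandwiches',4),
('1,5,30,16','American,Burgers,Fries,Sandwiches',5);
""")

# Maps tag_id -> tag_name; the dict keys remove duplicate tags.
tags = {}
for tag_ids, tag_names in conn.execute(
    "SELECT vendor_tag, vendor_tag_name FROM unnormalized"
):
    # zip pairs the n-th ID with the n-th name from the same row.
    for tag_id, tag_name in zip(tag_ids.split(","), tag_names.split(",")):
        tags[int(tag_id)] = tag_name

for tag_id in sorted(tags):
    print(tag_id, tags[tag_id])
```

If the two lists in a row ever had different lengths, zip would silently drop the extra items, so a length check would be worth adding for real data.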

1 Answer

In SQLite you can do it with a recursive CTE:

WITH cte AS (
  SELECT 
    vendor_tag, 
    vendor_tag_name,
    SUBSTR(vendor_tag, 1, INSTR(vendor_tag || ',', ',') - 1) col1,
    SUBSTR(vendor_tag_name, 1, INSTR(vendor_tag_name || ',', ',') - 1) col2
  FROM unnormalized 
  UNION ALL 
  SELECT 
    SUBSTR(vendor_tag, LENGTH(col1) + 2), 
    SUBSTR(vendor_tag_name, LENGTH(col2) + 2), 
    SUBSTR(SUBSTR(vendor_tag, LENGTH(col1) + 2), 1, INSTR(SUBSTR(vendor_tag, LENGTH(col1) + 2) || ',', ',') - 1),
    SUBSTR(SUBSTR(vendor_tag_name, LENGTH(col2) + 2), 1, INSTR(SUBSTR(vendor_tag_name, LENGTH(col2) + 2) || ',', ',') - 1)
  FROM cte  
  WHERE LENGTH(vendor_tag) AND LENGTH(vendor_tag_name)
)
SELECT DISTINCT col1 vendor_tag, col2 vendor_tag_name
FROM cte
WHERE NOT (INSTR(col1, ',') OR INSTR(col2, ',')) AND (LENGTH(col1) AND LENGTH(col2))
ORDER BY vendor_tag + 0

See the demo.
Results:

> vendor_tag | vendor_tag_name
> :--------- | :--------------
> 1          | American       
> 5          | Burgers        
> 8          | Desserts       
> 16         | Sandwiches     
> 24         | Salads         
> 30         | Fries          
> 42         | Mexican    
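
If the recursion is hard to follow, it may help to think of it as a loop that removes one token from the front of each string per iteration. Here is a rough Python equivalent for a single row, purely as an illustration: `split_pairs` is a hypothetical helper, not part of the SQL above, and it skips the DISTINCT and filtering steps of the final SELECT.

```python
def split_pairs(tag_ids: str, tag_names: str) -> list[tuple[str, str]]:
    """Take one (id, name) pair off the front per iteration, as the CTE does."""
    pairs = []
    # Mirrors the CTE's stop condition: continue while both strings are non-empty.
    while tag_ids and tag_names:
        # partition returns the text before the first comma and the text after it;
        # if no comma is left, the remainder is the empty string.
        head_id, _, tag_ids = tag_ids.partition(",")
        head_name, _, tag_names = tag_names.partition(",")
        pairs.append((head_id, head_name))
    return pairs


print(split_pairs("8,42", "Desserts,Mexican"))
# [('8', 'Desserts'), ('42', 'Mexican')]
```

In the SQL version, col1 and col2 play the role of head_id and head_name, and the shortened vendor_tag and vendor_tag_name columns carry the remainder into the next step.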

8 Comments

Wow, this works, thank you! I haven't looked much into recursive CTEs yet, and I'm struggling a bit to read the code, but I'll study it further!
Could you give a little more insight into this part of the code?
SUBSTR(vendor_tag, LENGTH(col1) + 2),
SUBSTR(vendor_tag_name, LENGTH(col2) + 2),
SUBSTR(SUBSTR(vendor_tag, LENGTH(col1) + 2), 1, INSTR(SUBSTR(vendor_tag, LENGTH(col1) + 2) || ',', ',') - 1),
SUBSTR(SUBSTR(vendor_tag_name, LENGTH(col2) + 2), 1, INSTR(SUBSTR(vendor_tag_name, LENGTH(col2) + 2) || ',', ',') - 1)
@Q.T A recursive CTE is effectively a loop (something like a while loop in other programming languages). In every iteration I get the part of the string up to the first comma, in the next iteration I get the part of the remaining string up to the next comma, and so on. This is why I use string functions like INSTR() and SUBSTR().
Hi, another question: how would I use that code to insert the results into this table? CREATE TABLE IF NOT EXISTS tag ( tag_id INTEGER PRIMARY KEY, tag_name TEXT );
@Q.T check this: dbfiddle.uk/…
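
The linked fiddle is not reproduced here, so the following is only a sketch of one way the insert could be done, not necessarily what the link shows. It reuses the recursive CTE from the answer in front of an INSERT ... SELECT, run through Python's built-in sqlite3 module. The CAST to INTEGER and the OR IGNORE clause (which skips a row if its tag_id is already present) are assumptions of this sketch.

```python
import sqlite3

# In-memory database with the sample rows and the target table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE unnormalized(
  vendor_tag varchar(200),
  vendor_tag_name varchar(200),
  vendor_id int
);
INSERT INTO unnormalized VALUES
('5,8,30,24','Burgers,Desserts,Fries,Salads',1),
('5','Burgers',2),
('8,42','Desserts,Mexican',3),
('1,5,30,16','American,Burgers,Fries,Sandwiches',4),
('1,5,30,16','American,Burgers,Fries,Sandwiches',5);

CREATE TABLE IF NOT EXISTS tag (
  tag_id INTEGER PRIMARY KEY,
  tag_name TEXT
);
""")

# The CTE is the one from the answer; only the final statement differs.
conn.execute("""
WITH cte AS (
  SELECT
    vendor_tag,
    vendor_tag_name,
    SUBSTR(vendor_tag, 1, INSTR(vendor_tag || ',', ',') - 1) col1,
    SUBSTR(vendor_tag_name, 1, INSTR(vendor_tag_name || ',', ',') - 1) col2
  FROM unnormalized
  UNION ALL
  SELECT
    SUBSTR(vendor_tag, LENGTH(col1) + 2),
    SUBSTR(vendor_tag_name, LENGTH(col2) + 2),
    SUBSTR(SUBSTR(vendor_tag, LENGTH(col1) + 2), 1, INSTR(SUBSTR(vendor_tag, LENGTH(col1) + 2) || ',', ',') - 1),
    SUBSTR(SUBSTR(vendor_tag_name, LENGTH(col2) + 2), 1, INSTR(SUBSTR(vendor_tag_name, LENGTH(col2) + 2) || ',', ',') - 1)
  FROM cte
  WHERE LENGTH(vendor_tag) AND LENGTH(vendor_tag_name)
)
INSERT OR IGNORE INTO tag (tag_id, tag_name)
SELECT DISTINCT CAST(col1 AS INTEGER), col2
FROM cte
WHERE NOT (INSTR(col1, ',') OR INSTR(col2, ',')) AND (LENGTH(col1) AND LENGTH(col2))
""")
conn.commit()

tag_rows = conn.execute(
    "SELECT tag_id, tag_name FROM tag ORDER BY tag_id"
).fetchall()
for tag_row in tag_rows:
    print(tag_row)
```

With the sample data this should fill the tag table with the same seven rows shown in the answer's results.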