PostgreSQL- splitting rows

Question

I have got a table that looks like this:

ID      |  name  | details
---------------------------
1.3.1-3 | Jack   | a
5.4.1-2 | John   | b
1.4.5   | Alex   | c

And I want to split it like this:

ID      |  name  | details
---------------------------
1.3.1   | Jack   | a
1.3.2   | Jack   | a
1.3.3   | Jack   | a
5.4.1   | John   | b
5.4.2   | John   | b
1.4.5   | Alex   | c

How can I do it in postgresql?

Is the index number always X.X.A-B, or could it be X.X.X.X.A-B?

– Mateusz, Commented Feb 23, 2016 at 10:53
It can be both. I want to know how to split it correctly.

– axeMaltesse, Commented Feb 23, 2016 at 10:54

joop · Accepted Answer · 2016-02-23 12:02:56Z

2

CREATE TABLE tosplit
        ( id text NOT NULL
        , name text
        , details text
        );

INSERT INTO tosplit( id , name , details ) VALUES
 ( '1.3.1-3' , 'Jack' , 'a' )
,( '5.4.1-2' , 'John' , 'b' )
,( '1.4.5' , 'Alex' , 'c' );


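-- For ranged ids the regex captures: \1 = prefix including the trailing dot,
-- \2 = range start, \3 = range end.
-- For ids without a range the pattern does not match, so regexp_replace returns
-- the id unchanged and one = two = three = id; the WHERE clauses below rely on that.
WITH zzz AS (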
        SELECT id
        , regexp_replace(id, '([0-9\.]+\.)([0-9]+)-([0-9]+)', e'\\1', e'g') AS one
        , regexp_replace(id, '([0-9\.]+\.)([0-9]+)-([0-9]+)', e'\\2', e'g') AS two
        , regexp_replace(id, '([0-9\.]+\.)([0-9]+)-([0-9]+)', e'\\3', e'g') AS three
        , name
        , details
        FROM tosplit
        )
    SELECT z1.id
        -- , z1.one
        , z1.one || generate_series( z1.two::integer, z1.three::integer)::text AS four
        , z1.name, z1.details
FROM zzz z1
WHERE z1.two <> z1.one
UNION ALL
SELECT z0.id
        -- , z0.one
        , z0.one AS four
        , z0.name, z0.details
FROM zzz z0
WHERE z0.two = z0.one
        ;

Result:

CREATE TABLE
INSERT 0 3
   id    | four  | name | details 
---------+-------+------+---------
 1.3.1-3 | 1.3.1 | Jack | a
 1.3.1-3 | 1.3.2 | Jack | a
 1.3.1-3 | 1.3.3 | Jack | a
 5.4.1-2 | 5.4.1 | John | b
 5.4.1-2 | 5.4.2 | John | b
 1.4.5   | 1.4.5 | Alex | c

edited Feb 23, 2016 at 12:02

answered Feb 23, 2016 at 11:35

joop


6 Comments

axeMaltesse Over a year ago

Thank you for the answer. I will implement it and check if it works correctly.

joop Over a year ago

It even works for the Zaphod / 11.12.13.14.15.16.2-7 case!

axeMaltesse Over a year ago

If I want to update the existing table, do I need to add an UPDATE statement before the second SELECT? Something like: ... FROM tosplit) UPDATE SET etc... SELECT z1.id ...

joop Over a year ago

You cannot update, because the number of rows increases. Better create a new table with CREATE TABLE AS ... and rename/use that.

axeMaltesse Over a year ago

Yep, I just figured out that UPDATE statement is not working. Thank you for your next hint :)!
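
The CREATE TABLE AS route suggested in the comments above could look roughly like the sketch below. It is not joop's query: it uses a more compact substring()-based split, the names tosplit_expanded and tosplit_old are made up for illustration, and it assumes PostgreSQL 9.3 or later (for LATERAL) and ids that always contain at least one dot before the range.

```sql
-- Sketch only: tosplit_expanded and tosplit_old are hypothetical names.
-- Assumes ids shaped like X.Y.A-B (ranged) or X.Y.Z (plain).
BEGIN;

CREATE TABLE tosplit_expanded AS
SELECT CASE WHEN t.id LIKE '%-%'
            -- keep everything up to and including the last dot, then append n
            THEN substring(t.id from '^(.*\.)[0-9]+-[0-9]+$') || g.n::text
            ELSE t.id
       END AS id
     , t.name
     , t.details
FROM tosplit t
CROSS JOIN LATERAL generate_series(
         -- range start and end; both fall back to 0 so plain ids yield exactly one row
         COALESCE(substring(t.id from '([0-9]+)-[0-9]+$')::integer, 0)
       , COALESCE(substring(t.id from '-([0-9]+)$')::integer, 0)
       ) AS g(n);

-- Swap the tables; the old one is kept around under a new name.
ALTER TABLE tosplit RENAME TO tosplit_old;
ALTER TABLE tosplit_expanded RENAME TO tosplit;

COMMIT;
```

Note that CREATE TABLE AS copies only the data: constraints such as NOT NULL and any indexes on the original table would have to be recreated on the new one.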


Lukasz Szozda · Accepted Answer · 2016-02-23 11:25:48Z

1

You could split id based on - and . and concatenate with generated series:

CREATE TABLE tab(
   ID      VARCHAR(18) NOT NULL PRIMARY KEY
  ,name    VARCHAR(8) NOT NULL
  ,details VARCHAR(11) NOT NULL
);
INSERT INTO tab(ID,name,details) VALUES ('1.3.1-3','Jack','a');
INSERT INTO tab(ID,name,details) VALUES ('5.4.1-2','John','b');
INSERT INTO tab(ID,name,details) VALUES ('1.4.5','Alex','c');
INSERT INTO tab(ID,name,details) VALUES ('1.7.11-13','Joe','d');
INSERT INTO tab(ID,name,details) VALUES ('1.7-13','Smith','e');

Main query:

;WITH cte AS
(
  SELECT *, 
    split_part(id, '-', 1) AS prefix,
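    -- take the last dot-separated segment; the outer reverse() restores digit order (e.g. '21' -> '12')
    reverse(split_part(reverse(split_part(id, '-', 1)),'.',1))::int AS start,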
    CASE WHEN split_part(id, '-',2) <> '' 
         THEN split_part(id, '-', 2):: int 
         ELSE NULL 
    END AS stop
  FROM tab
)
SELECT 
  LEFT(prefix, LENGTH(prefix) - strpos(reverse(prefix), '.')) || '.' || n::text AS id,
  name,
  details     
FROM cte
CROSS JOIN LATERAL generate_series(start,COALESCE(stop, start)) AS sub(n);
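
To see how the individual string functions decompose an id, the expressions can be run against a single literal. This is only an illustrative sketch; the literal '1.7.12-13' and the column aliases are chosen here and are not part of the answer. The second reverse() matters as soon as the last segment is not a palindrome.

```sql
-- Decomposing one sample id step by step (the literal is an assumed example).
SELECT s.id
     , split_part(s.id, '-', 1)                                     AS prefix        -- '1.7.12'
     , reverse(split_part(reverse(split_part(s.id, '-', 1)), '.', 1)) AS range_start -- '12'
     , split_part(s.id, '-', 2)                                     AS range_stop    -- '13'
     , LEFT(split_part(s.id, '-', 1),
            LENGTH(split_part(s.id, '-', 1))
            - strpos(reverse(split_part(s.id, '-', 1)), '.'))       AS base          -- '1.7'
FROM (SELECT '1.7.12-13'::text AS id) AS s;
```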

SqlFiddleDemo

Output:

╔═════════╦════════╦═════════╗
║   id    ║ name   ║ details ║
╠═════════╬════════╬═════════╣
║ 1.3.1   ║ Jack   ║ a       ║
║ 1.3.2   ║ Jack   ║ a       ║
║ 1.3.3   ║ Jack   ║ a       ║
║ 5.4.1   ║ John   ║ b       ║
║ 5.4.2   ║ John   ║ b       ║
║ 1.4.5   ║ Alex   ║ c       ║
║ 1.7.11  ║ Joe    ║ d       ║
║ 1.7.12  ║ Joe    ║ d       ║
║ 1.7.13  ║ Joe    ║ d       ║
║ 1.7     ║ Smith  ║ e       ║
║ 1.8     ║ Smith  ║ e       ║
║ 1.9     ║ Smith  ║ e       ║
║ 1.10    ║ Smith  ║ e       ║
║ 1.11    ║ Smith  ║ e       ║
║ 1.12    ║ Smith  ║ e       ║
║ 1.13    ║ Smith  ║ e       ║
╚═════════╩════════╩═════════╝

edited Feb 23, 2016 at 11:25

answered Feb 23, 2016 at 11:13

Lukasz Szozda


4 Comments

user330315 Over a year ago

Postgres needs the statement termination character ; at the end of the statement, not at the beginning

Lukasz Szozda Over a year ago

@a_horse_with_no_name Thanks for the comment, I know that. I add ; before WITH/MERGE because I use SQL Server daily. In SQL Server there is no need for ; after every statement, so to avoid problems I add it in front.

user330315 Over a year ago

;with looks ridiculous to me ;)

axeMaltesse Over a year ago

I have implemented your code and an error appears: "syntax error at or near "("". There is also a message that in the last part, where the CROSS JOIN is, it is unable to resolve the columns start and stop.

score 1 · Accepted Answer · 2016-02-23 11:53:20Z

with elements as (
  select id, 
         regexp_split_to_array(id, '(\.)') as id_elements,
         name, 
         details
  from the_table
), bounds as (
  select id, 
         case 
           when strpos(id, '-') = 0 then 1
           else split_part(id_elements[cardinality(id_elements)], '-', 1)::int
         end as start_value,
         case 
           when strpos(id, '-') = 0 then 1
           else split_part(id_elements[cardinality(id_elements)], '-', 2)::int
         end as end_value,
         case 
           when strpos(id, '-') = 0 then id
           else array_to_string(id_elements[1:cardinality(id_elements)-1], '.')
         end as base_id,
         name, 
         details
  from elements
)
select case 
         -- ids without a range are returned unchanged
         when strpos(b.id, '-') = 0 then b.base_id
         else b.base_id||'.'||c.cnt
       end as new_id, 
       b.name,
       b.details, 
       count(*) over (partition by b.base_id) as num_rows
from bounds b 
  cross join lateral generate_series(b.start_value, b.end_value) as c (cnt)
order by num_rows desc, c.cnt;

The first CTE simply splits the ID on the . character. The second CTE then calculates the start and end value for each ID and "strips" the range definition from the actual ID value to get the base that can be concatenated with the actual row index in the final select statement. IDs without a range get a single dummy row (start and end value 1) and are returned as they are.

With this test data:

insert into the_table
values
('1.3.1-3',              'Jack',  'details 1'),
('5.4.1-2',              'John',  'details 2'),
('1.4.5',                'Alex',  'details 3'),
('10.11.12.1-5',         'Peter', 'details 4'),
('1.4.10-13',            'Arthur','details 5'),
('11.12.13.14.15.16.2-7','Zaphod','details 6');

The following result is returned:

new_id              | name   | details   | num_rows
--------------------+--------+-----------+---------
11.12.13.14.15.16.2 | Zaphod | details 6 |        6
11.12.13.14.15.16.3 | Zaphod | details 6 |        6
11.12.13.14.15.16.4 | Zaphod | details 6 |        6
11.12.13.14.15.16.5 | Zaphod | details 6 |        6
11.12.13.14.15.16.6 | Zaphod | details 6 |        6
11.12.13.14.15.16.7 | Zaphod | details 6 |        6
10.11.12.1          | Peter  | details 4 |        5
10.11.12.2          | Peter  | details 4 |        5
10.11.12.3          | Peter  | details 4 |        5
10.11.12.4          | Peter  | details 4 |        5
10.11.12.5          | Peter  | details 4 |        5
1.4.10              | Arthur | details 5 |        4
1.4.11              | Arthur | details 5 |        4
1.4.12              | Arthur | details 5 |        4
1.4.13              | Arthur | details 5 |        4
1.3.1               | Jack   | details 1 |        3
1.3.2               | Jack   | details 1 |        3
1.3.3               | Jack   | details 1 |        3
5.4.1               | John   | details 2 |        2
5.4.2               | John   | details 2 |        2
1.4.5               | Alex   | details 3 |        1

The use of cardinality(id_elements) requires Postgres 9.4. For earlier versions this needs to be replaced with array_length(id_elements, 1).
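
As a small illustration of the array_length variant, the split and the array slice can be tried on a single literal. The sample value '1.3.1-3' and the aliases are assumptions made for this sketch only.

```sql
-- Array handling for one sample id, using array_length instead of cardinality.
SELECT s.id_elements                                              -- {1,3,1-3}
     , array_length(s.id_elements, 1) AS n_elements               -- 3
     , array_to_string(
         s.id_elements[1:array_length(s.id_elements, 1) - 1],
         '.') AS base_id                                          -- '1.3'
FROM (SELECT regexp_split_to_array('1.3.1-3', '\.') AS id_elements) AS s;
```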

A final note:

This would be a lot easier if you stored the start and end value in separate (integer) columns, rather than appending them to the ID itself. This model violates basic database normalization (first normal form).

This solution (or any solution in the answers given) will fail badly if an ID is stored that contains e.g. 10.12.13.A-Z (non-numeric values), which can be prevented by properly normalizing the data.
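
To make the final note concrete, here is one possible normalized layout and the query it would allow. The table name the_table_normalized and the columns base_id, range_start and range_end are hypothetical names chosen for this sketch; rows without a range simply store the same value as start and end.

```sql
-- Hypothetical normalized layout: all names below are assumptions for illustration.
CREATE TABLE the_table_normalized
( base_id     text    NOT NULL   -- e.g. '1.3' for the original '1.3.1-3'
, range_start integer NOT NULL
, range_end   integer NOT NULL
, name        text
, details     text
, CHECK (range_start <= range_end)
);

INSERT INTO the_table_normalized VALUES
  ('1.3', 1, 3, 'Jack', 'details 1')
, ('5.4', 1, 2, 'John', 'details 2')
, ('1.4', 5, 5, 'Alex', 'details 3');   -- single id: start = end

-- Expanding the rows no longer needs any string parsing.
SELECT t.base_id || '.' || g.n::text AS new_id
     , t.name
     , t.details
FROM the_table_normalized t
CROSS JOIN LATERAL generate_series(t.range_start, t.range_end) AS g(n)
ORDER BY t.base_id, g.n;
```

With integer columns, a value such as A-Z can no longer be stored in the range at all, so the expansion query cannot fail on a cast.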

Still won't work as it should. Please check '1.3.10-13' :)

Giorgos Betsos · Accepted Answer · 2016-02-23 12:17:52Z

0

You can use the following query:

SELECT CASE 
          WHEN num = 0 THEN "ID"
          ELSE CONCAT(LEFT("ID", 
                           LENGTH("ID") + 1 - 
                           position('.' IN REVERSE("ID"))),
                      num)
       END,
       "name", 
       "details"      
FROM (              
  SELECT split_part("ID", '-', 1) AS "ID", 
         "name", "details",
         generate_series(
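           CASE 
             WHEN position('-' in "ID") = 0 THEN 0
             -- range start: the last dot-separated segment before the '-'
             ELSE CAST(reverse(split_part(reverse(split_part("ID", '-', 1)), '.', 1)) AS INT)
           END, 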
           CASE 
             WHEN position('-' in "ID") = 0 THEN 0
             ELSE CAST(split_part("ID", '-', 2)  AS INT)
           END) AS num  
  FROM mytable) AS t

Demo here

edited Feb 23, 2016 at 12:17

answered Feb 23, 2016 at 11:16

Giorgos Betsos


5 Comments

user330315 Over a year ago

That will not repeat the rows

Giorgos Betsos Over a year ago

@a_horse_with_no_name Did you check the demo?

Lukasz Szozda Over a year ago

Why do you repeat rows? OP wants the last part increasing so not 3 times 1.3.1 but 1.3.1, 1.3.2 and 1.3.3

Giorgos Betsos Over a year ago

@lad2025 I think this is the OPs intention: he wants to repeat each row by the number after the '-'.

Giorgos Betsos Over a year ago

@lad2025 Ok I see now what you mean.
