3

I've that kind of string

Test 1|new york| X, Test 2| chicago|Y, Test 3| harrisburg, pa| Z

My required result it's

 Column1  Column 2     Column3
 Test 1   new york        X
 Test 2   chicago         Y
 Test 3   harrisburg,pa   Z

But running this query

SELECT  
split_part(stat.st, '|', 1) Column1,
split_part(stat.st, '|', 2) Column2,    
split_part(stat.st, '|', 3) Column3
FROM
(
    SELECT
            UNNEST (
                string_to_array('Test 1|new york| X, Test 2| chicago|Y, Test 3| harrisburg, pa| Z',',')
            )
         AS st
) stat;

Result is

 Column1  Column 2   Column3
 Test 1   new york      X
 Test 2   chicago       Y
 Test 3   harrisburg    
 pa          Z  

Column3 could be everything (except | ). Possible pattern to match it's .This could be repeated N times. STRING could be everything except | char.

How could I use regexp_split_to_array() to have my desire result set?

3
  • 1
    You are not giving enough information. Is column3 always a single upper case character? What else can you tell uas? Commented Mar 27, 2013 at 16:33
  • 1
    No Column3 could be everything (except | ). Possible pattern to match it's <STRING|STRING|STRING,>.This could be repeated N times. STRING could be everything except | char. Commented Mar 27, 2013 at 16:36
  • 1
    This kind of information has to be in the question. Please edit. Commented Mar 27, 2013 at 17:28

1 Answer 1

4

There is barely enough information to make this work. But this does the job:

SELECT * FROM crosstab3(
   $$
   SELECT (rn/3)::text AS x, (rn%3)::text, item
   FROM  (
      SELECT row_number() OVER () - 1 AS rn, trim(item) AS item
      FROM (
         SELECT CASE WHEN rn%2 = 1 THEN regexp_split_to_table(item, ',') 
                     ELSE item END AS item
         FROM  (
            SELECT row_number() OVER () AS rn, *
            FROM regexp_split_to_table('Test 1|new york| X, Test 2| chicago|Y, Test 3| harrisburg, pa| Z', '\|') AS item
            ) x
         ) y
      ) z
   $$)

Returns:

 row_name | category_1 |   category_2   | category_3
----------+------------+----------------+------------
 0        | Test 1     | new york       | X
 1        | Test 2     | chicago        | Y
 2        | Test 3     | harrisburg, pa | Z

After splitting the string at |, I build on the criterion that only lines with uneven row number shall be split at ,.
I trim() the results and add derivatives of another row_number() to arrive at this intermediary state before doing the cross tabulation:

 x | text |      item
---+------+----------------
 0 | 0    | Test 1
 0 | 1    | new york
 0 | 2    | X
 1 | 0    | Test 2
 1 | 1    | chicago
 1 | 2    | Y
 2 | 0    | Test 3
 2 | 1    | harrisburg, pa
 2 | 2    | Z

Finally, I apply the crosstab3() function from the tablefunc module. To install it, if you haven't already:

CREATE EXTENSION tablefunc;

Pre-process with regexp_replace()

Here is an alternative that may be easier to comprehend. Not sure which is faster. Complex regular expressions tend to be expensive:

SELECT trim(split_part(a,'|', 1)) AS column1
      ,trim(split_part(a,'|', 2)) AS column2
      ,trim(split_part(a,'|', 3)) AS column3
FROM  (
   SELECT unnest(
             string_to_array(
                         regexp_replace('Test 1|new york| X, Test 2| chicago|Y, Test 3| harrisburg, pa| Z'
                        ,'([^|]*\|[^|]*\|[^,]*),', '\1~^~', 'g'), '~^~')) AS a
   ) sub

This one replaces commas (,) only after two pipes (|), before proceeding.
Now using * instead of + to allow for empty strings between the pipes.

Sign up to request clarification or add additional context in comments.

5 Comments

Please check your answer. I've tried that, but result it's different.<pre> 0 T e s 1 t 1 2 | n e 3 w y 4 o r k 5 | X 6 7 T e s 8 t 2 9 | c 10 h i c 11 a g o 12 | Y , 13 T e 14 s t 15 3 | 16 h a r 17 r i s 18 b u r 19 g , 20 p a | 21 Z </pre>
@LuigiSaggese: I have tested with PostgreSQL 9.1 before I posted. I can reproduce my result. You may need to set standard_conforming_strings = on - which is on by default in 9.1.
@LuigiSaggese: It's a kind of ... ;)
Last Solution it's amazing! Could you "repair" it for cases like this '||,' or 'Magic||' ?
@LuigiSaggese: I believe I did now.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.