Postgres function to split word to arrays with extra logic

Question

I've been playing around in psql, splitting a name into an array. For example:

select string_to_array('joseph jones', ' ');
string_to_array 
-----------------
{joseph,jones}

This works exactly as I expected.

However, my dataset contains a lot of surnames that have a preceding 'o'.

select string_to_array('joseph o carroll', ' ');
string_to_array 
-----------------
{joseph,o,carroll}

Is there any way I can add some extra logic so that a standalone 'o' gets bundled into the word that follows it?

So joseph o carroll would return {joseph,o carroll}

So far I have select regexp_split_to_array('joseph o jones','(\s+)'); but I'm still trying to figure out how to exclude the o from the split.
– nimgwfc, Commented Oct 21, 2020 at 10:40
OK, I now have select regexp_split_to_array('joseph o jones','(?<!o)(\s+)'); which nearly solves my problem, but for some reason it adds quotation marks around o jones.
– nimgwfc, Commented Oct 21, 2020 at 11:12
The quotes are normal: the result is an array, and an array element that contains a space is quoted when the array is displayed. If you access the individual elements, the quotes won't be there, e.g. select unnest(regexp_split_to_array(..))
– user330315, Commented Oct 21, 2020 at 11:12
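
To illustrate the point about quoting, here is a minimal sketch that reads the elements individually. It assumes a PostgreSQL version with lookbehind support in regular expressions (9.6 or later); the column aliases surname and part are only illustrative.

```sql
-- The quotes belong to the array's text representation only;
-- the element values themselves contain no quote characters.

-- Read the second element by subscript (note the extra parentheses
-- around the function call).
select (regexp_split_to_array('joseph o jones', '(?<!o)\s+'))[2] as surname;
-- surname: o jones

-- Or expand the array into one row per element.
select unnest(regexp_split_to_array('joseph o jones', '(?<!o)\s+')) as part;
-- two rows: joseph, then o jones
```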

nimgwfc · Accepted Answer · 2020-10-21 11:24:33Z


From playing around with regex, I think I have found a solution:

select regexp_split_to_array('joseph o jones','(?<!o)(\s+)');


Ryszard Czech · Accepted Answer · 2020-10-21 21:34:10Z

A bare (?<!o)\s+ is not enough: try it against romeo bones. Because the first name ends in o, the lookbehind rejects the only run of whitespace, so the pattern never matches and the string is not split.
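
A quick way to see the problem, assuming lookbehind support (PostgreSQL 9.6 or later):

```sql
-- The only space is preceded by the final 'o' of "romeo", so the
-- negative lookbehind fails there and no split point is found.
select regexp_split_to_array('romeo bones', '(?<!o)\s+');
-- {"romeo bones"}   (a single element, not two)
```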

Use a word boundary (\y) before the o instead, so that only a standalone o blocks the split:

select regexp_split_to_array('joseph o jones','(?<!\yo)\s+');

Explanation

--------------------------------------------------------------------------------
  (?<!                     look behind to see if there is not:
--------------------------------------------------------------------------------
    \y                       the boundary between a word char (\w)
                             and something that is not a word char
--------------------------------------------------------------------------------
    o                        'o'
--------------------------------------------------------------------------------
  )                        end of look-behind
--------------------------------------------------------------------------------
  \s+                      whitespace (\n, \r, \t, \f, and " ") (1 or
                           more times (matching the most amount
                           possible))
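
As a rough check of the pattern explained above, the sketch below runs it against the three names discussed in this thread. The inline values list and the aliases t, name and parts are just for illustration; the expected results follow from the reasoning above.

```sql
-- Split on whitespace unless it directly follows a standalone 'o'.
select name,
       regexp_split_to_array(name, '(?<!\yo)\s+') as parts
from (values ('joseph jones'),
             ('joseph o carroll'),
             ('romeo bones')) as t(name);
-- joseph jones      -> {joseph,jones}
-- joseph o carroll  -> {joseph,"o carroll"}
-- romeo bones       -> {romeo,bones}
```

The o at the end of romeo is not preceded by a word boundary, so the lookbehind no longer blocks the split there.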
