I've been playing around in psql, splitting a name into an array. For example:

select string_to_array('joseph jones', ' ');
string_to_array 
-----------------
{joseph,jones}

This works exactly as I expected.

However, my dataset contains a lot of surnames that have a preceding 'o'.

select string_to_array('joseph o carroll', ' ');
string_to_array 
-----------------
{joseph,o,carroll}

Is there any way I can add some extra logic so that a standalone 'o' gets bundled into the word that follows it?

So joseph o carroll would return {joseph,o carroll}

  • Maybe a regex and regexp_split_to_array could do that. Commented Oct 21, 2020 at 9:37
  • So far I have select regexp_split_to_array('joseph o jones','(\s+)'); but still trying to figure out how to exclude the o from the split Commented Oct 21, 2020 at 10:40
  • Ok, so I now have this select regexp_split_to_array('joseph o jones','(?<!o)(\s+)'); which nearly solves my problem but for some reason adds quotation marks around o jones Commented Oct 21, 2020 at 11:12
  • The quotes are normal because the result is an array, and if an array element contains a space it will be quoted when the array is displayed. If you access the individual elements, the quotes won't be there, e.g. select unnest(regexp_split_to_array(..)) Commented Oct 21, 2020 at 11:12
  • Ah yes, that makes sense! thanks Commented Oct 21, 2020 at 11:23
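
The point made in the comments about array quoting can be illustrated with a short sketch. It assumes a PostgreSQL version with lookbehind support (9.6 or later) and the default standard_conforming_strings setting; the column aliases part and surname are only illustrative names.

```sql
-- Displayed as a whole array, the element containing a space is quoted.
select regexp_split_to_array('joseph o jones', '(?<!o)(\s+)');
-- {joseph,"o jones"}

-- unnest returns one row per element, so no array quoting is involved.
select unnest(regexp_split_to_array('joseph o jones', '(?<!o)(\s+)')) as part;

-- A single element can also be read by its 1-based subscript.
select (regexp_split_to_array('joseph o jones', '(?<!o)(\s+)'))[2] as surname;
-- o jones
```

The quotes are only part of the array's text representation; the stored element is the plain string o jones.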

2 Answers

2

From playing around with regex, I think I have found a solution:

select regexp_split_to_array('joseph o jones','(?<!o)(\s+)');

2

You can't use a mere (?<!o)\s+: try it against romeo bones. Because the first name ends in o, the lookbehind rejects the space after it, so the expression does not match and the name is not split.

Use

select regexp_split_to_array('joseph o jones','(?<!\yo)\s+');

Explanation

--------------------------------------------------------------------------------
  (?<!                     look behind to see if there is not:
--------------------------------------------------------------------------------
    \y                       the boundary between a word char (\w)
                             and something that is not a word char
--------------------------------------------------------------------------------
    o                        'o'
--------------------------------------------------------------------------------
  )                        end of look-behind
--------------------------------------------------------------------------------
  \s+                      whitespace (\n, \r, \t, \f, and " ") (1 or
                           more times (matching the most amount
                           possible))
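
To make the difference concrete, here is a small comparison of the two patterns, offered as a sketch rather than a tested recipe. It assumes PostgreSQL 9.6 or later (lookbehind constraints) and the default standard_conforming_strings setting, so the backslashes in the literals reach the regex engine unchanged.

```sql
-- Plain lookbehind: any 'o' directly before the whitespace blocks the split,
-- including the last letter of 'romeo'.
select regexp_split_to_array('romeo bones', '(?<!o)\s+');
-- {"romeo bones"}

-- Word-boundary lookbehind: only an 'o' that starts a word blocks the split.
select regexp_split_to_array('romeo bones', '(?<!\yo)\s+');
-- {romeo,bones}

-- A standalone 'o' stays attached to the following word.
select regexp_split_to_array('joseph o carroll', '(?<!\yo)\s+');
-- {joseph,"o carroll"}
```

Note that the pattern is case-sensitive as written, so an uppercase 'O' would still be split off as its own element.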
