
I'm using Redshift, which is based on PostgreSQL 8.0.2.

I have a bunch of names in a database, stored in a column called full_name, that look like this:

full_name
Jeff Davis
Michael Scott
Eric Wilson
Anna Jones-Porter
Lisa Marie Scott
A.J. Jackson

What I'm trying to do is create two new fields for first_name and last_name.

I initially tried the following query:

SELECT 
full_name,
REGEXP_SUBSTR(full_name, '^\\w+') first_name,
REGEXP_SUBSTR(full_name, ' \\w+') last_name
FROM 
schema.table

But that only handles the first three records, because it assumes every full_name is exactly two plain words separated by a single space. It doesn't work for the bottom three records.
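A quick way to see where the two patterns break is to run them outside the database. The sketch below uses Python's re module as a stand-in for REGEXP_SUBSTR; that substitution is an assumption, and the helper name regex_split is made up for illustration, but \w treats these sample names the same way.

```python
import re

names = [
    "Jeff Davis",
    "Michael Scott",
    "Eric Wilson",
    "Anna Jones-Porter",
    "Lisa Marie Scott",
    "A.J. Jackson",
]

def regex_split(full_name):
    """Apply the same two patterns as the query above (hypothetical helper)."""
    first = re.search(r"^\w+", full_name)   # leading run of word characters
    last = re.search(r" \w+", full_name)    # first space followed by word characters
    return (
        first.group(0) if first else None,
        last.group(0) if last else None,
    )

for name in names:
    print(repr(name), "->", regex_split(name))
```

Anna Jones-Porter comes back with ' Jones' and A.J. Jackson with first name 'A', because \w matches neither the hyphen nor the period. Lisa Marie Scott comes back with ' Marie', because the second pattern stops at the first space-delimited word. The second pattern also keeps the leading space in every match.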

What would it look like if I wanted to write a query that properly captures ALL these variations of first and last names? For ones like Lisa Marie Scott the first_name would be Lisa and last_name would be Scott.

Expected results would look like:

full_name           first_name  last_name
Jeff Davis          Jeff        Davis
Michael Scott       Michael     Scott
Eric Wilson         Eric        Wilson
Anna Jones-Porter   Anna        Jones-Porter
Lisa Marie Scott    Lisa        Scott
A.J. Jackson        A.J.        Jackson

1 Answer


If you include the DDL and insert statements for the test data, it helps whoever is answering the question. Anyway, here is how you can do it.

  • Use regexp_split_to_array to split the full name into an array of words.
  • Take the first element as the first name and the last element as the last name.
  • To find the index of the last element, use the array_length function.
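The three steps can also be sketched outside SQL. The following Python is only an illustration of the logic, not one of the database queries; split_name is a hypothetical helper name, and it assumes the parts of a name are separated by whitespace.

```python
def split_name(full_name):
    """Return (first_name, last_name) using the first and last words."""
    parts = full_name.split()      # split on runs of whitespace
    if not parts:                  # empty or blank input
        return (None, None)
    return (parts[0], parts[-1])   # first element, last element

for name in ["Jeff Davis", "Anna Jones-Porter", "Lisa Marie Scott", "A.J. Jackson"]:
    print(name, "->", split_name(name))
```

Middle names are simply dropped, and a single-word name ends up as both first and last name, which is also what the array-based queries would do.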


CREATE TABLE TEST (
    full_name VARCHAR(100)
) ;

INSERT INTO TEST VALUES 
    ('Jeff Davis'),
    ('Michael Scott'),
    ('Eric Wilson'),
    ('Anna Jones-Porter'),
    ('Lisa Marie Scott'),
    ('A.J. Jackson');

-- PostgreSQL 14 version: split on whitespace, arrays are 1-based
select full_name,
       name_array[1] as first_name,
       name_array[array_length(name_array, 1)] as last_name
from (
    SELECT full_name,
           regexp_split_to_array(full_name, '\s+') as name_array
    from TEST
) as x;

-- PostgreSQL 8 version: no regexp_split_to_array, so split on a single space
select full_name,
       name_array[1] as first_name,
       name_array[array_upper(name_array, 1)] as last_name
from (
    SELECT full_name,
           string_to_array(full_name, ' ') as name_array
    from test
) as x;

-- Redshift version: split_to_array returns a 0-based SUPER array
select full_name,
       name_array[0]::VARCHAR as first_name,
       name_array[get_array_length(name_array) - 1]::VARCHAR as last_name
from (
    SELECT full_name,
           split_to_array(full_name, ' ') as name_array
    from schema_poc.test
) as x;


full_name           first_name  last_name
Jeff Davis          Jeff        Davis
Michael Scott       Michael     Scott
Eric Wilson         Eric        Wilson
Anna Jones-Porter   Anna        Jones-Porter
Lisa Marie Scott    Lisa        Scott
A.J. Jackson        A.J.        Jackson

6 Comments

Thanks for answering. But I tried it and got this response back: Status: Failed Error in SQL: function regexp_split_to_array(character varying, "unknown") does not exist HINT: No function matches the given name and argument types. You may need to add explicit type casts. For the record, I'm using Postgresql 8.0.2
Updated above for version 8.
Weird. I just ran it and got this message: Error in SQL: Specified types or functions (one per INFO message) not supported on Redshift tables.
I guess you need to make it clear which database you are using: Postgres or Redshift?
My apologies. It is a Postgres Redshift.