Capturing variations of first and last name using postgresql

Question

I'm using Amazon Redshift, which reports its version as PostgreSQL 8.0.2.

I have a set of names stored in a column called full_name that look like this:

full_name
Jeff Davis
Michael Scott
Eric Wilson
Anna Jones-Porter
Lisa Marie Scott
A.J. Jackson

What I'm trying to do is create two new fields for first_name and last_name.

I initially tried the following query:

SELECT 
full_name,
REGEXP_SUBSTR(full_name, '^\\w+') first_name,
REGEXP_SUBSTR(full_name, ' \\w+') last_name
FROM 
schema.table

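But that only handles the first three records, because it assumes every full_name is exactly two words made of letters and separated by one space. \w does not match the period in A.J. or the hyphen in Jones-Porter, and ' \\w+' returns the first word after a space, which for Lisa Marie Scott is the middle name. So it doesn't work for the bottom three records.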

How can I write a query that handles ALL of these variations of first and last names? For names like Lisa Marie Scott, first_name should be Lisa and last_name should be Scott.

The expected results are:

full_name           first_name  last_name
Jeff Davis          Jeff        Davis
Michael Scott       Michael     Scott
Eric Wilson         Eric        Wilson
Anna Jones-Porter   Anna        Jones-Porter
Lisa Marie Scott    Lisa        Scott
A.J. Jackson        A.J.        Jackson

Anand K · Accepted Answer · 2022-08-10 14:58:26Z


If you include the DDL and INSERT statements for your test data, it helps whoever is answering the question. Anyway, here is how you can do it:

Use regexp_split_to_array to split the full name into an array of words.
Then take the first element as the first name and the last element as the last name.
To find the index of the last element, use the array_length function.


CREATE TABLE TEST (
    full_name VARCHAR(100)
) ;

INSERT INTO TEST VALUES 
    ('Jeff Davis'),
    ('Michael Scott'),
    ('Eric Wilson'),
    ('Anna Jones-Porter'),
    ('Lisa Marie Scott'),
    ('A.J. Jackson');

-- PostgreSQL 14: split on any run of whitespace
select full_name,
       name_array[1] as first_name,
       name_array[array_length(name_array, 1)] as last_name
from (
    SELECT full_name,
           regexp_split_to_array(full_name, '\s+') as name_array
    from TEST
) as x;

-- PostgreSQL 8.x: string_to_array splits on a single space;
-- PostgreSQL arrays are 1-based, so array_upper gives the index of the last element
select full_name,
       name_array[1] as first_name,
       name_array[(array_upper(name_array, 1))] as last_name
from (
    SELECT full_name, string_to_array(full_name, ' ') as name_array
    from test
) as x;

-- Redshift: split_to_array returns a SUPER array, which is 0-based,
-- so the last element is at get_array_length - 1
select full_name,
       name_array[0]::VARCHAR as first_name,
       name_array[(get_array_length(name_array) - 1)]::VARCHAR as last_name
from (
    SELECT full_name, split_to_array(full_name, ' ') as name_array
    from schema_poc.test
) as x;


full_name           first_name  last_name
Jeff Davis          Jeff        Davis
Michael Scott       Michael     Scott
Eric Wilson         Eric        Wilson
Anna Jones-Porter   Anna        Jones-Porter
Lisa Marie Scott    Lisa        Scott
A.J. Jackson        A.J.        Jackson
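If split_to_array or the SUPER type is not available on your cluster, one possible alternative is to skip arrays entirely and pull the first and last space-free run of characters with REGEXP_SUBSTR, the same function the question already uses. This is an untested sketch that assumes names are separated by spaces (not tabs); schema.table is the question's own placeholder table name. Note that REGEXP_SUBSTR is available in Redshift but not in PostgreSQL 14.

```sql
-- Sketch for Redshift: no array functions, only REGEXP_SUBSTR and TRIM.
-- schema.table is the placeholder table name from the question.
SELECT
    full_name,
    -- first run of non-space characters, e.g. 'A.J.' or 'Lisa'
    REGEXP_SUBSTR(TRIM(full_name), '^[^ ]+') AS first_name,
    -- last run of non-space characters, e.g. 'Jones-Porter' or 'Scott'
    REGEXP_SUBSTR(TRIM(full_name), '[^ ]+$') AS last_name
FROM schema.table;
```

Because the patterns match any non-space characters, periods and hyphens are kept, and repeated spaces between words do not produce empty values. Middle names are simply dropped, as in the expected result for Lisa Marie Scott, and a single-word name comes back in both columns.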

answered Aug 10, 2022 at 0:40


6 Comments

wizkids121 Over a year ago

Thanks for answering. But I tried it and got this response back:

Status: Failed  Error in SQL: function regexp_split_to_array(character varying, "unknown") does not exist HINT: No function matches the given name and argument types. You may need to add explicit type casts.

For the record, I'm using Postgresql 8.0.2

Anand K Over a year ago

Updated the answer above for version 8.

wizkids121 Over a year ago

Weird. I just ran it and got this message: Error in SQL: Specified types or functions (one per INFO message) not supported on Redshift tables.

Anand K Over a year ago

I guess you need to make it clear which database you are using: Postgres or Redshift?

wizkids121 Over a year ago

My apologies. It is a Postgres Redshift.
