I have a column name that represents a person's name in the following format:
firstname [middlename] lastname [, Sr.|Jr.]
For example:
John Smith
John J. Smith
John J. Smith, Sr.
How can I order items by lastname?
A correct and faster version could look like this:
SELECT *
FROM tbl
ORDER BY substring(name, '([^[:space:]]+)(?:,|$)')
Or:
ORDER BY substring(name, E'([^\\s]+)(?:,|$)')
Or even:
ORDER BY substring(name, E'([^\\s]+)(,|$)')
[^[:space:]]+ .. first (and longest) string consisting of one or more non-whitespace characters.
(,|$) .. terminated by a comma or the end of the string.
The last two examples use escape-string syntax and the class-shorthand \s instead of the long form [[:space:]] (which loses the outer level of brackets when inside a character class).
We don't actually have to use non-capturing parentheses (?:) after the part we want to extract, because (quoting the manual):
.. if the pattern contains any parentheses, the portion of the text that matched the first parenthesized subexpression (the one whose left parenthesis comes first) is returned.
SELECT substring(name, '([^[:space:]]+)(?:,|$)')
FROM (VALUES
('John Smith')
,('John J. Smith')
,('John J. Smith, Sr.')
,('foo bar Smith, Jr.')
) x(name)
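To check that the capturing and non-capturing variants behave the same, a quick side-by-side query along these lines may help. The inline VALUES list is only sample data, and the column aliases are made up for illustration:

```sql
-- Compare the pattern with a non-capturing trailing group against the
-- variant with a plain (capturing) trailing group. substring() returns
-- the text matched by the FIRST parenthesized subexpression in both cases.
SELECT name
     , substring(name, '([^[:space:]]+)(?:,|$)') AS non_capturing
     , substring(name, '([^[:space:]]+)(,|$)')   AS capturing
FROM  (VALUES
         ('John Smith')
       , ('John J. Smith, Sr.')
      ) x(name);
```

Both result columns should come back as Smith for each row, since only the first group is returned.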
SELECT *
FROM t
ORDER BY substring(name, E'^.*\\s([^\\s]+)(?=,|$)') ASC
While this should provide the sorting you are looking for, it would be a lot cheaper to store the name in multiple columns and index them based on which parts of the name you need to sort by.
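A minimal sketch of the multi-column layout suggested above might look like the following. The table name person and all column names are hypothetical, chosen only for illustration:

```sql
-- Hypothetical schema: each part of the name lives in its own column,
-- so sorting by last name needs no string parsing at query time.
CREATE TABLE person (
    person_id   serial PRIMARY KEY
  , first_name  text NOT NULL
  , middle_name text            -- optional
  , last_name   text NOT NULL
  , suffix      text            -- optional, e.g. 'Sr.' or 'Jr.'
);

-- Plain b-tree index matching the sort order used below.
CREATE INDEX person_last_first_idx ON person (last_name, first_name);

SELECT *
FROM   person
ORDER  BY last_name, first_name;
```

Whether the planner actually uses the index for the sort depends on table size and the rest of the query.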
SELECT substring('John J. Smith, Sr.', E'^.*\\s([^\\s]+)(?=,|$)'). I posted a version that works.
You should use a functional index for this purpose: http://www.postgresql.org/docs/7.3/static/indexes-functional.html
In your case, something like:
CREATE INDEX test1_lastname_col1_idx ON test1 (split_part(col1, ' ', 3));
SELECT * FROM test1 ORDER BY split_part(col1, ' ', 3);
lastname isn't always the third element.
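Since the last name is not at a fixed position, one option is to build the index on the regular expression from the first answer instead of split_part(). This is only a sketch: the index name is made up, and tbl / name are the placeholder identifiers used earlier in this thread.

```sql
-- Expression index on the extracted last name. The extra pair of
-- parentheses around the expression is always accepted by CREATE INDEX.
CREATE INDEX tbl_lastname_idx
ON     tbl ((substring(name, '([^[:space:]]+)(?:,|$)')));

-- The ORDER BY expression has to match the indexed expression
-- for the index to be considered at all.
SELECT *
FROM   tbl
ORDER  BY substring(name, '([^[:space:]]+)(?:,|$)');
```

The same pattern must be repeated verbatim in every query that is meant to benefit from the index.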