15

I have a pandas DataFrame with a column of names (such as Joseph Haydn, Wolfgang Amadeus Mozart, Antonio Salieri and so forth).

I want to get a new column with the last names: Haydn, Mozart, Salieri and so forth.

I know how to split a string, but I could not find a way to apply that to a Series or a DataFrame column.

2
  • column.str.split. Add some example code, and you will likely get an answer. Commented Sep 6, 2015 at 15:53
  • Falsehoods Programmers Believe About Names Commented Jan 28, 2023 at 0:39

2 Answers

32

If you have:

import pandas
data = pandas.DataFrame({"composers": [ 
    "Joseph Haydn", 
    "Wolfgang Amadeus Mozart", 
    "Antonio Salieri",
    "Eumir Deodato"]})

Assuming you want only the first name (and not a middle name like Amadeus):

data.composers.str.split(r'\s+').str[0]

will give:

0      Joseph
1    Wolfgang
2     Antonio
3       Eumir
dtype: object

You can assign this to a new column in the same DataFrame:

data['firstnames'] = data.composers.str.split(r'\s+').str[0]

Last names would be:

data.composers.str.split(r'\s+').str[-1]

which gives:

0      Haydn
1     Mozart
2    Salieri
3    Deodato
dtype: object

(See also "Python Pandas: selecting element in array column" for accessing elements in an 'array' column.)
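
The two uses of .str can be looked at one step at a time. The following is only a sketch of what happens in between, using the same sample data; the variable names parts, last_names and same_last_names are made up for illustration.

```python
import pandas as pd

# Same sample data as above.
data = pd.DataFrame({"composers": [
    "Joseph Haydn",
    "Wolfgang Amadeus Mozart",
    "Antonio Salieri",
    "Eumir Deodato"]})

# Step 1: .str.split() turns every string into a list of words,
# so the result is a Series whose entries are Python lists.
parts = data["composers"].str.split()
print(parts.tolist())

# Step 2: the second .str indexes into each list, row by row.
last_names = parts.str[-1]

# .str.get(-1) is the method form of the same element lookup.
same_last_names = parts.str.get(-1)
print(last_names.tolist())
```

So the first .str applies a string method to every entry, and the second .str applies an element lookup to every list that the split produced.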

For everything except the last name, apply " ".join(...) to all but the last element ([:-1]) of each row:

data.composers.str.split(r'\s+').str[:-1].apply(lambda parts: " ".join(parts))

which gives:

0              Joseph
1    Wolfgang Amadeus
2             Antonio
3               Eumir
dtype: object
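
If you want both pieces at once, one possible alternative (not part of the approach above, just a sketch) is to split from the right a single time with str.rsplit and expand=True. The column names given_names and last_name are hypothetical. It assumes every name has at least two words; a one-word name would leave the second column empty.

```python
import pandas as pd

data = pd.DataFrame({"composers": [
    "Joseph Haydn",
    "Wolfgang Amadeus Mozart",
    "Antonio Salieri",
    "Eumir Deodato"]})

# rsplit works from the right; n=1 limits it to a single split,
# so only the last word is separated from the rest.
# expand=True returns a DataFrame with one column per piece.
name_parts = data["composers"].str.rsplit(n=1, expand=True)

# Hypothetical column names, chosen for this illustration only.
name_parts.columns = ["given_names", "last_name"]

# Attach the two new columns to the original frame (aligned on the index).
data = data.join(name_parts)
print(data)
```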

3 Comments

Thanks Andre. I had almost arrived at the same solution, but yours is more elegant. I was intrigued by the double use of "str" in "data.composers.str.split('\s+').str[-1]"; I would never have deduced that by logic alone. Thanks anyway.
I arrived at this solution iteratively. By googling 'pandas dataframe strings' I found pandas.pydata.org/pandas-docs/stable/text.html, where I searched for split. (Incidentally, you'll also find an example about split when you run help(data.composers) after the variable data has been defined as above.) The second part (accessing elements of columns whose entries are themselves lists) I found in the linked answer stackoverflow.com/questions/26069235/…
I don't think you need '\s+'. That's the default of split().
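
A quick way to check the comment above on the sample data is to compare the two calls directly. This is only a sketch; with_regex and with_default are illustrative names, and the comparison is made on this input only.

```python
import pandas as pd

data = pd.DataFrame({"composers": [
    "Joseph Haydn",
    "Wolfgang Amadeus Mozart",
    "Antonio Salieri",
    "Eumir Deodato"]})

# Explicit regular expression: split on runs of whitespace.
with_regex = data["composers"].str.split(r"\s+").str[-1]

# No pattern given: pandas splits on whitespace by default.
with_default = data["composers"].str.split().str[-1]

# Compare the two results element by element.
print(with_regex.equals(with_default))
```
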
-1

Try this to solve your problem:

import pandas as pd
df = pd.DataFrame(
    {'composers':
        [ 
            'Joseph Haydn', 
            'Wolfgang Amadeus Mozart', 
            'Antonio Salieri',
            'Eumir Deodato',
        ]
    }
)

df['lastname'] = df['composers'].str.split(n=0, expand=False).str[1]

The resulting DataFrame looks like this:

                 composers        lastname
0             Joseph Haydn           Haydn
1  Wolfgang Amadeus Mozart  Amadeus Mozart
2          Antonio Salieri         Salieri
3            Eumir Deodato         Deodato

1 Comment

str[1] is the wrong index. It just appears to work on this cherry-picked input, but breaks on others. If your df has Mozart first, it gives "Amadeus" for that column rather than "Mozart". Better: df['composers'].str.split().str[-1] but then it's basically the same as the existing answer, so I don't think this answer adds value even if fixed.
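
The difference described in this comment can be reproduced with a small sketch that puts a three-word name first. The names words, second_word and last_word are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({"composers": [
    "Wolfgang Amadeus Mozart",
    "Joseph Haydn"]})

# Split every name into a list of words.
words = df["composers"].str.split()

# Index 1 is the second word, which is only the last name
# when the name has exactly two words.
second_word = words.str[1]

# Index -1 is always the final word, regardless of name length.
last_word = words.str[-1]

print(second_word.tolist())
print(last_word.tolist())
```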
