0

I'm having a problem extracting the last names from a list of player names.

import re

lst = ['Cristiano Ronaldo', 'L. Messi', 'M. Neuer', 'L. Suarez', 'De Gea', 'Z. Ibrahimovic', 'G. Bale', 'J. Boateng', 'R. Lewandowski']

for item in lst:
    print(item)
    print(re.findall(r'(\s(.*))', item))

But the output looks like this:

Cristiano Ronaldo
[(' Ronaldo', 'Ronaldo')]
L. Messi
[(' Messi', 'Messi')]
M. Neuer
[(' Neuer', 'Neuer')]
L. Suarez
[(' Suarez', 'Suarez')]
De Gea
[(' Gea', 'Gea')]
Z. Ibrahimovic
[(' Ibrahimovic', 'Ibrahimovic')]
G. Bale
[(' Bale', 'Bale')]
J. Boateng
[(' Boateng', 'Boateng')]
R. Lewandowski
[(' Lewandowski', 'Lewandowski')]

I am curious as to why the last names were returned twice; I only want to get back the last names once.

Can any of you kind folks help? Thank you!

2
  • You have 2 nested groups, one that includes the space and one that doesn't. Your regex also wouldn't handle names that include a middle name. Why not split the string and return the last element? Commented Dec 23, 2019 at 7:41
  • You are capturing two groups. I would do it like this: \w+$ Commented Dec 23, 2019 at 7:41

4 Answers

3

Use str.split() with negative indexing: split each name on whitespace and take the last element.

Example:

lst = ['Cristiano Ronaldo', 'L. Messi', 'M. Neuer', 'L. Suarez', 'De Gea', 'Z. Ibrahimovic', 'G. Bale', 'J. Boateng', 'R. Lewandowski']

for item in lst:
    # split on whitespace and keep only the last piece
    print(item.split()[-1])

Output:

Ronaldo
Messi
Neuer
Suarez
Gea
Ibrahimovic
Bale
Boateng
Lewandowski
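
As a rough sketch of the same idea collected into a list instead of printed: the helper name last_word is made up for illustration, and it uses str.rsplit(maxsplit=1) so that only the final word is split off. It also assumes that a blank string should give back an empty string.

```python
names = ['Cristiano Ronaldo', 'L. Messi', 'De Gea', 'Z. Ibrahimovic']

def last_word(name):
    """Return the final whitespace-separated word of name, or '' if name is blank."""
    # rsplit with maxsplit=1 splits once from the right: ['Cristiano', 'Ronaldo']
    parts = name.rsplit(maxsplit=1)
    return parts[-1] if parts else ''

last_names = [last_word(n) for n in names]
print(last_names)  # ['Ronaldo', 'Messi', 'Gea', 'Ibrahimovic']
```

Because it always takes the last word, this should also cope with names that contain a middle name.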

3

You create two groups with the two pairs of parentheses, and re.findall returns one tuple per match when a pattern has more than one group. Remove the outer pair and you will get only the last name:

import re

lst = ['Cristiano Ronaldo', 'L. Messi', 'M. Neuer', 'L. Suarez', 'De Gea', 'Z. Ibrahimovic', 'G. Bale', 'J. Boateng', 'R. Lewandowski']
for item in lst:
    print(item)
    print(re.findall(r'\s(.*)', item))  # one capturing group, so findall returns plain strings
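
To make the effect of the parentheses easier to see, here is a small side-by-side comparison on one name from the question. The variable names are only for illustration; the behaviour shown is how re.findall treats zero, one, and several capturing groups.

```python
import re

name = 'Cristiano Ronaldo'

# No capturing group: findall returns the whole match (space included).
no_groups = re.findall(r'\s.*', name)
# One capturing group: findall returns just that group's text.
one_group = re.findall(r'\s(.*)', name)
# Two capturing groups: findall returns a tuple with one entry per group.
two_groups = re.findall(r'(\s(.*))', name)

print(no_groups)   # [' Ronaldo']
print(one_group)   # ['Ronaldo']
print(two_groups)  # [(' Ronaldo', 'Ronaldo')]
```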


1

\S matches any character that is not whitespace, so \S+$ matches the last run of non-whitespace characters in the string.

import re

lst = ['Cristiano Ronaldo', 'L. Messi', 'M. Neuer', 'L. Suarez', 'De Gea', 'Z. Ibrahimovic', 'G. Bale', 'J. Boateng', 'R. Lewandowski']

for item in lst:
    print(item)
    print(re.findall(r'\S+$', item))  # match one or more non-whitespace characters before the end of the string

Output:

Cristiano Ronaldo
['Ronaldo']
L. Messi
['Messi']
M. Neuer
['Neuer']
L. Suarez
['Suarez']
De Gea
['Gea']
Z. Ibrahimovic
['Ibrahimovic']
G. Bale
['Bale']
J. Boateng
['Boateng']
R. Lewandowski
['Lewandowski']
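
If you want a plain string rather than a one-element list, something along these lines might work. It is only a sketch: the helper last_name and the hyphenated example name are not from the question, and returning None when nothing matches is an assumption. It also shows where \S+$ and the \w+$ pattern suggested in the comments differ.

```python
import re

LAST_WORD = re.compile(r'\S+$')  # last run of non-whitespace characters

def last_name(name):
    """Return the last non-whitespace run in name, or None if there is none."""
    match = LAST_WORD.search(name)
    return match.group() if match else None

plain = last_name('R. Lewandowski')
hyphenated = last_name('Son Heung-min')               # illustrative name, not in the question
word_chars_only = re.findall(r'\w+$', 'Son Heung-min')  # \w does not match '-'
trailing_space = last_name('G. Bale ')                # nothing matches before the trailing space

print(plain)            # Lewandowski
print(hyphenated)       # Heung-min
print(word_chars_only)  # ['min']
print(trailing_space)   # None
```

Calling name.strip() before searching would be one way to handle stray trailing whitespace.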


0

Check this out: https://regex101.com/r/CGrruO/1

You can see that your regex captures 2 groups for each match.
You added another set of () around the whole pattern, so every match comes back with two groups, one with the leading space and one without.

Changing the pattern to \s(.*) should work.
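
One caveat, raised in the comments on the question: \s(.*) captures everything after the first whitespace, so a name with a middle name would come back with both remaining words. A quick check, using a made-up three-word name that is not in the original list, plus one possible tightening of the pattern:

```python
import re

two_words = re.findall(r'\s(.*)', 'L. Messi')
# Everything after the FIRST whitespace is captured.
three_words = re.findall(r'\s(.*)', 'Lionel Andres Messi')
# Possible alternative: capture only the final non-whitespace run.
last_only = re.findall(r'\s(\S+)$', 'Lionel Andres Messi')

print(two_words)    # ['Messi']
print(three_words)  # ['Andres Messi']
print(last_only)    # ['Messi']
```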
