I have a DataFrame:

Name
Joe Smith
Jane Doe
Homer Simpson

I am trying to format it to get:

Name
Smith, Joe
Doe, Jane
Simpson, Homer

I have this code, and it works for ~80% of users in my list, but some users are not coming through correctly:

invalid_users = ['Test User', 'Test User2', 'Test User3']


for index, row in df_Users.iterrows():
    gap_pos = df_Users["Name"][index].find(" ") 
    if gap_pos > 0 and row["Name"] not in invalid_users:
        row["Name"] = df_Users["Name"][index][len(df_Users["Name"][index])-gap_pos+1:].strip() +', ' + df_Users["Name"][index][:gap_pos]

For the users who are not coming through correctly, the last name is usually truncated somewhere, e.g. Simpson ==> mpson.

What am I doing wrong here?

  • See Using Regex to change the name values format in a dataframe Commented Jul 21, 2021 at 20:47
  • If you still want this loop, use split instead: "Joe Smith".split(" ") gives you a list like ['Joe', 'Smith']. Commented Jul 21, 2021 at 20:48
  • Why [len(df_Users["Name"][index])-gap_pos+1:]? A simple [gap_pos+1:] should do (but better use one of the mentioned alternatives anyway). Commented Jul 21, 2021 at 20:52
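To make that last comment concrete, here is a small plain-Python sketch comparing the question's slice start, len(name) - gap_pos + 1, with gap_pos + 1 for the three sample names. The two starts only coincide when len(name) == 2 * gap_pos, i.e. when the last name is exactly one character shorter than the first name (as in "Jane Doe"). Otherwise the slice starts too late (cutting off the front of the last name) or too early (pulling in the space or part of the first name). The helper name last_name_slices is hypothetical.

```python
def last_name_slices(name):
    """Return (buggy, fixed) last-name slices for a "First Last" string."""
    gap_pos = name.find(" ")
    buggy = name[len(name) - gap_pos + 1:]  # start index used in the question
    fixed = name[gap_pos + 1:]              # start right after the space
    return buggy, fixed


for name in ["Joe Smith", "Jane Doe", "Homer Simpson"]:
    print(name, "->", last_name_slices(name))
```

Only "Jane Doe" comes out right with the original expression; the other two lose the start of the last name.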

3 Answers

Just split on the space, reverse the resulting list (that's what .str[::-1] does), and join with ', ':

>>> df['Name'].str.split(' ').str[::-1].str.join(', ')
0        Smith, Joe
1         Doe, Jane
2    Simpson, Homer
Name: Name, dtype: object

If your data contains names like "Jr. Joe Smith", you can do it the following way:

df['Name'].str.split(' ').str[::-1].apply(lambda x:(x[0],' '.join(x[1:]))).str.join(', ')
Comments

You got the first and last name flipped.
Yes, but you're not putting the last name first.
I like this answer better, but it's worth pointing out that "Jill St. John" would become "John, St., Jill", which is not desired. Still, I think this gives the right skeleton. It depends on what his data looks like.
Yes sir, you are right @TimRoberts, it depends on the data.
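Following the "Jill St. John" comment, one hedged alternative is to split only on the first space (Series.str.split with n=1), treating the first word as the first name and everything after it as the last name. This is a sketch under that assumption: a name with a middle name such as "Mary Ann Smith" would become "Ann Smith, Mary", so whether it fits still depends on the data. The helper last_first and the sample names are hypothetical; names without a space are returned unchanged.

```python
import pandas as pd


def last_first(names: pd.Series) -> pd.Series:
    """Reorder "First Rest Of Name" into "Rest Of Name, First" (hypothetical helper)."""
    # Split only on the FIRST space: ["Jill", "St. John"] rather than three parts.
    parts = names.str.split(" ", n=1)
    reordered = parts.str[1] + ", " + parts.str[0]
    # Rows with no space give a single part (and NaN above); keep them as-is.
    return reordered.where(parts.str.len() == 2, names)


df = pd.DataFrame({"Name": ["Joe Smith", "Jane Doe", "Jill St. John", "Cher"]})
df["Name"] = last_first(df["Name"])
print(df)
```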

I'm not sure what you were trying to do with len there, but it's not right. You just want to start right after gap_pos:

row["Name"] = df_Users["Name"][index][gap_pos+1:].strip() +', ' + df_Users["Name"][index][:gap_pos]

I would be tempted to use split for this.
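Along the lines of using split, here is a sketch that keeps the question's invalid_users exclusion but moves the logic into a plain function and assigns the whole column at once. Assigning to row inside iterrows() is not guaranteed to write back to the DataFrame (the pandas documentation warns that the row may be a copy), so this avoids doing that. flip_name and the sample rows are hypothetical; invalid_users is taken from the question.

```python
import pandas as pd

invalid_users = ["Test User", "Test User2", "Test User3"]


def flip_name(name: str) -> str:
    """Turn "First Last" into "Last, First"; leave other names unchanged."""
    if name in invalid_users:
        return name
    parts = name.split(" ", 1)  # at most two pieces: first word, rest
    if len(parts) != 2 or not parts[0]:
        # No space at all, or the name starts with a space
        # (the question's `gap_pos > 0` check skips these too).
        return name
    first, rest = parts
    return f"{rest.strip()}, {first}"


df_Users = pd.DataFrame({"Name": ["Joe Smith", "Jane Doe", "Homer Simpson", "Test User"]})
# Build the new column in one step instead of assigning to `row` in iterrows().
df_Users["Name"] = df_Users["Name"].map(flip_name)
print(df_Users)
```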

Pandas is built to take advantage of vectorized operations, which suit simple transformations and most DataFrame manipulations better than explicit loops.

Given your example, here is code that works:

import pandas as pd

df = pd.DataFrame({"name": ["Joe Smith", "Jane Doe", "Homer Simpson"]})
# df
#              name
# 0       Joe Smith
# 1        Jane Doe
# 2   Homer Simpson

df["name"] = df["name"].apply(lambda x: f"{x.split(' ')[1]}, {x.split(' ')[0]}")
# df
#               name
# 0       Smith, Joe
# 1        Doe, Jane
# 2   Simpson, Homer 

Series.apply calls the specified function on every value in the column. Here, that function is a lambda which, assuming every name follows the pattern "FirstName LastName", does what you want.
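Series.apply still calls a Python function once per value, and the lambda above raises an IndexError for a name with no space (such as the hypothetical "Cher"). One alternative, in the spirit of the regex link in the question's comments, is Series.str.replace with a capture-group pattern. This sketch assumes the first whitespace-separated word is the first name and everything after it is the last name; rows that don't match the pattern are left unchanged.

```python
import pandas as pd

df = pd.DataFrame({"name": ["Joe Smith", "Jane Doe", "Homer Simpson", "Cher"]})

# ^(\S+)   -> group 1: the first word (no whitespace)
# \s+      -> the whitespace after it
# (.+)$    -> group 2: everything else
# Rows that don't match (e.g. a single word) are returned unchanged.
df["name"] = df["name"].str.replace(r"^(\S+)\s+(.+)$", r"\2, \1", regex=True)
print(df)
```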
