Formatting strings in a dataframe

Question

I have a dataframe:

Name
Joe Smith
Jane Doe
Homer Simpson

I am trying to reformat it to get:

Name
Smith, Joe
Doe, Jane
Simpson, Homer

I have this code, and it works for ~80% of the users in my list, but some users are not coming through right.

invalid_users = ['Test User', 'Test User2', 'Test User3']


for index, row in df_Users.iterrows():
    gap_pos = df_Users["Name"][index].find(" ") 
    if gap_pos > 0 and row["Name"] not in invalid_users:
        row["Name"] = df_Users["Name"][index][len(df_Users["Name"][index])-gap_pos+1:].strip() +', ' + df_Users["Name"][index][:gap_pos]

The users who are not coming through correctly usually have their last name truncated somewhere, e.g. Simpson ==> mpson.

What am I doing wrong here?

See Using Regex to change the name values format in a dataframe
– MDR, Commented Jul 21, 2021 at 20:47
If you still want this loop, use split instead: "Joe Smith".split(" ") will give you a list like this: ['Joe', 'Smith']
– Da Song, Commented Jul 21, 2021 at 20:48
Why [len(df_Users["Name"][index])-gap_pos+1:]? A simple [gap_pos+1:] should do (but better use one of the mentioned alternatives anyway).
– Michael Butscher, Commented Jul 21, 2021 at 20:52

ThePyGuy · Accepted Answer · 2021-07-21 21:09:18Z

2

Just split on the space, reverse the resulting list (that's what .str[::-1] does), and join on ', ':

>>> df['Name'].str.split(' ').str[::-1].str.join(', ')
0        Smith, Joe
1         Doe, Jane
2    Simpson, Homer
Name: Name, dtype: object

And if your data contains names like Jr. Joe Smith, you can do it the following way:

df['Name'].str.split(' ').str[::-1].apply(lambda x:(x[0],' '.join(x[1:]))).str.join(', ')

edited Jul 21, 2021 at 21:09

answered Jul 21, 2021 at 20:49

ThePyGuy


4 Comments

Da Song Over a year ago

you got the first and last name flipped

Tim Roberts Over a year ago

Yes, but you're not putting the last name first.

Tim Roberts Over a year ago

I like this answer better, but it's worth pointing out that "Jill St. John" would become "John, St., Jill", which is not desired. Still, I think this gives the right skeleton. It depends on what his data looks like.

ThePyGuy Over a year ago

Yes sir, you are right @TimRoberts, it depends on the data.
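For names like "Jill St. John" from the comment above, one possible variation is to split only on the first space, so everything after the first word is treated as the last name. This is a minimal sketch, not the answerer's code: it assumes the first word is always the given name (so it would not suit "Jr. Joe Smith"), and the "Formatted" column name and sample rows are made up for illustration.

```python
import pandas as pd

# Sample data; "Jill St. John" is the case raised in the comments.
df = pd.DataFrame({"Name": ["Joe Smith", "Jill St. John", "Homer Simpson"]})

# n=1 splits only on the first space, so each row has at most two parts:
# ['Jill', 'St. John'] rather than ['Jill', 'St.', 'John'].
parts = df["Name"].str.split(" ", n=1)

# Reverse each two-element list and join it with ', '.
# "Formatted" is a hypothetical column name used only in this sketch.
df["Formatted"] = parts.str[::-1].str.join(", ")

print(df["Formatted"].tolist())
# ['Smith, Joe', 'St. John, Jill', 'Simpson, Homer']
```

A name with no space at all yields a one-element list, so it passes through unchanged.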

Tim Roberts · Answer · edited by ThePyGuy 2021-08-08 06:21:51Z

0

I'm not sure what you were trying to do with len there, but it's not right. You just want to start right after gap_pos:

row["Name"] = df_Users["Name"][index][gap_pos+1:].strip() +', ' + df_Users["Name"][index][:gap_pos]

I would be tempted to use split for this.
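To see why the original slice truncates some names, it may help to compare the three variants on plain strings. This is a sketch with hypothetical helper names (swap_buggy, swap_fixed, swap_split); it assumes every name contains at least one space, which the loop in the question checks with gap_pos > 0.

```python
def swap_buggy(name: str) -> str:
    """Slice from the question: the last name starts at len(name) - gap_pos + 1."""
    gap_pos = name.find(" ")
    return name[len(name) - gap_pos + 1:].strip() + ", " + name[:gap_pos]


def swap_fixed(name: str) -> str:
    """Corrected slice: the last name starts right after the space."""
    gap_pos = name.find(" ")
    return name[gap_pos + 1:].strip() + ", " + name[:gap_pos]


def swap_split(name: str) -> str:
    """Same result as swap_fixed, using split on the first space."""
    first, last = name.split(" ", 1)
    return f"{last.strip()}, {first}"


# "Jane Doe": len is 8 and gap_pos is 4, so 8 - 4 + 1 == 4 + 1 and the
# buggy slice happens to start in the right place.
print(swap_buggy("Jane Doe"))        # Doe, Jane

# "Homer Simpson": len is 13 and gap_pos is 5, so the buggy slice starts
# at index 9 instead of 6 and cuts off the start of the last name.
print(swap_buggy("Homer Simpson"))   # pson, Homer
print(swap_fixed("Homer Simpson"))   # Simpson, Homer
print(swap_split("Homer Simpson"))   # Simpson, Homer
```

The buggy start index only matches gap_pos + 1 when the name is exactly twice as long as the position of the space, which is why some names come through correctly and others do not.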

edited Aug 8, 2021 at 6:21

ThePyGuy


answered Jul 21, 2021 at 20:49

Tim Roberts


pierre_loic · Accepted Answer · 2021-07-21 20:57:13Z

Pandas is a library that takes advantage of vectorized operations, especially for simple transformations and most DataFrame manipulations.

Given your example, here is code that would work:

import pandas as pd

df = pd.DataFrame({"name": ["Joe Smith", "Jane Doe", "Homer Simpson"]})
# df
#              name
# 0       Joe Smith
# 1        Jane Doe
# 2   Homer Simpson

df["name"] = df["name"].apply(lambda x: f"{x.split(' ')[1]}, {x.split(' ')[0]}")
# df
#               name
# 0       Smith, Joe
# 1        Doe, Jane
# 2   Simpson, Homer

Series.apply calls the given function on every value in the column. Here that function is a lambda which, assuming every name follows the "FirstName LastName" pattern, does what you want.
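If some rows should be skipped, as with the invalid_users list in the question, the lambda can be swapped for a named function. The following is a rough sketch rather than a definitive version: last_first is a hypothetical helper, "Cher" is a made-up row to show a name without a space, and the skip rules mirror the gap_pos > 0 check from the question's loop.

```python
import pandas as pd

# Names to leave untouched, taken from the question.
invalid_users = ["Test User", "Test User2", "Test User3"]


def last_first(name: str) -> str:
    """Return 'Last, First'; leave skipped or single-word names as they are."""
    # partition splits on the first space only: (before, separator, after).
    first, sep, last = name.partition(" ")
    # No space, a leading space, or an excluded user: return the name unchanged.
    if not first or not sep or name in invalid_users:
        return name
    return f"{last.strip()}, {first}"


# Sample data; "Cher" is invented here to show the no-space case.
df_Users = pd.DataFrame({"Name": ["Joe Smith", "Test User", "Cher", "Homer Simpson"]})

# Assign the result back to the column instead of writing to a row inside a loop.
df_Users["Name"] = df_Users["Name"].apply(last_first)

print(df_Users["Name"].tolist())
# ['Smith, Joe', 'Test User', 'Cher', 'Simpson, Homer']
```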
