Given a DataFrame:

    name             email
0   Carl    [email protected]
1    Bob     [email protected]
2  Alice   [email protected]
3  David  [email protected]
4    Eve     [email protected]

How can it be sorted according to the email's domain name (alphabetically, ascending), and then, within each domain group, according to the string before the "@"?

The result of sorting the above should then be:

    name             email
0    Bob     [email protected]
1    Eve     [email protected]
2  David  [email protected]
3  Alice   [email protected]
4   Carl    [email protected]

2 Answers

Use:

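# Ensure a unique default index so each sorted label maps back to exactly one row
df = df.reset_index(drop=True)
# After the split, column 0 is the part before "@" and column 1 is the domain
idx = df['email'].str.split('@', expand=True).sort_values([1, 0]).index
df = df.reindex(idx).reset_index(drop=True)
print(df)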
    name             email
0    Bob     [email protected]
1    Eve     [email protected]
2  David  [email protected]
3  Alice   [email protected]
4   Carl    [email protected]

Explanation:

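  1. First, call reset_index with drop=True so that the index holds unique default values (0 to n-1).
  2. Then split the email values on "@" into a new DataFrame (column 0 is the part before "@", column 1 is the domain) and call sort_values on the domain first, then on the part before "@".
  3. Finally, reindex the original DataFrame to the new order and reset the index again.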
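
For anyone who wants to run the three steps end to end, here is a minimal sketch. The email addresses are invented for illustration (they are not the ones from the question) and were chosen so that the names come out in the same order as the expected result.

```python
import pandas as pd

# Hypothetical sample data; the addresses are made up for this sketch.
df = pd.DataFrame({
    "name": ["Carl", "Bob", "Alice", "David", "Eve"],
    "email": ["carl@zeta.org", "bob@alpha.com", "alice@gamma.net",
              "david@beta.com", "eve@alpha.com"],
})

# Step 1: guarantee a unique 0..n-1 index.
df = df.reset_index(drop=True)

# Step 2: split on "@" (column 0 = part before "@", column 1 = domain),
# then sort by domain first and by the part before "@" second.
parts = df["email"].str.split("@", expand=True)
idx = parts.sort_values([1, 0]).index

# Step 3: reorder the original rows and renumber them.
result = df.reindex(idx).reset_index(drop=True)
print(result)  # names come out as Bob, Eve, David, Alice, Carl
```

The split frame is only used to compute the order, so no helper columns are left behind in the result.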

1 Comment

Brilliant! Just awesome! This is a great answer.

Option 1
sorted + reindex

# Reversing the split puts the domain first, so sorted compares domain, then the part before "@"
df = df.set_index('email')
df.reindex(sorted(df.index, key=lambda x: x.split('@')[::-1])).reset_index()

              email   name
0     [email protected]    Bob
1     [email protected]    Eve
2  [email protected]  David
3   [email protected]  Alice
4    [email protected]   Carl
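
A self-contained sketch of Option 1, in case the key function is not obvious. The addresses are invented for illustration, and the helper name domain_then_local is hypothetical. It also assumes the addresses are unique, since reindex does not accept an index with duplicate labels.

```python
import pandas as pd

# Hypothetical sample data; the addresses are made up for this sketch.
df = pd.DataFrame({
    "name": ["Carl", "Bob", "Alice", "David", "Eve"],
    "email": ["carl@zeta.org", "bob@alpha.com", "alice@gamma.net",
              "david@beta.com", "eve@alpha.com"],
})

def domain_then_local(address):
    # "bob@alpha.com" -> ["alpha.com", "bob"]; lists compare element by element,
    # so the domain decides first and the part before "@" breaks ties.
    return address.split("@")[::-1]

indexed = df.set_index("email")
ordered = sorted(indexed.index, key=domain_then_local)

# reset_index moves "email" back into a column, which is why it ends up first.
result = indexed.reindex(ordered).reset_index()
print(result)
```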

Option 2
sorted + pd.DataFrame
Alternatively, you can skip the reindex call from Option 1 and build a new DataFrame from the sorted rows.

pd.DataFrame(
    sorted(df.values, key=lambda x: x[1].split('@')[::-1]), 
    columns=df.columns
)

    name             email
0    Bob     [email protected]
1    Eve     [email protected]
2  David  [email protected]
3  Alice   [email protected]
4   Carl    [email protected]

4 Comments

Is it possible to do Option 2 with the column name instead of x[1]? I tried x["email"] and I get an error.
@IamTheWalrus No, it isn't possible. sorted iterates over the rows of df.values, which are NumPy arrays and can only be indexed by integer position.
@COLDSPEED Thanks. I managed to get the column index with df.columns.get_loc("email") and use it with your solution. I prefer it since my actual dataframe reads many columns from a csv and I sometimes change which are included in the dataframe and in which order.
@IamTheWalrus very innovative! Thanks.
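
The df.columns.get_loc idea from the comments above could look roughly like this. It is only a sketch: the addresses are invented, and the variable name email_pos is hypothetical. The rows are converted with tolist() so that the sort works on plain Python lists.

```python
import pandas as pd

# Hypothetical sample data; the addresses are made up for this sketch.
df = pd.DataFrame({
    "name": ["Carl", "Bob", "Alice", "David", "Eve"],
    "email": ["carl@zeta.org", "bob@alpha.com", "alice@gamma.net",
              "david@beta.com", "eve@alpha.com"],
})

# Look up the integer position of the column once, instead of hard-coding 1.
email_pos = df.columns.get_loc("email")

# Sort the rows by [domain, part before "@"] using that position.
rows = sorted(df.values.tolist(),
              key=lambda row: row[email_pos].split("@")[::-1])

result = pd.DataFrame(rows, columns=df.columns)
print(result)
```

This keeps working if the columns are reordered or more columns are read in, as long as a column named "email" exists.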
