3

I have this code:

pd.unique(df_dataset["City"])

which produces this output:

array(['Marseile', 'Barcelona', 'Valencia', 'Paris', 'Berlin', 'Lyon',
       'Seville', 'Palma', 'Munich', 'Hamburg', 'Madrid', 'Nice',
       'Granada'], dtype=object)

How do I sort these unique values?

I have tried running this:

pd.unique(df_dataset["City"]).sorted("City", key=True)

but it doesn't seem to be correct.

1
  • As I commented on @BENY's answer, sorting the whole column before taking the unique values is much less efficient than sorting after. Can you please test both on your data and give feedback? Commented Aug 1, 2021 at 4:49

2 Answers

3

Let's just do it with pandas:

df_dataset["City"].sort_values().unique()

1 Comment

Sorting first is more expensive than sorting only the unique values when many rows are duplicates. You can test on something like pd.Series(np.random.choice(list('ABC'), 1000000)) to see for yourself (a factor of 10 in this case).
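
A rough way to check this yourself, as a sketch only: the series `s` is the synthetic test data suggested in the comment, the variable names are made up for illustration, and the timings will vary with machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# Synthetic data from the comment: one million rows, only three distinct values.
s = pd.Series(np.random.choice(list("ABC"), 1_000_000))

# Approach 1: sort the whole column, then drop duplicates.
t_sort_first = timeit.timeit(lambda: s.sort_values().unique(), number=3)

# Approach 2: drop duplicates first, then sort the few remaining values.
t_unique_first = timeit.timeit(lambda: np.sort(s.unique()), number=3)

print(f"sort then unique: {t_sort_first:.3f} s")
print(f"unique then sort: {t_unique_first:.3f} s")

# Both approaches should return the same sorted values.
result_sort_first = list(s.sort_values().unique())
result_unique_first = list(np.sort(s.unique()))
```

The gap should shrink as the share of duplicate rows drops, so it is worth timing both on your real column.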
1

What about:

sorted(df_dataset["City"].unique())

If you want to keep the numpy.array type:

import numpy as np
np.sort(df_dataset["City"].unique())
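
For reference, here is a small self-contained sketch of both variants. The `df_dataset` below is a made-up stand-in with a few of the cities from the question (spelled as in the question's output), not the real data.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the asker's DataFrame, with some duplicates.
df_dataset = pd.DataFrame(
    {"City": ["Marseile", "Barcelona", "Valencia", "Paris", "Barcelona", "Paris"]}
)

# Built-in sorted() accepts any iterable and returns a plain Python list.
as_list = sorted(df_dataset["City"].unique())

# np.sort() returns a sorted copy as a numpy array.
as_array = np.sort(df_dataset["City"].unique())

print(as_list)
print(as_array)
```

Note that `sorted` is a built-in function, not a method, which is why chaining `.sorted(...)` onto the result of `pd.unique` fails.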
