Sorting a dataframe by string values

Question

I have a dataframe that looks like this:

Name  Net Worth
A     100M
B     200M
C     5M
D     40M
E     10B
F     2B

I would like to sort it by the values in the Net Worth column. What is the best way to sort values like this? M means million and B means billion, so 10B would be the highest value.

jezrael · Accepted Answer · 2017-04-05 17:01:35Z


You can use replace to build a numeric Series, sort it, and then reindex the original DataFrame by the sorted index:

d = {'M': '0'*6, 'B': '0'*9}
s = df['Net Worth'].replace(d, regex=True).astype(float).sort_values(ascending=False)
print (df.reindex(s.index))
  Name Net Worth
4    E       10B
5    F        2B
1    B      200M
0    A      100M
3    D       40M
2    C        5M
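
The zero-padding trick above suits whole numbers only. The following is a minimal sketch, assuming the frame is built as shown (the question prints the table but not its construction), that reproduces the sort and then applies the same replacement to a hypothetical decimal value, "1.5B", which is not in the original data:

```python
import pandas as pd

# Sample frame reconstructed from the question (an assumption; the original
# post does not show how df was built).
df = pd.DataFrame({
    "Name": list("ABCDEF"),
    "Net Worth": ["100M", "200M", "5M", "40M", "10B", "2B"],
})

d = {"M": "0" * 6, "B": "0" * 9}

# Whole numbers: appending zeros gives the right magnitude.
s = df["Net Worth"].replace(d, regex=True).astype(float).sort_values(ascending=False)
sorted_df = df.reindex(s.index)
print(sorted_df["Name"].tolist())

# Hypothetical decimal value: the zeros land after the decimal point,
# so "1.5B" becomes "1.5000000000", which parses as 1.5 rather than 1.5e9.
padded = pd.Series(["1.5B"]).replace(d, regex=True).astype(float)
print(padded.iloc[0])
```

Because padding goes wrong once a decimal point is present, decimal input needs the numeric part and the multiplier handled separately.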

A more general solution, if the data contains some floats:

print (df)
  Name Net Worth
0    A         1
1    B      200M
2    C        5M
3    D       40M
4    E      1.0B
5    F        2B

# dict of multipliers
d = {'M': 10**6, 'B': 10**9}
# all keys of the dict joined by | (regex "or")
k = '|'.join(d.keys())

# replace each cell that contains a suffix with its multiplier
a = df['Net Worth'].replace(d, regex=True).astype(float)
# strip the M/B suffix to get the numeric part
b = df['Net Worth'].replace([k], '', regex=True).astype(float)
# multiply together and sort
s = a.mul(b).sort_values(ascending=False)
# reindex to get the original rows in sorted order
print(df.reindex(s.index))
  Name Net Worth
5    F        2B
4    E      1.0B
1    B      200M
3    D       40M
2    C        5M
0    A         1

Another similar solution uses str.extract:

# dict of multipliers
_prefix = {'k': 1e3,    # thousand
           'M': 1e6,    # million
           'B': 1e9,    # billion
}
# all keys of the dict joined by | (regex "or")
k = '|'.join(_prefix.keys())
# extract the number and the suffix into a new df
df1 = df['Net Worth'].str.extract('(?P<a>[0-9.]*)(?P<b>' + k +')*', expand=True)
# convert the numeric column to float
df1.a = df1.a.astype(float)
# map suffixes by the dictionary, replace NaN (no suffix) with 1
df1.b = df1.b.map(_prefix).fillna(1)
# multiply the columns together and sort
s = df1.a.mul(df1.b).sort_values(ascending=False)
print(s)
# sort the original by reindexing
print(df.reindex(s.index))
  Name Net Worth
5    F        2B
4    E      1.0B
1    B      200M
3    D       40M
2    C        5M
0    A         1
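
On pandas 1.1 or later, the same idea can be written without the intermediate reindex step by passing a key function to sort_values. This is only a sketch of that alternative, not part of the original answer; the names to_number and MULTIPLIERS are hypothetical, and the sample frame is reconstructed from the printed table above:

```python
import pandas as pd

# Hypothetical multiplier table; M and B follow the question,
# k is included as in the extract-based solution.
MULTIPLIERS = {"k": 1e3, "M": 1e6, "B": 1e9}


def to_number(col: pd.Series) -> pd.Series:
    """Convert strings such as '200M' or '1.0B' to floats."""
    parts = col.astype(str).str.extract(r"^(?P<num>[0-9.]+)(?P<suffix>[kMB])?$")
    # Rows without a suffix get a multiplier of 1.
    factor = parts["suffix"].map(MULTIPLIERS).fillna(1.0).astype(float)
    return parts["num"].astype(float) * factor


df = pd.DataFrame({
    "Name": list("ABCDEF"),
    "Net Worth": ["1", "200M", "5M", "40M", "1.0B", "2B"],
})

# key= receives the column as a Series and must return a Series of sort keys.
result = df.sort_values("Net Worth", key=to_number, ascending=False)
print(result)
```

The key function is applied only for ordering, so the Net Worth column keeps its original strings in the result.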

edited Apr 5, 2017 at 17:01

answered Apr 5, 2017 at 14:28


2 Comments

Alex T Over a year ago

Thanks! I have one question: how does regex=True work in this case? Is it similar to df.str.replace()?

jezrael Over a year ago

Yes, it is similar, but it works better with a dict. And if you need to replace a substring, you need regex=True.
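
The difference discussed in these comments can be illustrated with a small sketch. The sample Series here is hypothetical and chosen so that one cell is exactly "M":

```python
import pandas as pd

s = pd.Series(["200M", "2B", "M"])

# Without regex=True, Series.replace matches whole cell values only,
# so "200M" is left alone and only the cell equal to "M" changes.
whole = s.replace({"M": "000000"})

# With regex=True, each dict key is treated as a pattern and matched
# inside the string, so substrings are replaced.
partial = s.replace({"M": "000000", "B": "000000000"}, regex=True)

# Series.str.replace also replaces substrings, but takes one pattern per
# call, so several suffixes need chained calls.
chained = (
    s.str.replace("M", "000000", regex=False)
     .str.replace("B", "000000000", regex=False)
)

print(whole.tolist())
print(partial.tolist())
print(chained.tolist())
```

The dict form with regex=True handles all suffixes in one call, which is presumably what "better working with dict" refers to.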
