
I have a DataFrame that looks like this:

Name  Net Worth
A     100M
B     200M
C     5M
D     40M
E     10B
F     2B

I would like to sort it by the values in the Net Worth column. What would be the best way to sort values like this? M means million and B means billion, so 10B would be the highest value.

1 Answer

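You can use replace to expand each suffix into the matching number of zeros, create a new sorted Series from the numeric values, and then reindex the original DataFrame by that Series' index: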

d = {'M': '0'*6, 'B': '0'*9}
s = df['Net Worth'].replace(d, regex=True).astype(float).sort_values(ascending=False)
print (df.reindex(s.index))
  Name Net Worth
4    E       10B
5    F        2B
1    B      200M
0    A      100M
3    D       40M
2    C        5M
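
One caveat with padding by zeros: it only gives the right number when the part before the suffix is a whole number. The short sketch below uses a hypothetical two-row Series (not the data from the question) to show what happens with a decimal value.

```python
import pandas as pd

# Hypothetical sample: the first value carries a decimal point
sample = pd.Series(['1.0B', '2B'])
d = {'M': '0' * 6, 'B': '0' * 9}

# The suffix is swapped for zeros inside each string
padded = sample.replace(d, regex=True)

# '1.0B' became '1.0000000000', which parses as 1.0 rather than 1e9
values = padded.astype(float)
print(values)
```

Here the zeros land after the decimal point, so 1.0B would sort below every value in millions.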

A more general solution, for data that also contains decimal values or values with no suffix:

print (df)
  Name Net Worth
0    A         1
1    B      200M
2    C        5M
3    D       40M
4    E      1.0B
5    F        2B

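# multiplier for each suffix
d = {'M': 10**6, 'B': 10**9}
# all keys of the dict joined by | (regex "or")
k = '|'.join(d.keys())

# replace every value that contains a suffix with that suffix's multiplier
# (values without a suffix are left unchanged)
a = df['Net Worth'].replace(d, regex=True).astype(float)
# strip the M/B suffix, leaving only the numeric part
b = df['Net Worth'].replace([k], '', regex=True).astype(float)
# multiply the two together and sort
s = a.mul(b).sort_values(ascending=False)
# reindex to get the original rows in sorted order
print (df.reindex(s.index))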
  Name Net Worth
5    F        2B
4    E      1.0B
1    B      200M
3    D       40M
2    C        5M
0    A         1

Another similar solution, using str.extract:

# multiplier for each suffix
_prefix = {'k': 1e3,    # thousand
           'M': 1e6,    # million
           'B': 1e9,    # billion
}
# all keys of the dict joined by | (regex "or")
k = '|'.join(_prefix.keys())
# extract the numeric part (a) and the suffix (b) into a new DataFrame
df1 = df['Net Worth'].str.extract('(?P<a>[0-9.]*)(?P<b>' + k +')*', expand=True)
# convert the numeric column to float
df1['a'] = df1['a'].astype(float)
# map suffixes to multipliers, filling NaN (no suffix) with 1
df1['b'] = df1['b'].map(_prefix).fillna(1)
# multiply the columns together and sort
s = df1['a'].mul(df1['b']).sort_values(ascending=False)
print (s)
# reindex to get the original rows in sorted order
print (df.reindex(s.index))
  Name Net Worth
5    F        2B
4    E      1.0B
1    B      200M
3    D       40M
2    C        5M
0    A         1
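
On pandas 1.1 or later, the same idea can be expressed without the intermediate reindex by passing a `key` function to `sort_values`. This is a minimal sketch, not part of the original answer: the helper name `to_number`, the `multipliers` dict, and the sample frame are assumptions made for illustration.

```python
import pandas as pd

# Sample data mirroring the second example above
df = pd.DataFrame({
    'Name': list('ABCDEF'),
    'Net Worth': ['1', '200M', '5M', '40M', '1.0B', '2B'],
})

# Hypothetical multiplier table; extend with more suffixes as needed
multipliers = {'M': 1e6, 'B': 1e9}

def to_number(col):
    """Hypothetical helper: turn strings like '200M' into floats."""
    # Split each string into a numeric part and an optional suffix
    parts = col.str.extract(r'^(?P<num>[0-9.]+)(?P<suffix>[MB])?$')
    # A missing suffix maps to NaN, which is filled with a multiplier of 1
    factor = parts['suffix'].map(multipliers).fillna(1).astype(float)
    return parts['num'].astype(float) * factor

# key receives the 'Net Worth' column and must return a Series of the same length
result = df.sort_values('Net Worth', key=to_number, ascending=False)
print(result)
```

The `key` callable only affects the ordering, so the Net Worth column keeps its original strings in the result.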

2 Comments

Thanks! I have one question: how does regex=True work in this case? Is it similar to df.str.replace()?
Yes, it is similar, but it works better with a dict. And if you need to replace a substring, you need regex=True.
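
To illustrate the point in the comments, here is a small sketch on a hypothetical two-element Series comparing `Series.replace` with and without `regex=True`, plus the `.str.replace` equivalent.

```python
import pandas as pd

# Hypothetical sample: one value contains 'M', the other is exactly 'M'
s = pd.Series(['100M', 'M'])

# Default: only cells that equal 'M' exactly are replaced
whole = s.replace({'M': '000000'})

# regex=True: 'M' is treated as a pattern and replaced inside each string
partial = s.replace({'M': '000000'}, regex=True)

# The .str accessor works on substrings, but takes one pattern per call
via_str = s.str.replace('M', '000000', regex=False)

print(whole.tolist())
print(partial.tolist())
print(via_str.tolist())
```

With a dict of several suffixes, `replace(..., regex=True)` handles all of them in one call, whereas `.str.replace` would need one call per suffix.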
