1

Let's say I have a dataframe as follows:

df = pd.DataFrame({"id": range(4), "price": ["15dollar/m2/day", "90dollar/m2/month", "18dollar/m2/day", "100dollar/m2/month"]})

       id               price
    0   0     15dollar/m2/day
    1   1   90dollar/m2/month
    2   2     18dollar/m2/day
    3   3  100dollar/m2/month

I would like to split column price into two new columns: unit_price and price_unit as below:

   id     unit_price  price_unit
0   0        15.0    dollar/m2/day
1   1        90.0    dollar/m2/month
2   2        18.0    dollar/m2/day
3   3       100.0    dollar/m2/month

This is my solution:

df['unit_price'] = df['price'].str.split('dollar').str[0].astype(float)
#df['unit_price'] = df['price'].str.extract('(\d*\.\d+|\d+)', expand=False).astype(float)
df['price_unit'] = df['price'].str.split('dollar').str[1]
del df['price']

For the column unit_price, it works fine, but for price_unit, when i split by dollar, I got result as below which doesn't including character dollar, or if I use df['price'].str.replace(r'\d', ''), all numbers were removed. How could I do it correctly in Python? Thanks.

df['price_unit']
Out[474]: 
0      /m2/day
1    /m2/month
2      /m2/day
3    /m2/month
Name: price_unit, dtype: object 
0

2 Answers 2

3

You can use Series.str.extract with regex - ^ for start of string, \d*\.\d+ for floats or \d+ for integers and then for all another values by .*:

df = df.join(df.pop('price').str.extract('(?P<unit_price>^\d*\.\d+|^\d+)(?P<price_unit>.*)'))
print (df)
   id unit_price       price_unit
0   0         15    dollar/m2/day
1   1         90  dollar/m2/month
2   2         18    dollar/m2/day
3   3        100  dollar/m2/month

First solution is using extract and replace by numeric:

pat = '(^\d*\.\d+|^\d+)'
df['unit_price'] = df['price'].str.extract(pat, expand=False)
df['price_unit'] = df.pop('price').str.replace(pat,'')
print (df)
   id unit_price       price_unit
0   0         15    dollar/m2/day
1   1         90  dollar/m2/month
2   2         18    dollar/m2/day
3   3        100  dollar/m2/month
Sign up to request clarification or add additional context in comments.

2 Comments

Thanks. May I ask what ?P stand for?
@ahbon - sure, if check this - it is for new columns names
0

If it's just a string, why not keep it simple and add the dollar part yourself?

df['price_unit'] = 'dollar' + df['price'].str.split('dollar').str[1]

import pandas as pd

df = pd.DataFrame({"id": range(4), "price": ["15dollar/m2/day", "90dollar/m2/month", "18dollar/m2/day", "100dollar/m2/month"]})

df['unit_price'] = df['price'].str.split('dollar').str[0].astype(float)
df['price_unit'] = 'dollar' + df['price'].str.split('dollar').str[1]

del df['price']

print(df)

   id  unit_price       price_unit
0   0        15.0    dollar/m2/day
1   1        90.0  dollar/m2/month
2   2        18.0    dollar/m2/day
3   3       100.0  dollar/m2/month

2 Comments

Thanks it works in this situation. But I think @jezrael's solution is more general in case if there are other money unit yen, euro, etc.
ah, yes. If the currency can change, that is indeed a more general solution.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.