Let's say I have a dataframe as follows:
import pandas as pd
df = pd.DataFrame({"id": range(4), "price": ["15dollar/m2/day", "90dollar/m2/month", "18dollar/m2/day", "100dollar/m2/month"]})
   id               price
0   0     15dollar/m2/day
1   1   90dollar/m2/month
2   2     18dollar/m2/day
3   3  100dollar/m2/month
I would like to split the price column into two new columns, unit_price and price_unit, as below:
   id  unit_price       price_unit
0   0        15.0    dollar/m2/day
1   1        90.0  dollar/m2/month
2   2        18.0    dollar/m2/day
3   3       100.0  dollar/m2/month
This is my solution:
df['unit_price'] = df['price'].str.split('dollar').str[0].astype(float)
#df['unit_price'] = df['price'].str.extract(r'(\d*\.\d+|\d+)', expand=False).astype(float)
df['price_unit'] = df['price'].str.split('dollar').str[1]
del df['price']
This works fine for unit_price, but not for price_unit: splitting on 'dollar' drops the delimiter itself, so the result below no longer contains the word dollar. If I use df['price'].str.replace(r'\d', '') instead, all digits are removed, including the 2 in m2.
How can I do this correctly in Python? Thanks.
df['price_unit']
Out[474]:
0      /m2/day
1    /m2/month
2      /m2/day
3    /m2/month
Name: price_unit, dtype: object
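One possible way to keep the word dollar in price_unit is to capture both parts in a single pass with str.extract instead of splitting. The following is only a sketch: it assumes every value starts with a plain integer or decimal number followed directly by the unit text, and the name parts is a hypothetical helper variable that does not appear in the code above.

```python
import pandas as pd

df = pd.DataFrame({
    "id": range(4),
    "price": ["15dollar/m2/day", "90dollar/m2/month",
              "18dollar/m2/day", "100dollar/m2/month"],
})

# Group 1: the leading number (integer or decimal).
# Group 2: everything after the number, so "dollar" is kept.
# Assumption: each value begins with a number; rows that do not match become NaN.
parts = df["price"].str.extract(r"^(\d*\.?\d+)(.*)$")

df["unit_price"] = parts[0].astype(float)
df["price_unit"] = parts[1]
df = df.drop(columns="price")

print(df)
```

Because the pattern captures the unit rather than splitting on it, nothing is consumed as a delimiter, and the 2 in m2 is untouched since only the leading number goes into the first group.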