
I am trying to convert the 'price' value (the last item in each dictionary) in my output to an actual float rather than a string. Is this possible?

OUTPUT

{'name': 'ADA Hi-Lo Power Plinth Table', 'product_ID': '55984', 'price': '$2,849.00'}
{'name': 'Adjustable Headrest Couch - Chrome-Plated Steel Legs', 'product_ID': '31350', 'price': '$729.00'}
{'name': 'Adjustable Headrest Couch - Chrome-Plated Steel Legs (X-Large)', 'product_ID': '31351', 'price': '$769.00'}
{'name': 'Adjustable Headrest Couch - Hardwood Base (No Drawers)', 'product_ID': '65446', 'price': '$1,059.00'}      
{'name': 'Adjustable Headrest Couch - Hardwood Base 2 Drawers', 'product_ID': '65448', 'price': '$1,195.00'}
{'name': 'Adjustable Headrest Couch - Hardwood Tapered Legs', 'product_ID': '31355', 'price': '$735.00'}
{'name': 'Adjustable Headrest Couch - Hardwood Tapered Legs (X-Large)', 'product_ID': '31356', 'price': '$775.00'}
{'name': 'Angeles Rest Standard Cot Sheets - ABC Print', 'product_ID': 'A31125', 'price': '$11.19'}

START OF PYTHON SCRIPT

import requests
from bs4 import BeautifulSoup
import sys

with open('recoveryCouches','r') as html_file:
    content= html_file.read()
    soup = BeautifulSoup(content,'lxml')
    allProductDivs = soup.find('div', class_='product-items product-items-4')
    nameDiv = soup.find_all('div',class_='name')
    prodID = soup.find_all('span', id='product_id')
    prodCost = soup.find_all('span', class_='regular-price')

    records=[]
     
    for i in range(len(nameDiv)):
        records.append({
            "name": nameDiv[i].find('a').text.strip(),
            "product_ID": prodID[i].text.strip(),
            "price": prodCost[i].text.strip()
            })

    for x in records:
        print(x)
  • float(price[1:].replace(',', '')) Commented Aug 4, 2021 at 1:11
  • Use the regex [\d\.]+ to capture only the number. Commented Aug 4, 2021 at 1:34
  • Probable duplicate of stackoverflow.com/questions/37580151/… Commented Aug 4, 2021 at 1:54
  • @Forest1 can you tell me where I need to add that section of code? Commented Aug 4, 2021 at 2:12
  • @deyizzle Haven't I already told you? Did you check my answer carefully before un-accepting it? Commented Aug 4, 2021 at 2:13
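The regex suggestion in the comments needs one adjustment: `[\d\.]+` stops at the first comma, so on '$2,849.00' it captures only '2'. A minimal sketch, assuming prices use ',' as the thousands separator, removes the commas first and then extracts the number (`price_from_regex` is a hypothetical helper name):

```python
import re

def price_from_regex(text: str) -> float:
    """Extract the first number from a price string like '$2,849.00'."""
    # Drop thousands separators first; otherwise [\d.]+ stops at the comma.
    cleaned = text.replace(",", "")
    match = re.search(r"[\d.]+", cleaned)
    if match is None:
        raise ValueError(f"no number found in {text!r}")
    return float(match.group())

# Without removing commas, the pattern only sees the digits before the comma.
print(re.search(r"[\d.]+", "$2,849.00").group())  # 2
print(price_from_regex("$2,849.00"))  # 2849.0
print(price_from_regex("$11.19"))     # 11.19
```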

2 Answers


Since float() can't parse a string that contains '$' or ',', remove both characters first and then convert.

You can use the re module to remove them in one step:

import re

# Replaces the loop in the question: strip '$' and ',' before calling float().
for i in range(len(nameDiv)):
    records.append({
        "name": nameDiv[i].find('a').text.strip(),
        "product_ID": prodID[i].text.strip(),
        "price": float(re.sub(r"[$,]", "", prodCost[i].text.strip())),
    })
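To check the `re.sub` conversion in isolation, here is a small sketch that runs it on a few price strings copied from the question's output; the `to_float` name is only illustrative. Inside a character class, `$` is a literal character, so `[$,]` matches either symbol.

```python
import re

def to_float(price_text: str) -> float:
    """Remove '$' and ',' from a scraped price string and parse it."""
    return float(re.sub(r"[$,]", "", price_text.strip()))

samples = ["$2,849.00", "$729.00", "$1,059.00", "$11.19"]
prices = [to_float(s) for s in samples]
print(prices)  # [2849.0, 729.0, 1059.0, 11.19]
```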

Or, if every string starts with $, you can follow @Forest1's comment:

float(price[1:].replace(',', ''))

Like this:

float(prodCost[i].text.strip()[1:].replace(",",""))
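The slicing variant drops whatever the first character happens to be, so it is only correct if every price really starts with `$`. A sketch that keeps the same idea but checks that assumption (the function name is hypothetical):

```python
def price_from_slice(price_text: str) -> float:
    """Parse '$1,234.56' by dropping the leading '$' and the commas."""
    text = price_text.strip()
    # [1:] assumes the first character is '$'; fail loudly if it is not.
    if not text.startswith("$"):
        raise ValueError(f"expected a leading '$' in {text!r}")
    return float(text[1:].replace(",", ""))

print(price_from_slice("$1,195.00"))  # 1195.0
print(price_from_slice("$775.00"))    # 775.0
```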

6 Comments

I believe you mean float rather than integer, because you can't convert 11.19 to an integer. :)
Using string manipulation to pre-strip a currency symbol is not a great approach when the locale module exists.
@Xitiz I tried both float(prodCost[i].text.strip()[1:].replace(",","")) and "price": float(re.sub(r"[$,]","",prodCost[i].text.strip()))
Are you getting an error? If you tried exactly that, it should work. Can you provide your complete code so I can check what's going wrong?
I'm not getting any errors; my output still shows the $ and , in the price field.
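The thread never shows why the output was unchanged. One common cause (an assumption here, since the full code wasn't posted) is that the conversion is computed into a separate variable or loop while the dictionaries that get printed still hold the original strings. Overwriting the value stored in each record makes the change visible when the records are printed; `records` below is a stand-in with two entries copied from the question's output:

```python
import re

# Stand-in for the question's `records` list (two entries from its output).
records = [
    {'name': 'ADA Hi-Lo Power Plinth Table', 'product_ID': '55984', 'price': '$2,849.00'},
    {'name': 'Angeles Rest Standard Cot Sheets - ABC Print', 'product_ID': 'A31125', 'price': '$11.19'},
]

# Overwrite the string stored in each dict; computing float(...) into a
# separate variable would leave the printed dicts unchanged.
for record in records:
    record["price"] = float(re.sub(r"[$,]", "", record["price"]))

for x in records:
    print(x)
```

The printed dictionaries then show `'price': 2849.0` and `'price': 11.19`, without quotes, because the values are now floats.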

Naive removal of the currency symbol prefix makes your code fragile and not i18n-compatible. The general solution is a little complicated, but if you assume that the currency symbol remains a prefix and that it is the Canadian dollar symbol, then:

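from locale import setlocale, LC_ALL, localeconv, atof
from decimal import Decimal
import re

# Raises locale.Error if the en_CA.UTF-8 locale is not installed on the system.
setlocale(LC_ALL, ('en_CA', 'UTF-8'))

# ...

price_str = re.sub(r'\s', '', prodCost[i].text)  # drop all whitespace
stripped = price_str.removeprefix(localeconv()['currency_symbol'])  # Python 3.9+
price = atof(stripped, Decimal)  # uses the locale's separators, returns a Decimal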

Also note that Decimal is a better representation of a currency than a float for most purposes.
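To illustrate the point about Decimal: binary floats can't represent most cent amounts exactly (the classic 0.1 + 0.2 case), while Decimal keeps them exact. This minimal sketch deliberately uses plain comma removal instead of the locale module, so it assumes a leading '$' and ',' as the thousands separator; `to_decimal` is a hypothetical name.

```python
from decimal import Decimal

def to_decimal(price_text: str) -> Decimal:
    """Parse a '$1,234.56'-style string into an exact Decimal."""
    return Decimal(price_text.strip().lstrip("$").replace(",", ""))

print(0.1 + 0.2)                        # 0.30000000000000004
print(Decimal("0.1") + Decimal("0.2"))  # 0.3

prices = ["$2,849.00", "$729.00", "$11.19"]
total = sum(to_decimal(p) for p in prices)
print(total)  # 3589.19
```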

8 Comments

If we can do it in a very easy way, why should we do this? Did you downvote my answer in order to post this complicated one?
@Xitiz As in the answer: fragility. The easy way is correct until it isn't.
Are you sure my answer is bad enough to deserve a downvote? Just calling replace should work; you are overcomplicating something easy.
@Xitiz Check out this interesting map and tell me what it means to you. To me, it means a million-dollar mistake if you do international business and accidentally mis-assign a hard-coded decimal separator, as you're at risk of doing in your answer.
locale would be nice for the decimal point ".", which is a comma in certain locales. But currency symbols are seldom standardized in "real world" data, and there are even competing standards: US dollars could be prefixed with "$", "USD", "US$" and so on. So what works is whatever fits the specific data in the input set. Filtering for whitespace before and after the "$" sign would be much less "fragile" than naively using the currency symbol given by the locale settings and expecting that to work.
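The last comment's point, that the right rule is whatever fits the actual input, could be sketched as a small parser that accepts an explicit list of US-dollar prefixes and ignores surrounding whitespace. The prefix list and the `parse_usd` name are assumptions to adjust to the real data; the parser still assumes ',' for thousands and '.' for decimals, and rejects anything else (such as '2.849,00') instead of guessing.

```python
import re
from decimal import Decimal

# Hypothetical set of USD prefixes; extend to match the data actually scraped.
USD_PREFIXES = ("US$", "USD", "$")

# Digits with optional ',' thousands groups, or plain digits; optional '.' part.
NUMBER_RE = re.compile(r"\d{1,3}(,\d{3})*(\.\d+)?|\d+(\.\d+)?")

def parse_usd(text: str) -> Decimal:
    """Parse strings like '$729.00', 'USD 729.00' or ' US$1,059.00 '."""
    s = text.strip()
    for prefix in USD_PREFIXES:
        if s.startswith(prefix):
            s = s[len(prefix):].strip()
            break
    else:
        raise ValueError(f"unrecognised currency prefix in {text!r}")
    if not NUMBER_RE.fullmatch(s):
        raise ValueError(f"unexpected number format in {text!r}")
    return Decimal(s.replace(",", ""))

print(parse_usd("$2,849.00"))     # 2849.00
print(parse_usd(" USD 729.00 "))  # 729.00
print(parse_usd("US$1,059.00"))   # 1059.00
```

Failing loudly on an unknown prefix or separator layout keeps a mis-parsed price from silently entering the data.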