I used a web crawler to scrape some data and stored it in a variable `price`. The type of `price` is:
`<class 'bs4.element.NavigableString'>`
The type of each element of `price` is:
`<type 'unicode'>`
Basically, `price` contains some whitespace and line feeds followed by `$520`. I want to remove all the extra characters and keep only the number 520. I already wrote a naive solution:
```python
def reducePrice(price):
    key = 0        # becomes 1 once the '$' has been seen
    string = ""
    for i in price:
        if key == 1:
            # We are past the '$': keep this character
            string = string + i
        if i == '$':
            key = 1
    return string
```
But I want a more elegant solution: convert `price` to a `str` and then manipulate it with `str` methods. I have searched the web and other posts on this forum extensively. The best I found was that this:

```python
p = "".join(price)
```

gives me one big `unicode` string. Any hint would be appreciated (I'm using Python 2.7 on Ubuntu).
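A minimal sketch of the kind of string-method approach asked about here, assuming the text is only whitespace and line feeds, then `$`, then the number (possibly followed by more whitespace). Because a bs4 `NavigableString` is a subclass of `unicode` in Python 2.7, `strip()` and `lstrip()` can be called on `price` directly, without converting it to `str` first. The name `reduce_price` is hypothetical.

```python
# Sketch: clean a scraped price with plain string methods.
# Assumption: the text is whitespace/line feeds, then "$", then the number,
# possibly followed by more whitespace.

def reduce_price(price):
    """Return the characters after the leading '$' as a string, e.g. u'520'."""
    # strip() removes leading/trailing spaces, tabs and line feeds;
    # lstrip("$") then drops the dollar sign at the front.
    return price.strip().lstrip("$")


raw = u"\n    \n  $520\n"
print(reduce_price(raw))  # prints 520
```

Calling `reduce_price(price)` on the `NavigableString` should return a `unicode` value such as `u'520'`, which can be passed to `int()` if a number is needed.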
Edit: here is my spider, in case you need it:
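```python
import requests
from bs4 import BeautifulSoup


def spider(max_pages):
    page = 1
    while page <= max_pages:
        # Note: the URL is fixed, so every iteration fetches the same page
        url = "http://www.lider.cl/walmart/catalog/product/productDetails.jsp?cId=CF_Nivel2_000021&productId=PROD_5913&skuId=5913&pId=CF_Nivel1_000004&navAction=jump&navCount=12"
        source_code = requests.get(url)
        plain_text = source_code.text
        # Name the parser explicitly so results don't depend on which parsers are installed
        soup = BeautifulSoup(plain_text, "html.parser")
        title = ""
        price = ""
        # Product name
        for link in soup.findAll('span', {'itemprop': 'name'}):
            title = link.string
        # Iterating over the <em> tag loops over its children;
        # price keeps the last child's .string (a NavigableString)
        for link in soup.find('em', {'class': 'oferLowPrice fixPriceOferUp '}):
            price = link.string
        print(title + '=' + str(reducePrice(price)))
        page += 1


spider(1)
```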
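If the element's text might contain anything other than whitespace around the amount, a regular expression is a swapped-in alternative to string methods. This is a minimal sketch assuming the price is written as `$` followed by plain digits with no thousands separators; `extract_price` and `PRICE_RE` are hypothetical names. Its result could stand in for `reducePrice(price)` in the spider's `print` call.

```python
import re

# Hypothetical pattern: a "$", optional whitespace, then a run of digits.
PRICE_RE = re.compile(r"\$\s*(\d+)")


def extract_price(text):
    """Return the amount after '$' as an int, or None if no match is found."""
    match = PRICE_RE.search(text)
    if match is None:
        return None
    return int(match.group(1))


print(extract_price(u"\n   \t$520\n"))  # prints 520
```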
Edit 2: Thanks to Martin and mASOUD, I was able to write a solution using `str` methods:
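```python
def reducePrice(price):
    # Join the characters into one unicode string, then remove all whitespace
    no_spaces = "".join("".join(price).split())
    # Drop the dollar sign
    digits = no_spaces.replace("$", "")
    # Convert to a byte str, then to an int
    return int(digits.encode())
```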
This version returns an `int`. That was not my original question, but it was the next step in my project. I added the `encode()` call because I thought a `unicode` value could not be converted to an `int`; in fact, `int(u'520')` works in Python 2.7, so `encode()` is not strictly required.
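A minimal sketch of the same conversion without `encode()`, assuming the cleaned text contains only ASCII digits: `int()` accepts a `unicode` string directly in Python 2.7 (and a `str` in Python 3). The name `reduce_price_to_int` is hypothetical.

```python
def reduce_price_to_int(price):
    """Return the price as an int, e.g. 520 for a string like u'  $520 '."""
    # Remove all whitespace anywhere in the string (same idea as
    # "".join(price.split()) in the edit above), then the dollar sign.
    cleaned = "".join(price.split()).replace("$", "")
    # int() accepts unicode directly, so no encode() step is needed.
    return int(cleaned)


print(reduce_price_to_int(u"\n  $520 \n"))  # prints 520
```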
Before posting, I had also tried `str(price)` and then `price.strip()`, but it didn't work.