I used a web crawler to scrape some data and stored it in a variable named price. The type of price is:

<class 'bs4.element.NavigableString'>

The type of each element of price is:

<type 'unicode'>

The price consists of some whitespace and line feeds followed by $520. I want to strip all the extra characters and keep only the number 520. I already wrote a naive solution:

def reducePrice(price):
    # Collect every character that appears after the '$' sign.
    key = 0
    string = ""
    for i in price:
        if key == 1:
            string = string + i
        if i == '$':
            key = 1
    return string

But I want a more elegant solution: convert price to str and then use str methods to manipulate it. I have already searched the web and other posts on this site. The best I found was that using:

p = "".join(price)

I can generate one big unicode value. If you can give me a hint I would be grateful (I'm using Python 2.7 on Ubuntu).

Edit: I'm adding my spider in case you need it:

import requests
from bs4 import BeautifulSoup

def spider(max_pages):
    page = 1
    while page <= max_pages:
        url = "http://www.lider.cl/walmart/catalog/product/productDetails.jsp?cId=CF_Nivel2_000021&productId=PROD_5913&skuId=5913&pId=CF_Nivel1_000004&navAction=jump&navCount=12"
        source_code = requests.get(url)
        plain_text = source_code.text
        soup = BeautifulSoup(plain_text)
        title = ""
        price = ""
        for link in soup.findAll('span', {'itemprop': 'name'}):
            title = link.string
        # find() returns a single tag, so this loop visits its child nodes.
        for link in soup.find('em', {'class': 'oferLowPrice fixPriceOferUp  '}):
            price = link.string

        print(title + '=' + str(reducePrice(price)))
        page += 1

spider(1)

Edit 2: Thanks to Martin and mASOUD, I arrived at a solution using str methods:

def reducePrice(price):
   return int((("".join(("".join(price)).split())).replace("$","")).encode())

This method returns an int. That was not my original question, but it was the next step in my project. I added it because I was not able to cast the unicode value to int directly; calling encode() first to get a str worked for me.
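For comparison, here is a shorter sketch of the same idea that relies only on string methods. It assumes the text always has the shape "whitespace, a '$', then digits", and the name reduce_price_simple is hypothetical, not part of the spider above. Since NavigableString is a subclass of the unicode string type, the same calls should apply to it directly.

```python
def reduce_price_simple(price):
    # Hypothetical helper: assumes input shaped like u'\n   $520 \n'.
    # strip() drops surrounding whitespace and line feeds,
    # lstrip('$') then removes the leading currency symbol.
    cleaned = price.strip().lstrip('$')
    return int(cleaned)


print(reduce_price_simple(u'\n   $520 \n'))  # prints 520
```

If the price could contain thousands separators or a different currency symbol, this sketch would need adjusting.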

  • Did you try looking for bs4 to string? You are close. Commented Jun 2, 2015 at 2:18
  • I tried str(price) and then price.strip() but it didn't work. Commented Jun 2, 2015 at 2:23
  • can you post your bs4 code? Did you try str(price[0])? Commented Jun 2, 2015 at 2:25

1 Answer

Use a RegEx to extract the price from your Unicode string:

import re

def reducePrice(price):
    # Find the first run of digits in the scraped text, e.g. u'  $500  '.
    match = re.search(r'\d+', price)
    price = match.group()  # returns u"500"
    price = str(price)  # convert the unicode u"500" to a byte string "500"
    return price

This function converts Unicode to a "regular" string as you asked, but is there a reason you need that? In Python 2, unicode strings support the same methods as regular strings, so u"500" behaves almost the same as "500".
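As a rough sketch of that point, the regex can be applied to the Unicode text directly and the match passed straight to int(), with no str conversion in between. The helper name extract_price and the choice to return None when no digits are found are assumptions made for illustration.

```python
import re


def extract_price(text):
    # Hypothetical helper: return the first run of digits as an int,
    # or None when the text contains no digits at all.
    match = re.search(r'\d+', text)
    if match is None:
        return None
    return int(match.group())


print(extract_price(u'  \n $520 \n'))   # prints 520
print(extract_price(u'no price here'))  # prints None
```

Checking for None avoids an AttributeError on match.group() when the page layout changes and the price is missing.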

9 Comments

It looks quite elegant, but I still wonder whether I can convert a variable of type unicode into a string.
Added some extra text at the end: is there any reason you want to convert to a string? Unicode is also a "string".
I know. But I can't use string methods on a unicode "string", and I want to know whether the conversion is possible, just for the sake of learning.
strip definitely works on a unicode string. Simple example: u" 500 ".strip(), which returns u"500".
If you really just need to convert a unicode string to a regular Python 2 string, you can encode() it. For example, u'hi'.encode() will give you 'hi'.