I'm a Python beginner and I am having trouble scraping a webpage and displaying specific text from the page.

I know the problem lies in the encoding: I have been reading about the unicode type and have seen other newcomers hit exactly the same issue.

For example, let's say I want to scrape www.amazon.com. This is the code I have:

import pycurl
import cStringIO
from bs4 import BeautifulSoup

buf = cStringIO.StringIO()

curl = pycurl.Curl()
curl.setopt(curl.URL, 'http://www.amazon.com')
curl.setopt(curl.WRITEFUNCTION, buf.write)
curl.perform()

result = buf.getvalue()
result = unicode(result, "ascii", errors="ignore")
buf.close()

soup = BeautifulSoup(result)
print soup.get_text()

This stores the Amazon web page in the result variable, but I get this annoying error when I try to print the output of BeautifulSoup's get_text() method:

UnicodeEncodeError: 'ascii' codec can't encode character u'\u2026' in position 25790: ordinal not in range(128)

How do I correctly decode the entire contents returned by my curl request?

How is this Python 3 if you have print as a keyword? Commented Feb 13, 2014 at 22:02

1 Answer

You might want to use requests instead; it's simpler and cleaner and, AFAIK, avoids the encoding issue.

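from bs4 import BeautifulSoup
import requests

# requests decodes the response body into unicode text for us
resp = requests.get('http://www.amazon.com')

# name the parser explicitly so the result does not depend on what is installed
bsoup = BeautifulSoup(resp.text, 'html.parser')
print(bsoup.get_text())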

There are reasons to use cURL, but requests is simpler and easier in most cases, and your situation doesn't look like an exception based on what you describe.
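If you do stay with pycurl, note that decoding with "ascii" and errors="ignore" silently throws away every non-ASCII byte, so characters are lost before BeautifulSoup ever sees them. A minimal sketch of the difference, using a hard-coded byte string as a stand-in for buf.getvalue() and assuming the page is served as UTF-8 (check the Content-Type header or the page's meta charset to be sure):

```python
# Stand-in for buf.getvalue(): UTF-8 bytes as a server might send them.
# Assumption: the page is UTF-8; the real encoding may differ.
raw = u"Caf\u00e9 \u2026".encode("utf-8")

# What the question's code does: non-ASCII bytes are dropped without warning.
lossy = raw.decode("ascii", "ignore")

# Decoding with the page's actual encoding keeps every character.
faithful = raw.decode("utf-8")

print(repr(lossy))
print(repr(faithful))
```

The same idea applies to the real response: decode the bytes with the encoding the server declares rather than ASCII.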

EDIT: to resolve the unicode error, try explicitly encoding the string as utf-8 (as per this SO question):

encoded = resp.text.encode('utf-8')
bsoup = BeautifulSoup(encoded)
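
For what it's worth, the traceback is a UnicodeEncodeError rather than a decode error, so the failure most likely happens when print writes the text to an output stream that Python 2 treats as ASCII, not while fetching or parsing the page. A minimal sketch of that, with a hard-coded sample string standing in for bsoup.get_text() (an assumption, so that it runs offline):

```python
# Stand-in for bsoup.get_text(); u'\u2026' is the ellipsis from the traceback.
text = u"Loading\u2026"

# Encoding to ASCII is what fails, which is what Python 2's print does
# implicitly when the output encoding is ASCII or unknown (e.g. when piped).
try:
    text.encode("ascii")
    ascii_error = None
except UnicodeEncodeError as exc:
    ascii_error = str(exc)

# Encoding explicitly to UTF-8 succeeds and yields bytes that are safe to write.
utf8_bytes = text.encode("utf-8")

# Fallback for a strictly ASCII output: unencodable characters become '?'.
ascii_fallback = text.encode("ascii", "replace")

print(ascii_error)
```

In Python 2, print bsoup.get_text().encode('utf-8') should get rid of the error on a UTF-8 terminal; setting the PYTHONIOENCODING environment variable to utf-8 before running the script is another option.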

1 Comment

Unfortunately, I still get the encoding error: UnicodeEncodeError: 'ascii' codec can't encode character u'\u2026' in position 25921: ordinal not in range(128)
