Error for Python 3.4.1 regarding string pattern and bytes-like object

Question

I'm new to Python and Stack Overflow.

I'm trying to follow a YouTube tutorial on fetching stock prices (probably outdated, judging by the error I get).

Here is the program:

import urllib.request
import re


html = urllib.request.urlopen('http://finance.yahoo.com/q?uhb=uh3_finance_vert_gs_ctrl2&fr=&type=2button&s=AAPL')

htmltext = html.read()

regex = '<span id="yfs_l84_aapl">.+?</span>'

pattern = re.compile(regex)

price = re.findall(pattern, htmltext)

print(price)

Since this is Python 3, I had to look up urllib.request and use it instead of a plain urllib.urlopen.

Anyway, when I run it, I get the following error:

Traceback (most recent call last):
  File "/Users/Harshil/Desktop/stockFetch.py", line 13, in <module>
    price = re.findall(pattern, htmltext)
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/re.py", line 206, in findall
    return _compile(pattern, flags).findall(string)
TypeError: can't use a string pattern on a bytes-like object

I understand the error and tried to fix it by adding the following:

codec = html.info().get_param('charset', 'utf8')
htmltext = html.decode(codec)

But it gives me another error:

Traceback (most recent call last):
  File "/Users/Harshil/Desktop/stockFetch.py", line 9, in <module>
    htmltext = html.decode(codec)
AttributeError: 'HTTPResponse' object has no attribute 'decode'

After spending a fair amount of time on this, I don't know what to do. All I want is to get the price for AAPL so that I can go on to build a general program that fetches prices for a list of stocks and uses them in future programs.

Any help is appreciated. Thanks!

mhawke · Accepted Answer · 2014-11-28 06:11:36Z


You are barking up the right tree. Try decoding the actual HTML byte string rather than the urlopen HTTPResponse:

htmltext = html.read()
codec = html.info().get_param('charset', 'utf8')
htmltext = htmltext.decode(codec)
price = re.findall(pattern, htmltext)
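To see the fix without depending on the live page, here is a minimal offline sketch. The `html_bytes` literal is a made-up stand-in for what `html.read()` returns (the price in it is invented), and the 'utf8' codec is assumed rather than read from the response headers:

```python
import re

# Made-up stand-in for html.read(); the real call also returns bytes.
html_bytes = b'<div><span id="yfs_l84_aapl">118.93</span></div>'

pattern = re.compile('<span id="yfs_l84_aapl">.+?</span>')

# A str pattern applied to bytes raises the TypeError from the question.
try:
    re.findall(pattern, html_bytes)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# Decode the bytes (not the response object) to str, then match.
html_text = html_bytes.decode('utf8')
matches = re.findall(pattern, html_text)
print(matches)
```

The key point is that `decode` is a method of the bytes object, which is why calling it on the HTTPResponse fails.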


2 Comments

Terry Jan Reedy Over a year ago

If the page is known to be UTF-8 encoded, use `htmltext = htmlbytes.decode(encoding='utf-8')`. Or, if your pattern is limited to ASCII, prefix it with `b` so that it is a bytes pattern and can be matched against the undecoded bytes.
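A rough sketch of the bytes-pattern alternative mentioned above, again using an invented `html_bytes` sample in place of the real `html.read()` result. Note that the matches come back as bytes, so they would still need decoding before being printed as text:

```python
import re

# Invented sample standing in for the bytes returned by html.read().
html_bytes = b'<div><span id="yfs_l84_aapl">118.93</span></div>'

# The b prefix makes this a bytes pattern, so no decoding is needed first.
bytes_pattern = re.compile(b'<span id="yfs_l84_aapl">.+?</span>')

matches = bytes_pattern.findall(html_bytes)
print(matches)  # a list of bytes objects, not str
```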

Harshil Over a year ago

It works! I see what I did wrong. Thanks a ton! The only problem is that my result contains the whole match, '<span ... </span>', rather than just the price, but I should be able to fix that. Thanks again!
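One possible way to get only the price, sketched here as an assumption rather than what the asker ended up doing, is to put a capture group around the part of the pattern that matches the number. When a pattern has exactly one group, `re.findall` returns the group's text instead of the whole match. The sample HTML and price are invented:

```python
import re

# Invented sample of already-decoded page text.
html_text = '<div><span id="yfs_l84_aapl">118.93</span></div>'

# The parentheses capture only the text between the span tags.
pattern = re.compile(r'<span id="yfs_l84_aapl">(.+?)</span>')

prices = pattern.findall(html_text)
print(prices)
```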
