I'm new to Python and Stack Overflow.

I'm trying to follow a YouTube tutorial on fetching stock prices. Judging by the error I get, I'm guessing the tutorial is outdated.

Here is the program:

import urllib.request
import re


html = urllib.request.urlopen('http://finance.yahoo.com/q?uhb=uh3_finance_vert_gs_ctrl2&fr=&type=2button&s=AAPL')

htmltext = html.read()

regex = '<span id="yfs_l84_aapl">.+?</span>'

pattern = re.compile(regex)

price = re.findall(pattern, htmltext)

print(price)

Since this is Python 3, I had to read up on urllib.request and use urllib.request.urlopen instead of the plain urllib.urlopen that the tutorial uses.

Anyway, when I run it, I get the following error:

Traceback (most recent call last):
  File "/Users/Harshil/Desktop/stockFetch.py", line 13, in <module>
    price = re.findall(pattern, htmltext)
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/re.py", line 206, in findall
    return _compile(pattern, flags).findall(string)
TypeError: can't use a string pattern on a bytes-like object

I understand the error and tried to fix it by adding the following:

codec = html.info().get_param('charset', 'utf8')
htmltext = html.decode(codec)

But it gives me another error:

Traceback (most recent call last):
  File "/Users/Harshil/Desktop/stockFetch.py", line 9, in <module>
    htmltext = html.decode(codec)
AttributeError: 'HTTPResponse' object has no attribute 'decode'

After spending a reasonable amount of time on this, I don't know what to do next. All I want is to get the price for AAPL, so that I can go on to build a general program that fetches prices for a list of stocks and use those prices in future programs.

Any help is appreciated. Thanks!

1 Answer

You are barking up the right tree. Decode the byte string returned by html.read(), not the HTTPResponse object that urlopen returns; only the bytes have a decode method:

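# Read the response body as bytes.
htmltext = html.read()
# Use the charset from the Content-Type header, defaulting to UTF-8.
codec = html.info().get_param('charset', 'utf8')
# Decode the bytes into a str so the str pattern can be applied.
htmltext = htmltext.decode(codec)
price = re.findall(pattern, htmltext)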
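
To see the fix in isolation, here is a minimal sketch that needs no network access. The sample bytes are a made-up stand-in for what html.read() returns (the real page markup and price will differ), and decode_body is a hypothetical helper name used only for illustration.

```python
import re

# Hypothetical stand-in for html.read(); the real page content will differ.
sample_bytes = b'<html><span id="yfs_l84_aapl">99.99</span></html>'


def decode_body(raw, charset=None):
    """Decode raw response bytes, falling back to UTF-8 when no charset is given."""
    return raw.decode(charset or 'utf-8')


# Same pattern as in the question: a str pattern needs a str target.
pattern = re.compile('<span id="yfs_l84_aapl">.+?</span>')

htmltext = decode_body(sample_bytes)
price = re.findall(pattern, htmltext)
print(price)
```

With a live response, the charset argument would come from the response headers, for example html.info().get_param('charset') as in the snippet above.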

2 Comments

If the page is known to be UTF-8 encoded, use `htmltext = htmlbytes.decode(encoding='utf-8')`. Or, if your pattern is limited to ASCII, prefix the pattern string with `b` so that it is a bytes pattern and can be matched against the undecoded bytes.
It works! I see what I did wrong. Thanks a ton! The only problem is that my result contains the whole match, '<span ... </span>', rather than just the price, but I should be able to fix that. Thanks again!
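
The two comments above can be sketched together. This is an illustration under assumptions, not the tutorial's code: the sample bytes are made up, and the parentheses added to the pattern form a capture group, which makes re.findall return only the captured text instead of the whole tag.

```python
import re

# Hypothetical stand-in for the bytes returned by html.read().
sample_bytes = b'<span id="yfs_l84_aapl">99.99</span>'

# Option 1: keep the data as bytes and use a bytes pattern (note the b prefix).
bytes_pattern = re.compile(b'<span id="yfs_l84_aapl">(.+?)</span>')
raw_matches = bytes_pattern.findall(sample_bytes)  # list of bytes objects

# Option 2: decode first, then use an ordinary str pattern.
str_pattern = re.compile('<span id="yfs_l84_aapl">(.+?)</span>')
text_matches = str_pattern.findall(sample_bytes.decode('utf-8'))  # list of str

# With one capture group, findall returns only the group, not the whole tag.
print(raw_matches)
print(text_matches)
```

Option 2 is usually the more convenient one if the price is going to be converted with float() or printed, since the matches are already str.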
