Python web-scraping error - TypeError: can't use a string pattern on a bytes-like object

Question

I want to build a web scraper. I'm currently learning Python, so this is the very basics!

Python Code

import urllib.request
import re

htmlfile = urllib.request.urlopen("http://basketball.realgm.com/")

htmltext = htmlfile.read()
title = re.findall('<title>(.*)</title>', htmltext)

print (htmltext)

Error:

  File "C:\Python33\lib\re.py", line 201, in findall
    return _compile(pattern, flags).findall(string)
TypeError: can't use a string pattern on a bytes-like object
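Why this happens: in Python 3, urlopen(...).read() returns bytes, while '<title>(.*)</title>' is a str pattern, and re refuses to match a str pattern against bytes. Here is a minimal offline reproduction. It uses a small made-up byte string in place of the downloaded page, so no network access is needed:

```python
import re

# Made-up stand-in for urllib.request.urlopen(...).read(): a bytes object, not a str.
htmltext = b"<html><head><title>Example</title></head></html>"

try:
    re.findall('<title>(.*)</title>', htmltext)  # str pattern, bytes subject
    error = None
except TypeError as exc:
    error = str(exc)  # exact wording differs slightly between Python versions

print(type(htmltext), error)
```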

Does this answer your question? TypeError: can't use a string pattern on a bytes-like object in re.findall() – Karl Knechtel, Commented Sep 17, 2022 at 0:51

timgeb · Accepted Answer · 2014-06-24 14:55:38Z

5

You have to decode your data: read() returns bytes, and a str pattern can't be matched against bytes. Since the website in question declares

charset=iso-8859-1

use that encoding. utf-8 won't work in this case.

htmltext = htmlfile.read().decode('iso-8859-1')
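The following is a small offline sketch of why the encoding matters. The sample bytes are made up and not taken from realgm.com. They assume the page contains a non-ASCII character encoded as ISO-8859-1. UTF-8 rejects that byte, while iso-8859-1 maps every byte to a character:

```python
import re

# Hypothetical page bytes encoded as ISO-8859-1: "é" is the single byte 0xE9.
raw = b"<html><title>Caf\xe9 scores</title></html>"

try:
    raw.decode('utf-8')
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False  # 0xE9 followed by a space is not a valid UTF-8 sequence

htmltext = raw.decode('iso-8859-1')  # every byte maps to exactly one character
title = re.findall('<title>(.*)</title>', htmltext)
print(utf8_ok, title)
```

Decoding with iso-8859-1 never raises, so a wrong guess shows up as garbled characters rather than an exception.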

answered Jun 24, 2014 at 14:55

timgeb


2 Comments

Jtwa Over a year ago

This worked, but I'm still confused about why we had to add decode('iso-8859-1'). Are there sites that wouldn't require it?

timgeb Over a year ago

@Jtwa check the source code of the site you are trying to scrape for charset=.... For the site in your question, the charset is iso-8859-1. If none is given, your best bet would usually be utf-8.
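The check described in the comment above can be automated. Search the raw bytes for a charset= declaration, and fall back to utf-8 when none is found. This is a rough sketch, not a full implementation of the HTML encoding-sniffing rules. The helper name decode_html, the regex and the 1024-byte search window are all assumptions made for illustration:

```python
import re

# Matches e.g. <meta charset="utf-8"> or content="text/html; charset=iso-8859-1".
CHARSET_RE = re.compile(rb'charset=["\']?([\w.:-]+)', re.IGNORECASE)


def decode_html(raw: bytes, default: str = 'utf-8') -> str:
    """Decode page bytes using the first charset= declaration found, else default."""
    # Only look near the top of the document, where declarations normally sit.
    match = CHARSET_RE.search(raw[:1024])
    encoding = match.group(1).decode('ascii') if match else default
    try:
        return raw.decode(encoding)
    except LookupError:
        # The page named an encoding Python does not know; use the default instead.
        return raw.decode(default)
```

With this helper, htmltext = decode_html(htmlfile.read()) replaces the hard-coded decode. A declared charset can still be wrong, in which case decoding may raise UnicodeDecodeError.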

falsetru · Accepted Answer · 2014-06-24 14:54:48Z

3

Use a bytes literal as the pattern:

title = re.findall(b'<title>(.*)</title>', htmltext)
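With a bytes pattern, re.findall also returns bytes, so you need to decode the captured text afterwards if you want str. The following minimal sketch uses a made-up byte string in place of htmlfile.read():

```python
import re

# Made-up stand-in for htmlfile.read(): the response body is bytes.
htmltext = b"<html><head><title>Example page</title></head></html>"

# Bytes pattern against bytes data: no TypeError, and the matches are bytes.
title = re.findall(b'<title>(.*)</title>', htmltext)

# Decode just the captured pieces when you need str (use the page's real encoding).
title_text = [t.decode('utf-8') for t in title]
print(title, title_text)
```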

or decode the retrieved data to a string:

title = re.findall('<title>(.*)</title>', htmltext.decode('utf-8'))

(replace utf-8 with the appropriate encoding of the document)
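Instead of hard-coding the encoding, you can often read it from the HTTP Content-Type header. The response from urllib.request.urlopen exposes its headers as an http.client.HTTPMessage, which is a subclass of email.message.Message. Its get_content_charset() returns the declared charset, lowercased, or None. This sketch assumes the server sends that header. The helper names are hypothetical, and the demo builds the headers by hand so it runs offline:

```python
import re
from email.message import Message


def charset_from_headers(headers: Message, default: str = 'utf-8') -> str:
    """Return the charset declared in a Content-Type header, or default."""
    return headers.get_content_charset() or default


def find_titles(raw: bytes, headers: Message) -> list:
    """Decode the body with the header's charset, then search with a str pattern."""
    text = raw.decode(charset_from_headers(headers))
    return re.findall('<title>(.*)</title>', text)


# Offline demo: build the headers by hand instead of calling urlopen().
headers = Message()
headers['Content-Type'] = 'text/html; charset=ISO-8859-1'
print(find_titles(b'<title>Caf\xe9</title>', headers))
```

Against a live site, the call would be roughly: with urllib.request.urlopen(url) as resp: titles = find_titles(resp.read(), resp.headers). Servers do not always send a charset, which is why the helper falls back to utf-8.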

answered Jun 24, 2014 at 14:54

falsetru

