Finding a string in an HTML file?

Question

Python noob here. I'm trying to print the lines of an HTML file that contain a given substring, using Python. I know the string is in the file because I can find it with Ctrl+F when I search the HTML file. However, when I run my code, it doesn't print the desired result. Could someone explain what I'm doing wrong?

import requests
import datetime


from BeautifulSoup import BeautifulSoup

now = datetime.datetime.now()

cmonth = now.month
cday = now.day
cyear = now.year
find = 'boxscores/201'


url = 'http://www.basketball-reference.com/boxscores/index.cgi?lid=header_dateoutput&month={0}&day=17&year={2}'.format(cmonth,cday,cyear)
response = requests.get(url)
html = response.content
print html

for line in html:
    if find in line:
        print line

Koren · Accepted Answer · 2016-03-19 10:10:01Z

2

As snakecharmerb said, by using

for line in html:

you iterate over the individual characters of html, not its lines, because html is a string. Instead, you can use

for line in html.split("\n"):

to iterate over the lines.
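
The line-by-line search can be sketched roughly as follows. This is a minimal Python 3 example, not the asker's exact script: sample_html is a made-up string standing in for the downloaded page, and lines_containing is a hypothetical helper name. It uses str.splitlines(), which works like split("\n") but also copes with "\r\n" line endings.

```python
def lines_containing(text, needle):
    """Return every line of text that contains needle."""
    return [line for line in text.splitlines() if needle in line]


# Made-up sample standing in for the downloaded page (an assumption).
sample_html = (
    "<html>\n"
    '<a href="/boxscores/201603170ATL.html">Final</a>\n'
    "<p>No games here</p>\n"
    '<a href="/boxscores/201603170BOS.html">Final</a>\n'
    "</html>"
)

matches = lines_containing(sample_html, "boxscores/201")
for line in matches:
    print(line)
```

With a real page you would pass the response body as text instead of sample_html.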

1 Comment

Ernest Over a year ago

An addition to both answers: response.content is bytes. Use response.text to get a string, and then call .split("\n") on it.
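
To illustrate the point about bytes, here is a small Python 3 sketch. The bytes literal is made up and stands in for response.content, and decoding as UTF-8 is an assumption (response.text chooses an encoding for you). It shows why searching the raw bytes with a str does not work in Python 3, and what decoding first looks like.

```python
# Made-up bytes standing in for response.content (an assumption).
content = b'<a href="/boxscores/201603170ATL.html">Final</a>\n<p>Other</p>\n'

# Iterating over bytes in Python 3 yields integers, not lines.
first_items = list(content[:3])

# Searching bytes with a str needle raises TypeError in Python 3.
try:
    "boxscores/201" in content
    mixed_types_ok = True
except TypeError:
    mixed_types_ok = False

# Decode to str (UTF-8 is assumed here), then split into lines.
text = content.decode("utf-8")
matches = [line for line in text.split("\n") if "boxscores/201" in line]
print(matches)
```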

snakecharmerb · Accepted Answer · 2016-03-19 07:57:39Z

1

In the requests package, response.content is a single string (a byte string in Python 2), so you can search it directly like this:

if find in html:
    # do something

By iterating over response.content with

for line in html:

you are iterating over the individual characters in the string, not lines.
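
Both points can be checked with a short Python 3 sketch. The html string here is a made-up sample, not the real page: iterating over it yields single characters, so a multi-character substring never matches any of them, while a membership test against the whole string succeeds.

```python
# Made-up sample standing in for the page contents (an assumption).
html = '<p>Scores</p>\n<a href="/boxscores/201603170ATL.html">Final</a>'
find = "boxscores/201"

# Iterating over a str yields one-character strings.
first_chars = list(html)[:3]

# A multi-character needle can never be found inside a single character.
char_hits = [ch for ch in html if find in ch]

# Testing against the whole string works.
found = find in html
print(first_chars, char_hits, found)
```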
