Extract HTML and search in Python

Question

Hi, I am still a beginner at Python and I was experimenting.

I am looking for a way to request a URL and get the data of the webpage without having to open the page in a browser.

Once I get the data, I need to search it for a tag or some text, for example to check whether 'hello' appears somewhere on the requested home page.

Here is an example:

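import urllib.request

# Fetch the page without opening a browser
fp = urllib.request.urlopen("http://www.python.org")
mybytes = fp.read()

mystr = mybytes.decode("utf8")
fp.close()

# str.find returns the index of the first match, or -1 if the text is absent
x = mystr.find('testing word tag')

print(x)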

Please bear with me as I am still a rookie and can't find an example of what I am looking for.

I found the code above on this site, but it does not seem to work for finding a string.

Does anyone know the best way to do it?

Thank you guys :)

For web scraping, the de facto choice is BeautifulSoup.

– sushanth, Commented Aug 25, 2020 at 11:24

Bastien Harkins · Accepted Answer · 2020-08-25 11:29:12Z

1

Here are the most used libraries for this kind of work:

Requests to get the HTML of the page.

BeautifulSoup to find elements (and much more)

$ pip install requests beautifulsoup4

And in your favorite IDE:

import requests
from bs4 import BeautifulSoup

r = requests.get("http://www.python.org")
soup = BeautifulSoup(r.content, "html.parser")

sometag = soup.find("sometag")
print(sometag)
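
Since the question is about checking whether a word such as 'hello' appears on the page, here is a minimal sketch of how that could look with BeautifulSoup. The inline `sample_html` string is an assumption standing in for `r.content`, so the snippet runs without a network call; the variable names are only illustrative.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical sample standing in for r.content from the request above.
sample_html = """
<html>
  <head><title>Demo page</title></head>
  <body>
    <h1>Hello, world</h1>
    <p class="intro">This page says hello twice.</p>
  </body>
</html>
"""

soup = BeautifulSoup(sample_html, "html.parser")

# 1. Is the word anywhere in the page's text? (case-insensitive)
page_text = soup.get_text(" ", strip=True)
has_hello = "hello" in page_text.lower()

# 2. Which tags hold a text node that contains the word?
matches = soup.find_all(string=re.compile("hello", re.IGNORECASE))
tag_names = [node.parent.name for node in matches]

print(has_hello)   # True
print(tag_names)   # ['h1', 'p']
```

With a real page, passing `r.content` instead of `sample_html` should be the only change needed.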

answered Aug 25, 2020 at 11:29

Bastien Harkins


1 Comment

colidyre Over a year ago

Or, simply as an extension of OP's existing script: soup = BeautifulSoup(mystr, "html.parser"); soup.find("sometag")

sushanth · Accepted Answer · 2020-08-25 11:27:12Z

0

Try this.

import requests
url = "https://stackoverflow.com/questions/63577634/extract-html-and-search-in-python"

res = requests.get(url)
print(res.text)
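
Once the HTML is available as a string, a plain substring search may already be enough. The sketch below uses a made-up `sample` string in place of `res.text`, and `find_all_positions` is a hypothetical helper, not part of requests. It also shows why the code in the question can look broken: `str.find` returns -1 when the text is not on the page.

```python
def find_all_positions(text, needle):
    """Return every index where needle occurs in text, ignoring case."""
    haystack = text.lower()
    target = needle.lower()
    positions = []
    start = haystack.find(target)
    while start != -1:  # str.find returns -1 when nothing (more) is found
        positions.append(start)
        start = haystack.find(target, start + 1)
    return positions


# Hypothetical stand-in for res.text
sample = "<h1>Hello</h1><p>Say hello again.</p>"

print("hello" in sample.lower())            # True
print(find_all_positions(sample, "hello"))  # [4, 21]
print(sample.find("testing word tag"))      # -1, the text is not present
```

With the real response this would be called as `find_all_positions(res.text, "hello")`.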

edited Aug 25, 2020 at 11:27

sushanth


answered Aug 25, 2020 at 11:26

George


2 Comments

sushanth Over a year ago

How does this answer the question?

George Over a year ago

You get the HTML of that webpage. If you want to extract tags more easily, you could use BeautifulSoup.

dabingsou · Accepted Answer · 2020-08-26 04:38:22Z

0

Another method.

from simplified_scrapy import SimplifiedDoc, req

html = req.get('https://www.python.org')
doc = SimplifiedDoc(html)
title = doc.getElement('title').text
print(title)
title = doc.getElementByText('Welcome to', tag='title').text
print(title)

Result:

Welcome to Python.org
Welcome to Python.org

Here are more examples: https://github.com/yiyedata/simplified-scrapy-demo/tree/master/doc_examples

answered Aug 26, 2020 at 4:38

dabingsou
