Capturing data from a web page using Python

Question

I want to capture text from the link below and save it: http://forecast.weather.gov/product.php?site=NWS&issuedby=FWD&product=RR5&format=CI&version=44&glossary=0

I only need the text after .A; I do not need the rest of the page. In addition, there are 50 different links at the top of the page, and I want to get the data from all of them.

I have written the code below, but it returns nothing. How can I get specifically the part that I need?

import urllib
import re
htmlfile=urllib.urlopen("http://forecast.weather.gov/product.php?site=NWS&issuedby=FWD&product=RR5&format=CI&version=1&glossary=0")
htmltext=htmlfile.read()
regex='<pre class="glossaryProduct">(.+?)</pre>'
pattern=re.compile(regex)
out=re.findall(pattern, htmltext)
print (out)

I also tried the following, which returns the entire content of the page:

import urllib
file1 = urllib.urlopen('http://forecast.weather.gov/product.php?site=NWS&issuedby=FWD&product=RR5&format=txt&version=1&glossary=0')
s1 = file1.read()
print(s1)

Can you help me do this?

Heed one of the commandments of modern programming: do not regex X/HTML content.
– Parfait, commented Feb 27, 2017 at 19:07
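As an illustration of the comment above, here is a minimal sketch that pulls the text out of the `<pre class="glossaryProduct">` element with the standard-library `html.parser` instead of a regex. It assumes Python 3 (the question uses the Python 2 `urllib`), and the class name `PreTextExtractor` and the `sample` string are hypothetical stand-ins, not the real page.

```python
from html.parser import HTMLParser


class PreTextExtractor(HTMLParser):
    """Collects the text inside <pre class="glossaryProduct"> elements."""

    def __init__(self):
        super().__init__()
        self._inside = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs
        if tag == "pre" and ("class", "glossaryProduct") in attrs:
            self._inside = True

    def handle_endtag(self, tag):
        if tag == "pre":
            self._inside = False

    def handle_data(self, data):
        if self._inside:
            self.chunks.append(data)


# Hypothetical stand-in for the downloaded page
sample = (
    '<html><body><p>header</p>'
    '<pre class="glossaryProduct">\n.A first record\n.A second record\n</pre>'
    '</body></html>'
)

parser = PreTextExtractor()
parser.feed(sample)
parser.close()
pre_text = "".join(parser.chunks)
print(pre_text)
```

A parser like this does not care whether the content spans several lines, which is the very thing that trips up the regex in the question.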

Andrei T · Accepted Answer · 2017-02-27 17:02:35Z


Your regex is not capturing anything because the content starts with a newline, and by default . does not match newlines. If you change your compile line to

pattern=re.compile(regex,re.S)

it should work.
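To see the difference the flag makes, here is a small self-contained sketch. The `html_text` string is a hypothetical stand-in for the downloaded page, shaped like the question's target element (content starting with a newline).

```python
import re

# Hypothetical stand-in for the downloaded page
html_text = (
    '<pre class="glossaryProduct">\n'
    '.A first record\n'
    '.A second record\n'
    '</pre>'
)

regex = '<pre class="glossaryProduct">(.+?)</pre>'

# Without re.S, "." cannot match the newline right after the opening tag,
# so the pattern never matches.
without_flag = re.findall(regex, html_text)

# With re.S (also spelled re.DOTALL), "." matches newlines as well.
with_flag = re.findall(regex, html_text, re.S)

print(without_flag)    # []
print(len(with_flag))  # 1
```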

Also you may want to look at:

https://regex101.com

It shows you exactly what your regex is doing. When I set the S flag on the right-hand side, it started working as it should:

Image of regex working with the S flag
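The question also asks for only the `.A` text and for all 50 links. A rough sketch of how that might look in Python 3 follows. It rests on several assumptions: that the 50 links correspond to `version=1` through `version=50` in the URL, that the wanted text is every line starting with `.A`, and that the page decodes as UTF-8. The helper names (`version_urls`, `extract_a_lines`, `fetch`, `save_all`) are hypothetical.

```python
import re
import urllib.request

# URL from the question, with the version number left as a placeholder
BASE_URL = (
    "http://forecast.weather.gov/product.php?site=NWS&issuedby=FWD"
    "&product=RR5&format=CI&version={version}&glossary=0"
)

PRE_PATTERN = re.compile(r'<pre class="glossaryProduct">(.+?)</pre>', re.S)


def version_urls(count=50):
    """Build one URL per version, assuming versions run from 1 to count."""
    return [BASE_URL.format(version=n) for n in range(1, count + 1)]


def extract_a_lines(html_text):
    """Return the lines starting with '.A' inside the glossaryProduct block."""
    lines = []
    for block in PRE_PATTERN.findall(html_text):
        for line in block.splitlines():
            if line.startswith(".A"):
                lines.append(line)
    return lines


def fetch(url):
    """Download a page and decode it (UTF-8 is an assumption)."""
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


def save_all(path, count=50):
    """Fetch every version and write its '.A' lines to one text file."""
    with open(path, "w", encoding="utf-8") as out:
        for url in version_urls(count):
            for line in extract_a_lines(fetch(url)):
                out.write(line + "\n")
```

Calling `save_all("output.txt")` would then download each version in turn; nothing is fetched until that call is made.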


1 Comment

Behi Over a year ago

Thank you. I will check it.
