There is an HTML file saved on my hard drive, and I need to extract the strings displayed on the HTML page and save them into a text file using Python.

HTML representation with tags, etc.:
Bme:&nbsp;1&nbsp;Port:&nbsp;1<br />
Downstream&nbsp;line&nbsp;rate:&nbsp;6736&nbsp;kbps<br />
Upstream&nbsp;line&nbsp;rate:&nbsp;964&nbsp;kbps<br />

What I need to extract from above is the number after the

Downstream&nbsp;line&nbsp;rate:&nbsp;

in this case, 6736, and write this number to a file. How can this be achieved?

1 Answer

BeautifulSoup is probably overkill for this. If all the "Downstream" lines are formatted like that, you can easily get those numbers with regular expressions.

>>> import re
>>> regex = r'Downstream&nbsp;line&nbsp;rate:&nbsp;(\d+)&nbsp;kbps<br />'
>>> re.search(regex, "Downstream&nbsp;line&nbsp;rate:&nbsp;6736&nbsp;kbps<br />").group(1)
'6736'
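
To cover the "saved on my hard drive" and "write this number to a file" parts of the question, here is a minimal sketch that applies the same kind of pattern to a whole file. The function names and the UTF-8 encoding are assumptions made for illustration, not something the question specifies.

```python
import re
from pathlib import Path

# Matches the number between the "Downstream line rate:" label and "kbps".
# The page uses &nbsp; entities instead of plain spaces, so the pattern does too.
DOWNSTREAM_RE = re.compile(r'Downstream&nbsp;line&nbsp;rate:&nbsp;(\d+)&nbsp;kbps')


def extract_downstream_rates(html_text):
    """Return every downstream line rate found in html_text, as strings."""
    return DOWNSTREAM_RE.findall(html_text)


def save_downstream_rates(html_path, out_path):
    """Read the HTML file, extract the rates, and write one per line."""
    # Assumption: the file is UTF-8; undecodable bytes are replaced, not fatal.
    html_text = Path(html_path).read_text(encoding='utf-8', errors='replace')
    rates = extract_downstream_rates(html_text)
    Path(out_path).write_text(''.join(rate + '\n' for rate in rates),
                              encoding='utf-8')
    return rates
```

Calling save_downstream_rates('page.html', 'rates.txt') (both file names are placeholders) would write each matched rate on its own line.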

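If the lines aren't all formatted exactly like that, you might have to make the regex more general, possibly something like r'Downstream.*?(\d+)'. The non-greedy .*? matters here: a greedy .* would swallow every digit except the last one, so the group would capture '6' instead of '6736'.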
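
Another way to tolerate formatting differences is to decode the HTML entities first and then match on whitespace. This is a sketch under the assumption that the label text "Downstream line rate:" stays the same; the helper name downstream_rate is made up for illustration.

```python
import html
import re

# After html.unescape, each &nbsp; becomes a non-breaking space (U+00A0),
# which \s matches in Python 3 string patterns, just like a regular space.
LOOSE_RE = re.compile(r'Downstream\s+line\s+rate:\s*(\d+)')


def downstream_rate(line):
    """Return the downstream rate in line as an int, or None if absent."""
    text = html.unescape(line)
    match = LOOSE_RE.search(text)
    return int(match.group(1)) if match else None
```

With this version the same function handles lines written with &nbsp; entities and lines written with ordinary spaces.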
