
I have the piece of HTML below and need to extract only the text between

<p>Current</p> and <p>Archive</p>

The HTML chunk looks like this:

<p>Current</p>
<a href="some link to somewhere 1">File1</a>
<br>
<a href="some link to somewhere 2">File2</a>
<br>
<a href="some link to somewhere 3">File3</a>
<br>
<p>Archive</p>
<a href="Some another link to another file">Some another file</a>

So the desired output should look like File1, File2, File3.

This is what I've tried so far:

import re
m = re.compile('<p>Current</p>(.*?)<p>Archive</p>').search(text)

but it doesn't work as expected.

Is there a simple way to extract the text between specified chunks of HTML tags in Python?


2 Answers


If you insist on using regex, you can combine it with list slicing like so:

chunk="""<p>Current</p>
<a href="some link to somewhere 1">File1</a>
<br>
<a href="some link to somewhere 2">File2</a>
<br>
<a href="some link to somewhere 3">File3</a>
<br>
<p>Archive</p>
<a href="Some another link to another file">Some another file</a>"""

import re

# find every non-empty run of text between ">" and "<" (non-greedy, single line)
found = re.findall(r">(.+?)<", chunk)

# keep only the items after "Current" and before "Archive"
found[:] = found[found.index("Current") + 1:found.index("Archive")]

print(found)  # Python 3 syntax; remove the () for Python 2.7

Output:

['File1', 'File2', 'File3']
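
The attempt in the question most likely fails because "." does not match newlines by default, so the pattern cannot span the multi-line chunk. A minimal sketch of that same approach with re.DOTALL added is below; the variable names section and names are only illustrative, and it assumes the two marker paragraphs appear exactly as written.

```python
import re

chunk = """<p>Current</p>
<a href="some link to somewhere 1">File1</a>
<br>
<a href="some link to somewhere 2">File2</a>
<br>
<a href="some link to somewhere 3">File3</a>
<br>
<p>Archive</p>
<a href="Some another link to another file">Some another file</a>"""

# re.DOTALL lets "." match newline characters, so the lazy group
# can capture everything between the two marker paragraphs.
section = re.search(r"<p>Current</p>(.*?)<p>Archive</p>", chunk, re.DOTALL)

# Pull the link text out of each <a>...</a> inside the captured section.
names = re.findall(r"<a\b[^>]*>(.*?)</a>", section.group(1)) if section else []

print(names)
```

This is still regex on HTML, so it will break on nested tags or markup that differs from the sample.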

from bs4 import BeautifulSoup as bs


html_text = """
<p>Current</p>
<a href="some link to somewhere 1">File1</a>
<br>
<a href="some link to somewhere 2">File2</a>
<br>
<a href="some link to somewhere 3">File3</a>
<br>
<p>Archive</p>
<a href="Some another link to another file">Some another file</a>"""

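# Parse the markup; "html.parser" is the parser bundled with Python
soup = bs(html_text, "html.parser")

# Collect every <a> tag in the document
a_tag = soup.find_all("a")

text = []
for i in a_tag:
    text.append(i.get_text())

print(text)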

Output:

['File1', 'File2', 'File3', 'Some another file']

The BeautifulSoup library is very useful for parsing HTML files and getting text from them. Note that this collects every link in the chunk, including the one after <p>Archive</p>.
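
To keep only the links between the two markers, one option is to start at the "Current" paragraph and walk its following siblings until the "Archive" paragraph is reached. This is a sketch rather than a definitive solution: it assumes the markers and links are siblings at the same level, as in the sample, and the names start and names are illustrative.

```python
from bs4 import BeautifulSoup

html_text = """
<p>Current</p>
<a href="some link to somewhere 1">File1</a>
<br>
<a href="some link to somewhere 2">File2</a>
<br>
<a href="some link to somewhere 3">File3</a>
<br>
<p>Archive</p>
<a href="Some another link to another file">Some another file</a>"""

soup = BeautifulSoup(html_text, "html.parser")

# Locate the <p> whose text is exactly "Current".
start = soup.find("p", string="Current")

names = []
if start is not None:
    # find_next_siblings() yields only tags, in document order.
    for sibling in start.find_next_siblings():
        # Stop as soon as the "Archive" marker paragraph is reached.
        if sibling.name == "p" and sibling.get_text(strip=True) == "Archive":
            break
        if sibling.name == "a":
            names.append(sibling.get_text(strip=True))

print(names)
```

If the links were wrapped in other elements, the sibling walk would need to be replaced with a search inside each sibling.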
