I have a list of strings that I scraped from a web page, and I want to extract the 'href' value from each one:

<li class="subnav__item"><a class="subnav__link " href="/red-wine">Red Wine</a></li>
<li class="subnav__item"><a class="subnav__link " href="/white-wine">White Wine</a></li>
<li class="subnav__item"><a class="subnav__link " href="/rose-wine">Rosé Wine</a></li>
<li class="subnav__item"><a class="subnav__link " href="/fine-wine">Fine Wine</a></li>

For example, I'm looking to loop through the list and dynamically extract

/red-wine

from

<li class="subnav__item"><a class="subnav__link " href="/red-wine">Red Wine</a></li>

Thanks!

2 Answers

You can also get the required text using Beautiful Soup:

from bs4 import BeautifulSoup

# The scraped snippets, joined into one HTML string.
data = """
<li class="subnav__item"><a class="subnav__link " href="/red-wine">Red Wine</a></li>
<li class="subnav__item"><a class="subnav__link " href="/white-wine">White Wine</a></li>
<li class="subnav__item"><a class="subnav__link " href="/rose-wine">Rosé Wine</a></li>
<li class="subnav__item"><a class="subnav__link " href="/fine-wine">Fine Wine</a></li>
"""
soup = BeautifulSoup(data, "html.parser")

# Find every <a> tag and print its href attribute.
links = soup.find_all('a')
for link in links:
    print(link['href'])

Output:

/red-wine
/white-wine
/rose-wine
/fine-wine
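
If the scraped data really is a Python list of strings, as the question describes, each item can also be parsed on its own instead of being joined first. A minimal sketch, assuming a hypothetical list named `items` and a hypothetical helper `extract_hrefs` (neither name comes from the original post):

```python
from bs4 import BeautifulSoup

# Hypothetical input: one HTML snippet per list element, as in the question.
items = [
    '<li class="subnav__item"><a class="subnav__link " href="/red-wine">Red Wine</a></li>',
    '<li class="subnav__item"><a class="subnav__link " href="/white-wine">White Wine</a></li>',
]


def extract_hrefs(snippets):
    """Return the href of every <a> tag found in each snippet, in order."""
    hrefs = []
    for snippet in snippets:
        soup = BeautifulSoup(snippet, "html.parser")
        # href=True skips anchors that have no href attribute,
        # so link["href"] cannot raise a KeyError here.
        for link in soup.find_all("a", href=True):
            hrefs.append(link["href"])
    return hrefs


print(extract_hrefs(items))  # ['/red-wine', '/white-wine']
```

Filtering with `href=True` is a design choice for scraped input, where some anchors may lack the attribute.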

You can use lxml for this. Something like this:

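from lxml import html
import requests

# '<your url>' is a placeholder for the page being scraped.
response = requests.get('<your url>')
tree = html.fromstring(response.text)
# In the question's markup, class "subnav__item" is on the <li>;
# the href lives on the <a> inside it.
hrefs = tree.xpath('//li[@class="subnav__item"]/a/@href')

This should give you a list of the href values of the links inside every element with the class "subnav__item".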
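
Both answers rely on third-party packages. If installing one is not an option, the same extraction can be sketched with the standard library's `html.parser` module. This is a swapped-in technique, not what either answer proposes, and the names `HrefCollector` and `hrefs_from_strings` are hypothetical:

```python
from html.parser import HTMLParser


class HrefCollector(HTMLParser):
    """Collects the href attribute of every <a> start tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value is None for bare attributes.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.hrefs.append(value)


def hrefs_from_strings(snippets):
    """Feed each scraped string to one parser and return all hrefs in order."""
    parser = HrefCollector()
    for snippet in snippets:
        parser.feed(snippet)
    parser.close()
    return parser.hrefs


scraped = [
    '<li class="subnav__item"><a class="subnav__link " href="/rose-wine">Rosé Wine</a></li>',
    '<li class="subnav__item"><a class="subnav__link " href="/fine-wine">Fine Wine</a></li>',
]
print(hrefs_from_strings(scraped))  # ['/rose-wine', '/fine-wine']
```

Unlike the XPath version, this sketch collects every link regardless of its parent's class, so it suits input that has already been narrowed down to the relevant snippets.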
