Let's say I have a link like this:
link = '<a href="some text">...</a>'
Is there any way to retrieve the text from the anchor's href attribute, so that the result is something like this:
hrefText = 'some text'
Thank you in advance.
Although you could split the string or use a regular expression, BeautifulSoup gives you a more modular and powerful tool set:
https://www.crummy.com/software/BeautifulSoup/
Sample code:
from bs4 import BeautifulSoup

link = '<a href="some text">...</a>'
soup = BeautifulSoup(link, "html.parser")
# href=True matches only anchors that actually carry an href attribute
for anchor in soup.find_all('a', href=True):
    print(anchor['href'])
Alternatively, for a single function, you can do this:
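from bs4 import BeautifulSoup

def getHref(link):
    soup = BeautifulSoup(link, "html.parser")
    # find() returns the first matching anchor, or None if there is none
    anchor = soup.find('a', href=True)
    return anchor['href'] if anchor is not None else None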
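If adding a third-party dependency is not an option, the same href extraction can probably be done with the standard library's html.parser module. This is only a minimal sketch, not part of the BeautifulSoup approach; the names HrefCollector and get_hrefs are made up for illustration.

```python
from html.parser import HTMLParser


class HrefCollector(HTMLParser):
    """Collects the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; tag and attribute
        # names are lowercased by HTMLParser before they reach this method
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.hrefs.append(value)


def get_hrefs(html):
    """Hypothetical helper: return all href values found in an HTML string."""
    parser = HrefCollector()
    parser.feed(html)
    parser.close()
    return parser.hrefs


link = '<a href="some text">...</a>'
print(get_hrefs(link))
```

get_hrefs returns a list, so an input without any anchors gives an empty list instead of raising an error.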
You can use the bs4 and requests libraries for this.
import requests
from bs4 import BeautifulSoup

url = 'https://examplesite.com/'
source = requests.get(url)
text = source.text
soup = BeautifulSoup(text, "html.parser")
# href=True skips anchors without an href, so link['href'] is always present
for link in soup.find_all('a', href=True):
    href = link['href']
    title = link.string  # anchor text; may be None for nested markup
    print("hrefText =", href)
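On a real page many href values are relative (for example /about), so they may need to be resolved against the page URL before they can be used. A small sketch using the standard library's urllib.parse.urljoin; the base URL and the sample hrefs below are illustrative values only, not taken from an actual site.

```python
from urllib.parse import urljoin

base_url = 'https://examplesite.com/'
# Assumed sample values: a root-relative, a relative, and an absolute href
sample_hrefs = ['/about', 'contact.html', 'https://other.example/page']

# urljoin leaves absolute URLs untouched and resolves the rest against base_url
absolute = [urljoin(base_url, href) for href in sample_hrefs]
for resolved in absolute:
    print(resolved)
```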
Hope this helps :)