I currently have Python code that should grab links from Google.
However, Google builds its links somewhat differently. How can I grab the data-href attribute instead of just href?
Here is an HTML example of a Google link (the markup is different when I use Firefox, where there is no data-href):
<a
href="/url? sa=t&rct=j&q=&esrc=s&source=web&cd=2&cad=rja&uact=8&ved=0CC0QFjAB&url=http%3A%2F%2Fwww.dongemondcollege.nl%2F&ei=ihIwVdTqKtDYaoSjgMAP&usg=AFQjCNEvpxj60GxhQekQ2qI6QXDP2Vso1g&sig2=DuKoiCbIcI0ncx8D4gnSaA&bvm=bv.91071109,d.bGQ"
onmousedown="return rwt(this,'','','','2','AFQjCNEvpxj60GxhQekQ2qI6QXDP2Vso1g','DuKoiCbIcI0ncx8D4gnSaA','0CC0QFjAB','','',event)"
data-href="http://www.dongemondcollege.nl/">
Dongemond College > Algemeen > Home
</a>
Below is the Python code that should grab the links. Any suggestions?
def getLinks(source):
    websiteLinks = []
    for link in source.find_all('a'):
        url = link.get('href')
        if url:
            if '/search?' not in url:
                websiteLinks.append(url)
    return websiteLinks
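
One possible way to cover both markups is sketched below. It has not been checked against live Google results. It prefers data-href when the attribute is present and otherwise decodes the target from the /url?...&url=... redirect in href, which is the shape shown in the example above. The names extract_target and get_links are hypothetical. The fallback to a "q" parameter is an assumption about other redirect variants. source is assumed to be a BeautifulSoup object, as in the original function.

```python
from urllib.parse import urlparse, parse_qs


def extract_target(link):
    """Return the real destination of one <a> tag, or None if it has none."""
    # Prefer data-href when the served markup includes it.
    data_href = link.get('data-href')
    if data_href:
        return data_href

    href = link.get('href')
    if not href:
        return None

    # Redirect links look like /url?...&url=<percent-encoded target>&...
    parsed = urlparse(href)
    if parsed.path == '/url':
        query = parse_qs(parsed.query)  # also percent-decodes the values
        # Assumption: the target sits in "url" (as in the example) or "q".
        for key in ('url', 'q'):
            if query.get(key):
                return query[key][0]
        return None

    # Anything else is returned unchanged.
    return href


def get_links(source):
    """Collect link targets, skipping Google's own /search? links."""
    website_links = []
    for link in source.find_all('a'):
        url = extract_target(link)
        if url and '/search?' not in url:
            website_links.append(url)
    return website_links
```

It would be called the same way as the original, for example get_links(BeautifulSoup(html, 'html.parser')). Since the markup apparently varies by browser, the User-Agent sent with the request may decide which branch is taken.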