How can I get the links if the tag is in this form?
<a href="/url?q=instagram.com/goinggourmet/… class="zBAuLc"><div class="BNeawe vvjwJb AP7Wnd">Going Gourmet Catering (@goinggourmet) - Instagram</div></h3><div class="BNeawe UPmit AP7Wnd">www.instagram.com › goinggourmet</div></a>
I have tried the below code and it helped me get only URLs, but the URLs comes in this format.
/url?q=https://bespokecatering.sydney/&sa=U&ved=2ahUKEwjTv6ueseHyAhUHb30KHYTYABwQFnoECAEQAg&usg=AOvVaw076QI0_4Yw4hNZ6iXHQZL-
/url?q=https://www.facebook.com/bespokecatering.sydney/videos/lockdown-does-not-mean-unfulfilled-cravings-order-our-weekly-favorites-order-her/892336708293067/%3Fextid%3DSEO----&sa=U&ved=2ahUKEwjTv6ueseHyAhUHb30KHYTYABwQtwJ6BAgEEAE&usg=AOvVaw2YQI1Bqwip72axc-Nh2_6e
/url?q=https://www.instagram.com/bespoke_catering/%3Fhl%3Den&sa=U&ved=2ahUKEwjTv6ueseHyAhUHb30KHYTYABwQFnoECAoQAg&usg=AOvVaw1QUCWYmxfSLb6Jx20hyXIR
I need only URLs from Facebook and Instagram, without any additional wordings, What I mean is I want only real link, not the redirected link.
I need something like this from above links,
'https://www.facebook.com/bespokecatering.sydney' 'https://www.instagram.com/bespoke_catering'
div = soup.find_all('div',attrs={'class':'kCrYT'})
for w in div:
for link in w.select('a'):
urls = link['href']
print(urls)
Any help is much appreciated.
I tried the below code, but it returns empty results or different results
div = soup.find_all('div',attrs={'class':'kCrYT'})
for w in div:
for link in w.select('a'):
urls = link['href']
print(urls)
for url in urls:
try:
j=url.split('=')[1]
k= '/'.join(j.split('/')[0:4])
#print(k)
except:
k = ''