
How can I get the links if the tag is in this form?

<a href="/url?q=instagram.com/goinggourmet/… class="zBAuLc"><div class="BNeawe vvjwJb AP7Wnd">Going Gourmet Catering (@goinggourmet) - Instagram</div></h3><div class="BNeawe UPmit AP7Wnd">www.instagram.com › goinggourmet</div></a> 

I have tried the code below, and it helped me get the URLs, but they come in this format:

/url?q=https://bespokecatering.sydney/&sa=U&ved=2ahUKEwjTv6ueseHyAhUHb30KHYTYABwQFnoECAEQAg&usg=AOvVaw076QI0_4Yw4hNZ6iXHQZL-

/url?q=https://www.facebook.com/bespokecatering.sydney/videos/lockdown-does-not-mean-unfulfilled-cravings-order-our-weekly-favorites-order-her/892336708293067/%3Fextid%3DSEO----&sa=U&ved=2ahUKEwjTv6ueseHyAhUHb30KHYTYABwQtwJ6BAgEEAE&usg=AOvVaw2YQI1Bqwip72axc-Nh2_6e

/url?q=https://www.instagram.com/bespoke_catering/%3Fhl%3Den&sa=U&ved=2ahUKEwjTv6ueseHyAhUHb30KHYTYABwQFnoECAoQAg&usg=AOvVaw1QUCWYmxfSLb6Jx20hyXIR

I need only the URLs from Facebook and Instagram, without any extra parts. What I mean is that I want the real link, not the redirect link.

I need something like this from the links above:

'https://www.facebook.com/bespokecatering.sydney' 'https://www.instagram.com/bespoke_catering'

div = soup.find_all('div', attrs={'class': 'kCrYT'})
for w in div:
    for link in w.select('a'):
        urls = link['href']
        print(urls)

Any help is much appreciated.

I tried the code below, but it returns empty or incorrect results:

div = soup.find_all('div', attrs={'class': 'kCrYT'})
for w in div:
    for link in w.select('a'):
        urls = link['href']
        print(urls)
        for url in urls:
            try:
                j = url.split('=')[1]
                k = '/'.join(j.split('/')[0:4])
                #print(k)
            except:
                k = ''

1 Answer

You already have your <a> elements selected - just loop over the selection and print the results via ['href']:

div = soup.find_all('div', attrs={'class': 'kCrYT'})
for w in div:
    for link in w.select('a'):
        print(link['href'])

If you improve your question and add the additional information requested, we can answer in more detail.

EDIT

Answering your additional question with a simple example (something you should have provided in your question):

import requests
from bs4 import BeautifulSoup
result = '''
<div class="kCrYT">
    <a href="/url?q=https://bespokecatering.sydney/&sa=U&ved=2ahUKEwjTv6ueseHyAhUHb30KHYTYABwQFnoECAEQAg&usg=AOvVaw076QI0_4Yw4hNZ6iXHQZL-"></a>
</div>
<div class="kCrYT">
    <a href="/url?q=https://www.facebook.com/bespokecatering.sydney/videos/lockdown-does-not-mean-unfulfilled-cravings-order-our-weekly-favorites-order-her/892336708293067/%3Fextid%3DSEO----&sa=U&ved=2ahUKEwjTv6ueseHyAhUHb30KHYTYABwQtwJ6BAgEEAE&usg=AOvVaw2YQI1Bqwip72axc-Nh2_6e"></a>
</div>
<div class="kCrYT">
    <a href="/url?q=https://www.instagram.com/bespoke_catering/%3Fhl%3Den&sa=U&ved=2ahUKEwjTv6ueseHyAhUHb30KHYTYABwQFnoECAoQAg&usg=AOvVaw1QUCWYmxfSLb6Jx20hyXIR"></a>
</div>
'''
soup = BeautifulSoup(result, 'lxml')

div = soup.find_all('div', attrs={'class': 'kCrYT'})
for w in div:
    for link in w.select('a'):
        # Take the 'q' parameter from the redirect URL and drop the
        # percent-encoded query string ('%3F' is an encoded '?')
        query = requests.utils.urlparse(link['href']).query
        params = dict(x.split('=') for x in query.split('&'))
        print(params['q'].split('%3F')[0])

Result:

https://bespokecatering.sydney/
https://www.facebook.com/bespokecatering.sydney/videos/lockdown-does-not-mean-unfulfilled-cravings-order-our-weekly-favorites-order-her/892336708293067/
https://www.instagram.com/bespoke_catering/
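As an aside, the standard library's `urllib.parse` can do the same extraction more robustly - `parse_qs` splits the query string and decodes percent-encoded characters for you, so the trailing query (e.g. the encoded `?hl=en`) can be dropped after decoding. This is a sketch of an alternative, not the answer's original code:

```python
from urllib.parse import urlparse, parse_qs

def extract_target(href):
    """Return the decoded 'q' parameter of a redirect link,
    with any trailing query string removed."""
    params = parse_qs(urlparse(href).query)   # parse_qs also unquotes values
    target = params.get('q', [''])[0]
    return target.split('?')[0]               # drop e.g. the decoded '?hl=en'

href = ('/url?q=https://www.instagram.com/bespoke_catering/%3Fhl%3Den'
        '&sa=U&ved=2ahUKEwjTv6ueseHyAhUHb30KHYTYABwQFnoECAoQAg'
        '&usg=AOvVaw1QUCWYmxfSLb6Jx20hyXIR')
print(extract_target(href))  # https://www.instagram.com/bespoke_catering/
```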



3 Comments

Thanks, that worked. But I have URLs like '''/url?q=facebook.com/bespokecatering.sydney/videos/…''' which don't work as URLs, so how can I trim all URLs to something like '''facebook.com/bespokecatering.sydney'''? Help is appreciated :)
There are a lot of ways ;) - Please improve/edit your question (not the comments) and post a URL/code so that I can reproduce it, that would be great.
Edited my question now - I'm new to the platform, so I'm still working it out. Please see if you can reproduce any results. TIA.
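The trimming asked about in the comments - reducing a full URL down to its profile root - could be sketched as follows. This assumes the desired form is scheme, host, and the first path segment only, which is an interpretation of the comment rather than something the answer specifies:

```python
from urllib.parse import urlparse

def trim_to_profile(url):
    """Keep only the scheme, host, and first path segment,
    e.g. a Facebook/Instagram profile root."""
    parts = urlparse(url)
    first_segment = parts.path.strip('/').split('/')[0]
    return f"{parts.scheme}://{parts.netloc}/{first_segment}"

url = ('https://www.facebook.com/bespokecatering.sydney/videos/'
       'lockdown-does-not-mean-unfulfilled-cravings/892336708293067/')
print(trim_to_profile(url))  # https://www.facebook.com/bespokecatering.sydney
```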
