I need some help writing a regex pattern which can find affiliated links from a webpage.
Example code:
import requests,re
from bs4 import BeautifulSoup
res = requests.get('https://www.example.com')
soup = BeautifulSoup(res.text,'lxml')
links = soup.find_all('a', href=True)
# example_of_affiliate_links = ['http://example.com/click/click?p=1&t=url&s=IDHERE&url=https://www.mywebsite.com/920&f=TXL&name=electronic/ps4/','https://example.net/click/camref:IDhere/destination:https://www.mywebsite.com/product/138/sony-ps4.html']
I want to collect all affiliated links for "mywebsite.com", using the following regex pattern, but it is not capturing any links.
pattern = re.compile(r'([http,https]://www.mywebsite.com\S[\.html,\.php,\&]$)')
Is there a better way to do this?