How to find URLs without using href in the code below

Question

import re

import requests as rs
from bs4 import BeautifulSoup as bs

site = 'https://www.iciciprulife.com/'
req = rs.get(site)
soup = bs(req.text, 'html.parser')
link = input("Enter which url you want http or https:")

if link == "http":
    # Print the href of every <a> tag whose link starts with http://
    for i in soup.find_all('a', attrs={'href': re.compile("^http://")}):
        print(i.get('href'))

In the code above, I don't want to use 'href' or 'a'. Instead, I want to search the entire webpage for URLs using a regular expression.

You should say why you don't want to use href. Using your own regex to parse HTML is generally considered a bad idea...
tomjn, commented Jun 9, 2021 at 9:25

pullidea-dev · Accepted Answer · 2021-06-09 09:35:18Z

soup.text returns the page's visible text as a single string. That string may contain non-ASCII characters; if they get in the way of your matching, convert or remove them first.

Then, you can search the whole string with regex.

To remove non-ASCII characters from a string, see: How to remove nonAscii characters in python
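
A minimal sketch of the steps described in this answer, assuming `page_text` holds the string returned by soup.text. The names `URL_PATTERN`, `strip_non_ascii`, and `find_urls_in_text` and the sample string are hypothetical, and the pattern is only one plausible definition of "a URL":

```python
import re

# Hypothetical pattern: http:// or https:// followed by characters that are
# not whitespace, quotes, or angle brackets.
URL_PATTERN = re.compile(r'https?://[^\s<>"\']+')


def strip_non_ascii(text):
    """Drop every character outside the ASCII range."""
    return text.encode("ascii", errors="ignore").decode("ascii")


def find_urls_in_text(page_text):
    """Return all URL-like substrings found in the cleaned text."""
    return URL_PATTERN.findall(strip_non_ascii(page_text))


# Stand-in for soup.text; a real page would supply this string.
sample_text = "Plans \u2013 http://example.com/a and https://example.org/b"
print(find_urls_in_text(sample_text))
```

Keep in mind that soup.text holds only the text between tags, so a URL that appears only inside an attribute value will not show up in it. Dropping non-ASCII characters can also shorten a URL that contains them.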


yf879 · Accepted Answer · 2021-06-09 11:14:23Z

urls = re.findall(r'https?://[^\s<>"]+', req.text)
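
The one-liner above searches the raw HTML in req.text, so it also picks up URLs inside attributes and scripts. As a rough sketch of how it could be combined with the http/https choice from the question, the hypothetical helper `find_urls` below builds the pattern from the chosen scheme. The sample HTML is made up so the example runs without a network request:

```python
import re


def find_urls(html, scheme):
    """Return URLs in html that start with the given scheme ('http' or 'https')."""
    # re.escape guards against regex metacharacters in the user-supplied scheme.
    pattern = re.compile(re.escape(scheme) + r'://[^\s<>"\']+')
    return pattern.findall(html)


# Stand-in for req.text; a real page would supply this string.
sample_html = (
    '<a href="http://example.com/a">A</a>'
    '<script src="https://example.org/app.js"></script>'
)

for scheme in ("http", "https"):
    print(scheme, find_urls(sample_html, scheme))
```

With a real page, `find_urls(req.text, link)` would take the place of the loop over `soup.find_all`. A regex like this is only an approximation: it can include trailing punctuation or miss relative links, which is part of why parsing HTML with regex is usually discouraged.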

edited Jun 9, 2021 at 11:14

answered Jun 9, 2021 at 11:04

