How to use a Python regex to match URLs

Question

I have a string:

test_string="lots of other html tags ,'https://news.sky.net/upload_files/image/2022/202209_166293.png',and still 'https://news.sky.net/upload_files/image/2022/202209_166293.jpg'"

How can I get both complete URLs from the string using a Python regex?

I tried:

pattern = 'https://news.sky.net/upload_files/image'
result = re.findall(pattern, test_string)

I can get a list:

['https://news.sky.net/upload_files/image','https://news.sky.net/upload_files/image']

but not the whole URL, so I tried:

pattern = 'https://news.sky.net/upload_files/image...$png'
result = re.findall(pattern, test_string)

But received an empty list.

Nick · Accepted Answer · 2022-09-12 05:29:21Z

2

You could match a minimal number of characters after image up to a . and either png or jpg:

import re

test_string = "lots of other html tags ,'https://news.sky.net/upload_files/image/2022/202209_166293.png',and still 'https://news.sky.net/upload_files/image/2022/202209_166293.jpg'"
# Escape the dots in the host name so they only match literal dots.
pattern = r'https://news\.sky\.net/upload_files/image.*?\.(?:png|jpg)'
print(re.findall(pattern, test_string))

Output:

[
 'https://news.sky.net/upload_files/image/2022/202209_166293.png',
 'https://news.sky.net/upload_files/image/2022/202209_166293.jpg'
]
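
To see why the lazy quantifier .*? matters here, the following sketch compares it with the greedy .* on the same string. The names prefix, lazy_pattern and greedy_pattern are only illustrative and are not part of the original answer.

```python
import re

test_string = "lots of other html tags ,'https://news.sky.net/upload_files/image/2022/202209_166293.png',and still 'https://news.sky.net/upload_files/image/2022/202209_166293.jpg'"

# Shared literal prefix, with the dots in the host name escaped.
prefix = r'https://news\.sky\.net/upload_files/image'

# Lazy: stop at the first ".png" or ".jpg" after each prefix.
lazy_pattern = prefix + r'.*?\.(?:png|jpg)'
# Greedy: run as far as possible, so a single match spans both URLs.
greedy_pattern = prefix + r'.*\.(?:png|jpg)'

lazy_matches = re.findall(lazy_pattern, test_string)
greedy_matches = re.findall(greedy_pattern, test_string)

print(len(lazy_matches), lazy_matches)
print(len(greedy_matches), greedy_matches)
```

The lazy version yields one entry per URL, while the greedy version returns a single string that starts at the first URL and ends at the last .jpg.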

Tim Biegeleisen · Accepted Answer · 2022-09-12 05:26:18Z

2

Assuming you would always expect the URLs to appear inside single quotes, we can use re.findall as follows:

import re

test_string = "lots of other html tags ,'https://news.sky.net/upload_files/image/2022/202209_166293.png',and still 'https://news.sky.net/upload_files/image/2022/202209_166293.jpg'"
urls = re.findall(r"'(https?:\S+?)'", test_string)
print(urls)

This prints:

['https://news.sky.net/upload_files/image/2022/202209_166293.png',
 'https://news.sky.net/upload_files/image/2022/202209_166293.jpg']
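
The quotes do not show up in the result because re.findall returns the contents of the capture group when a pattern has exactly one group. A minimal sketch using re.finditer makes the difference between the full match and the group visible; the names full_matches and captured are only illustrative.

```python
import re

test_string = "lots of other html tags ,'https://news.sky.net/upload_files/image/2022/202209_166293.png',and still 'https://news.sky.net/upload_files/image/2022/202209_166293.jpg'"
pattern = r"'(https?:\S+?)'"

matches = list(re.finditer(pattern, test_string))

# group(0) is the whole match, including the surrounding single quotes.
full_matches = [m.group(0) for m in matches]
# group(1) is only what the parentheses captured, i.e. the bare URL.
captured = [m.group(1) for m in matches]

print(full_matches)
print(captured)
```

Here captured holds the same values that re.findall returns for this pattern.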

Nibras Shami · Accepted Answer · 2022-09-12 06:06:08Z

You can match any URL in the string with the regex https?://[^\s']+

Applied to your code, it looks like this:

import re

string = "Some string here'https://news.sky.net/upload_files/image/2022/202209_166293.png' And here as well 'https://news.sky.net/upload_files/image/2022/202209_166293.jpg' that's it tho"

# No capture groups, so findall returns each full match as a plain string.
res = re.findall(r"https?://[^\s']+", string)

print(res)

This returns a list of the URLs found in the string:

[
    'https://news.sky.net/upload_files/image/2022/202209_166293.png', 
    'https://news.sky.net/upload_files/image/2022/202209_166293.jpg'
]

Regex explanation:

https?://[^\s']+

https? - matches either http or https
[^\s']+ - one or more characters that are neither whitespace nor a single quote

So this matches http or https, then ://, then everything up to the next whitespace or single quote, which keeps the closing quote out of the result.
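
Two details are worth checking when using re.findall on quoted URLs: a pattern with more than one capture group makes it return tuples, and \S+ also picks up a quote that touches the URL. The sketch below shows both on the string from this answer. The character class [^\s'] is just one possible way to stop at the quote, and it assumes the URLs themselves never contain a single quote.

```python
import re

string = "Some string here'https://news.sky.net/upload_files/image/2022/202209_166293.png' And here as well 'https://news.sky.net/upload_files/image/2022/202209_166293.jpg' that's it tho"

# Two capture groups: findall returns one tuple per match,
# and \S+ keeps the trailing single quote.
grouped = re.findall(r"(http(s)?://\S+)", string)

# No groups and a negated class: plain strings, quote excluded.
flat = re.findall(r"https?://[^\s']+", string)

print(grouped)
print(flat)
```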

Hope you find this helpful.
