If you are searching a list of URLs
    urls = [
        'http://some.link.com/path/to/file.jpg',
        'http://some.link.com/path/to/another.png',
        'http://and.another.place.com/path/to/not-image.txt',
    ]
to find ones that match a given pattern you can use:
    import re

    for url in urls:
        # re.match() takes the string to test as its second argument
        if re.match(r'http://.*(jpg|png|gif)$', url):
            print(url)
which will output
    http://some.link.com/path/to/file.jpg
    http://some.link.com/path/to/another.png
re.match() tests for a match only at the start of the string. It returns a match object for the first two links and None for the third.
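To make the start-of-string behavior concrete, here is a minimal sketch contrasting re.match() with re.search(). The sample string `text` is made up for illustration; the pattern is the one used above.

```python
import re

pattern = r'http://.*(jpg|png|gif)$'

# Hypothetical sample: the URL does not begin at position 0.
text = 'see http://some.link.com/path/to/file.jpg'

# re.match() only tries to match at the very start of the string.
at_start = re.match(pattern, text)

# re.search() scans forward and matches at the first position that works.
anywhere = re.search(pattern, text)

print(at_start)           # None
print(anywhere.group(0))  # http://some.link.com/path/to/file.jpg
```

If the URL might not sit at the very beginning of the string, re.search() is usually the safer choice.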
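If you want just the extension, you can use the following:
    for url in urls:
        m = re.match(r'http://.*(jpg|png|gif)$', url)
        if m:  # skip URLs that do not match (m is None for those)
            print(m.groups())
which will print
    ('jpg',)
    ('png',)
You get just the extensions because the extension is the only part of the pattern enclosed in a capturing group, and m.groups() returns only the captured groups.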
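As a quick illustration of how the match object exposes groups, the sketch below compares group(0), group(1), and groups() on one of the URLs from the list. The variable names are only for illustration.

```python
import re

url = 'http://some.link.com/path/to/file.jpg'
m = re.match(r'http://.*(jpg|png|gif)$', url)

whole = m.group(0)       # the entire matched text
ext = m.group(1)         # the first (and only) capturing group
all_groups = m.groups()  # tuple containing every capturing group

print(whole)       # http://some.link.com/path/to/file.jpg
print(ext)         # jpg
print(all_groups)  # ('jpg',)
```

Use group(1) when you want the extension as a plain string rather than a one-element tuple.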
If you need to find the URL inside a long string of text (such as the output returned from wget), use re.search() and enclose each part you are interested in within parentheses. For example,
    response = """dlkjkd dkjfadlfjkd fkdfl kadfjlkadfald ljkdskdfkl adfdf
    kjakldjflkhttp://some.url.com/path/to/file.jpgkaksdj fkdjakjflakdjfad;kadj af
    kdlfjd dkkf aldfkaklfakldfkja df"""
    # Three nested groups: the whole URL, the file name, and the extension
    reg = re.search(r'(http:.*/(.*\.(jpg|png|gif)))', response)
    print(reg.groups())
will print
    ('http://some.url.com/path/to/file.jpg', 'file.jpg', 'jpg')
Alternatively, you can use re.findall() or re.finditer() in place of re.search() to get all of the URLs in the long response; re.search() returns only the first match.
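A minimal sketch of collecting every image URL might look like the following. The sample `text` and the pattern are assumptions made for this example, not part of the code above: the pattern uses a non-greedy `\S+?` so that each match stays within one URL, and a non-capturing group `(?:...)` so that re.findall() returns whole matches instead of only the extension.

```python
import re

# Hypothetical sample text containing several URLs.
text = ('first http://some.url.com/path/to/file.jpg then '
        'http://some.url.com/path/to/another.png and a non-image '
        'http://and.another.place.com/path/to/not-image.txt')

# Assumed pattern for this sketch: lazy match up to an image extension.
pattern = r'http://\S+?\.(?:jpg|png|gif)'

# findall returns a list of all non-overlapping matches.
image_urls = re.findall(pattern, text)
print(image_urls)
# ['http://some.url.com/path/to/file.jpg', 'http://some.url.com/path/to/another.png']

# finditer yields match objects one at a time, which also carry positions.
positions = [(m.start(), m.group(0)) for m in re.finditer(pattern, text)]
```

Note that if the pattern contains capturing groups, re.findall() returns the groups rather than the full match, which is why the extension alternatives are non-capturing here.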