
I want to write a Python script that downloads every picture from an array of links.

The code looks like this:

for url in array:
    if 'jpg' in url or 'jpeg' in url or 'png' in url or 'gif' in url:
        print(url)

As you can see, the if statement is long and repetitive, and I'd like to simplify it, preferably with a regex if that is possible. Can someone please help me?


5 Answers

9

Regex is not the right tool for this because you are not matching patterns, just looking for substrings.

Instead, you should use any and a generator expression:

if any(x in url for x in ('jpg', 'jpeg', 'png', 'gif')):

As a bonus, this solution is lazy like your current one: it performs only as many membership tests (in) as needed.
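A minimal, self-contained sketch of the any approach, written for Python 3 (so print is a function). The names IMAGE_MARKERS and has_image_substring and the sample URLs are made up for illustration. Like the original if statement, it matches the substring anywhere in the URL, not only at the end.

```python
# Substrings to look for (same ones as in the question)
IMAGE_MARKERS = ('jpg', 'jpeg', 'png', 'gif')


def has_image_substring(url):
    # any() short-circuits: it stops at the first marker found in url
    return any(marker in url for marker in IMAGE_MARKERS)


# Hypothetical sample data standing in for the asker's array
array = [
    'http://example.com/cat.jpg',
    'http://example.com/index.html',
    'http://example.com/logo.png',
]

matches = [url for url in array if has_image_substring(url)]
for url in matches:
    print(url)
```

Because this is a plain substring test, a URL such as http://mygif.com/index.html also passes the check.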


2

Although this is not equivalent to your code, it is closer to what you intend, because it only checks the part after the last dot:

for url in array:
    # [-1] instead of [1]: a url without any dot no longer raises IndexError
    if url.rsplit('.', 1)[-1] in ('jpg', 'jpeg', 'png', 'gif'):
        print(url)
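A related sketch, assuming Python 3: str.endswith accepts a tuple of suffixes, so the extension check can be written without splitting the string. The names IMAGE_EXTENSIONS and has_image_extension are hypothetical, and lowercasing the URL is an added assumption so that .JPG is treated like .jpg.

```python
IMAGE_EXTENSIONS = ('.jpg', '.jpeg', '.png', '.gif')


def has_image_extension(url):
    # endswith() tries each suffix in the tuple; lower() makes the
    # comparison case-insensitive
    return url.lower().endswith(IMAGE_EXTENSIONS)


# Hypothetical sample URLs
for url in ('http://example.com/a.JPG', 'http://mygif.com/index.html'):
    print(url, has_image_extension(url))
```

This version does not fail on strings without a dot, but it still ignores query strings, so a URL ending in ?size=large would not match.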


2

You may not need regex to do this but if you still want to, here's a way:

http://regex101.com/r/jH8fO4/3 <-- see the regex in action.

^.*\.(jpeg|jpg|png|gif)$

You can of course add more to the end of the expression to handle cases where a query string or parameter is attached to the URL.

Edit: updated to account for the possibility of more than one dot in the filename:

http://regex101.com/r/jH8fO4/4

^[a-z0-9]*\.{1}(jpeg|jpg|png|gif)$
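As a rough illustration of extending the expression for URLs with something attached, here is a sketch in Python 3. It is not the pattern from the links above: it anchors only at the end, allows an optional query string or fragment after the extension, and ignores case. The names IMAGE_URL and is_image_url are made up for this example.

```python
import re

# An image extension, optionally followed by "?..." or "#...", at the end
IMAGE_URL = re.compile(r'\.(?:jpe?g|png|gif)(?:[?#].*)?$', re.IGNORECASE)


def is_image_url(url):
    # search() scans the whole string; the trailing $ keeps the match at the end
    return IMAGE_URL.search(url) is not None


# Hypothetical sample URLs
for url in ('http://example.com/a.b.jpeg?size=large',
            'http://mygif.com/index.html'):
    print(url, is_image_url(url))
```

Note that the edited pattern above only allows lowercase letters and digits before a single dot, so it is meant for bare filenames rather than full URLs.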

1 Comment

I'll try to handle that case, but that isn't strictly an invalid filename, so I don't know whether the check is necessary.
0

Doing the same thing using regular expressions would look something like this.

import re

# Matches any of the four substrings anywhere in the url
pattern = re.compile('jpg|jpeg|png|gif')

for url in array:
    if pattern.search(url) is not None:
        print(url)

2 Comments

This will match URLs like mygif.com/index.html. Although the original code matches those URLs too, we only want to match the extensions if they're at the end of the URL.
The OP asked if his code could be done using regexes. Of course you could improve it, but that's not what was asked for.
0

I would use os.path.splitext:

import os
for url in array:
    # splitext returns (root, ext); ext includes the leading dot
    _, ext = os.path.splitext(url)
    if ext in ('.jpg', '.jpeg', '.png', '.gif'):
        print(url)
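One caveat with calling os.path.splitext on a whole URL is that a query string ends up inside the extension. A possible sketch for Python 3, assuming the standard urllib.parse module is acceptable here, extracts the path component first. The names IMAGE_EXTENSIONS and image_extension are hypothetical, and lowercasing the extension is an added assumption.

```python
import os
from urllib.parse import urlparse

IMAGE_EXTENSIONS = {'.jpg', '.jpeg', '.png', '.gif'}


def image_extension(url):
    # Keep only the path part, dropping any query string or fragment
    path = urlparse(url).path
    _, ext = os.path.splitext(path)
    ext = ext.lower()
    # Return the normalized extension, or None if it is not an image type
    return ext if ext in IMAGE_EXTENSIONS else None


# Hypothetical sample URL
print(image_extension('http://example.com/pics/cat.JPG?width=200'))
```

Returning the extension rather than a boolean is a design choice for this sketch; it can be handy later when naming the downloaded file.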
