1

Here are my codes

import requests
from bs4 import BeautifulSoup as bs

url = "https://www.xxxxxxxx.com/xxxxx.html"
webpage = requests.get(url)
page = bs(webpage.content, "html.parser")
images_List = []

for img in page.find("div", attrs={"class":"post-body entry-content"}).findAll("img"):
    images_List.append(img.get('src'))

From the above codes,I can get a list of images links

However,those images links have different patterns related to the images size

['https://www.xxxxxxx.com/xxxxx/s1800/xx.jpg',
'https://www.xxxxxxxx.com/xxxxx/s1600/xx.jpg',
'https://www.xxxxxxxx.com/xxxxx/s1200/xx.jpg',
'https://www.xxxxxxxx.com/xxxxx/w1800/xx.jpg',
'https://www.xxxxxxxx.com/xxxxx/w1600/xx.jpg',
'https://www.xxxxxxxx.com/xxxxx/w1200/xx.jpg']

And I have used the codes below to change to size of the images to 5000

images_List = [n.replace('1800', '5000') for n in images_List]
images_List = [n.replace('1600', '5000') for n in images_List]
images_List = [n.replace('1200', '5000') for n in images_List]

However,sometimes it has more patterns of numbers other than s1800,s1600,s1200,w1800,w1600,w1200

I would like to ask if there is a more effective way to modify those links? Other than copy and paste codes like

images_List = [n.replace('1800', '5000') for n in images_List]
images_List = [n.replace('1600', '5000') for n in images_List]
images_List = [n.replace('1200', '5000') for n in images_List]
images_List = [n.replace('1000', '5000') for n in images_List]
images_List = [n.replace('800', '5000') for n in images_List]
images_List = [n.replace('400', '5000') for n in images_List]

Thank you.

0

3 Answers 3

4

Rather than using a bunch of replace statements, you could use a regular expression with re.sub, which would look something like this:

import re

images = [
    'https://www.xxxxxxx.com/xxxxx/s1800/xx.jpg',
    'https://www.xxxxxxxx.com/xxxxx/s1600/xx.jpg',
    'https://www.xxxxxxxx.com/xxxxx/s1200/xx.jpg',
]
images = [re.sub("(?<=[sw])[0-9]{3,4}/", "5000/", image) for image in images]

This matches any substring that is preceded by "s" or "w" has 3 or 4 numbers and then a "/", and the substitutes if for "5000/".

Sign up to request clarification or add additional context in comments.

2 Comments

What if the links are s1800,s1600,w1800,w1600,which I have to keep the s or w the same,just to modify the number part? Thanks
I didn't realise you also had "w" cases. I'll make an edit to work for this case.
1

This regular expression uses positive lookahead and positive lookbehind to match the number.

import re

images_List = [
    "https://www.xxxxxxx.com/xxxxx/s1800/xx.jpg",
    "https://www.xxxxxxxx.com/xxxxx/s1600/xx.jpg",
    "https://www.xxxxxxxx.com/xxxxx/s1200/xx.jpg",
]
images_List = [re.sub(r"(?<=\/[a-z])\d+(?=\/)", "5000", n) for n in images_List]

3 Comments

What if the links are s1800,s1600,w1800,w1600,which I have to keep the s or w the same,just to modify the number part? Thanks
@MaxChung I've updated the answer. now it should work for all of the described patterns.
You should upvote and accept the answer if "it works"
0

You can simply use re.sub():

import re

images_List = ['https://www.xxxxxxx.com/xxxxx/s1800/xx.jpg',
               'https://www.xxxxxxxx.com/xxxxx/s1600/xx.jpg',
               'https://www.xxxxxxxx.com/xxxxx/s1200/xx.jpg']

images_List = [re.sub('\d+', '5000', n) for n in images_List]

print(images_List)

Output:

['https://www.xxxxxxx.com/xxxxx/s5000/xx.jpg',
 'https://www.xxxxxxxx.com/xxxxx/s5000/xx.jpg',
 'https://www.xxxxxxxx.com/xxxxx/s5000/xx.jpg']


re.sub() will look at the first argument passed into the brackets, which is the pattern we want to substitute,

then look at the second argument, which is the string we will use to substitute others,

and the last argument is the string we want to modify.

The pattern I have is: '\d+', which tells re to look for series of numbers.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.