1

Here is an example scenario for my question. Suppose I have a list of URLs:

url_list=["https:www.example.com/pag31/go","https:www.example.com/pag12/go","https:www.example.com/pag0/go"]

I want to replace the substring between ".com/" and "/go".

For example, the new list should look like this:

['https:www.example.com/home/go','https:www.example.com/home/go','https:www.example.com/home/go']

I have tried slicing and replacing based on index but couldn't get the required result for the whole list.

Any help is really appreciated. Thanks in advance.

2 Answers

2

You can use re.sub() and a list comprehension to apply the replacement to every element of your list.

import re

url_list=["https:www.google.com/pag31/go","https:www.facebook.com/pag12/go","http:www.bing.com/pag0/go"]

pattern = r'(?<=com\/).*(?=\/go)'

result = [re.sub(pattern, 'home', url) for url in url_list]

This matches whatever lies between com/ and /go in each string. Because the pattern includes neither the scheme nor the host name, it works for any .com URL, whether http or https.

Output:

['https:www.google.com/home/go', 'https:www.facebook.com/home/go', 'http:www.bing.com/home/go']

Regex Explanation

The pattern r'(?<=com\/).*(?=\/go)' looks for the following:

(?<=com\/): Positive lookbehind asserting that com/ immediately precedes the match

.*: Greedily matches any character (except a newline) zero or more times

(?=\/go): Positive lookahead asserting that /go immediately follows the match

Because lookarounds do not consume characters, only the text between com/ and /go is matched and replaced; com/ and /go themselves stay untouched. You can find a more in-depth explanation on the pattern here
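One consequence of the greedy .* is worth seeing in action. The sketch below uses a hypothetical URL (not from the question) in which /go appears twice, and compares the greedy pattern with a lazy .*? variant. The backslashes before / are dropped here because Python does not need them.

```python
import re

# Hypothetical URL in which "/go" appears twice (not from the original list).
url = "https:www.example.com/pag1/go/extra/go"

greedy = r'(?<=com/).*(?=/go)'   # .* extends to the LAST "/go"
lazy = r'(?<=com/).*?(?=/go)'    # .*? stops at the FIRST "/go"

greedy_result = re.sub(greedy, 'home', url)
lazy_result = re.sub(lazy, 'home', url)

print(greedy_result)  # https:www.example.com/home/go
print(lazy_result)    # https:www.example.com/home/go/extra/go
```

For the URLs in the question both forms give the same result, since /go occurs only once at the end.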


1

You can try Python's regular expressions.

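import re
url_list = ["https:www.example.com/pag31/go", "https:www.example.com/pag12/go", "https:www.example.com/pag0/go"]
# Match the whole URL; the dots are escaped so they match literally
re_url = r"https:www\.example\.com/.*/go"
url = "https:www.example.com/home/go"
url_list_new = [re.sub(re_url, url, x) for x in url_list]
url_list_new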

Output:

 ['https:www.example.com/home/go',
 'https:www.example.com/home/go',
 'https:www.example.com/home/go']
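Note that this pattern hard-codes the scheme and host. A possible variant, sketched below, captures ".com/" and "/go" in groups and rebuilds the URL around them, so it is not tied to one site. The helper name replace_segment and its new_segment parameter are made up for illustration, and the sketch assumes every URL contains ".com/" followed later by "/go".

```python
import re

def replace_segment(url, new_segment="home"):
    # Hypothetical helper: keep ".com/" and "/go", swap whatever lies between.
    return re.sub(r'(\.com/).*(/go)',
                  lambda m: m.group(1) + new_segment + m.group(2),
                  url)

url_list = ["https:www.example.com/pag31/go", "http:www.bing.com/pag0/go"]
url_list_new = [replace_segment(u) for u in url_list]
print(url_list_new)
# ['https:www.example.com/home/go', 'http:www.bing.com/home/go']
```

Using a function as the replacement avoids surprises if the new segment happens to contain backslashes or digits that re.sub would otherwise read as group references.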

2 Comments

This will only work for the exact example the poster listed. If the list contains a URL for any host other than example.com, or even an http URL, it will not match.
@PacketLoss Yes, right. I missed that, thanks for pointing it out.
