Strip a specific part from a url string in python

Question

I'm going through some URLs and I'd like to strip out a part of each one that changes dynamically, so I don't know it beforehand. An example URL is:

https://...?pid=2&gid=lostchapter&lang=en_GB&practice=1&channel=desktop&demo=2

And I'd like to strip the gid=lostchapter part without any of the rest.

How do I do that?

What do you mean by "strip", exactly? Remove it from the URL? Extract it? Do you want to reconstruct the URL without it present?
– ddejohn, Commented Dec 11, 2022 at 4:43
I want to extract it as a stand-alone string and use it elsewhere
– haduki, Commented Dec 11, 2022 at 4:43

ddejohn · Accepted Answer · 2022-12-11 04:46:59Z

1

You can use urllib to convert the query string into a Python dict and access the desired item:

In [1]: from urllib import parse

In [2]: s = "https://...?pid=2&gid=lostchapter&lang=en_GB&practice=1&channel=desktop&demo=2"

In [3]: q = parse.parse_qs(parse.urlsplit(s).query)

In [4]: q
Out[4]:
{'pid': ['2'],
 'gid': ['lostchapter'],
 'lang': ['en_GB'],
 'practice': ['1'],
 'channel': ['desktop'],
 'demo': ['2']}

In [5]: q["gid"]
Out[5]: ['lostchapter']

answered Dec 11, 2022 at 4:46

ddejohn


2 Comments

haduki Over a year ago

How can I get it as a simple string? I mean like lostchapter

ddejohn Over a year ago

q["gid"][0] will pull the value out of the list. The parse_qs function always returns a Dict[str, List[str]] because of how query strings can be formed (i.e., a query string parameter can be used more than once).
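
As a minimal sketch of the comment above, the following pulls a single value out of the list and also shows what a repeated parameter looks like. The helper name `first_value` and the URL with a repeated `tag` parameter are illustrative assumptions, not part of the original answer.

```python
from urllib.parse import parse_qs, urlsplit


def first_value(url, name, default=None):
    """Return the first value of query parameter `name`, or `default` if absent."""
    query = parse_qs(urlsplit(url).query)
    values = query.get(name)
    return values[0] if values else default


url = "https://...?pid=2&gid=lostchapter&lang=en_GB&practice=1&channel=desktop&demo=2"
gid = first_value(url, "gid")
print(gid)  # lostchapter

# Hypothetical URL in which the same parameter appears twice
repeated = parse_qs(urlsplit("https://...?tag=a&tag=b").query)
print(repeated["tag"])  # ['a', 'b']
```

Using `.get()` instead of indexing avoids a `KeyError` when the parameter is missing from the URL.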

Hansen Idden · Accepted Answer · 2022-12-11 04:38:25Z

1

Here is a simple way to extract it:

urls = "https://...?pid=2&gid=lostchapter&lang=en_GB&practice=1&channel=desktop&demo=2"

# Import the `urlparse` function
from urllib.parse import urlparse

# Parse the URL and take its query string
query = urlparse(urls).query

# Take the second parameter (this relies on gid being in that position)
param = query.split("&")[1]

# Print the extracted parameter
print(param) # Prints "gid=lostchapter"

answered Dec 11, 2022 at 4:38

Hansen Idden


1 Comment

ddejohn Over a year ago

This is a brittle solution. What if the desired query parameter is not first in the query string?
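
One way to avoid depending on the parameter's position is to walk the parsed key/value pairs and match on the name. This is only a sketch: the function name `find_param` and the reordered URLs are assumptions made for illustration.

```python
from urllib.parse import parse_qsl, urlsplit


def find_param(url, name):
    """Return 'name=value' for the first matching query parameter, or None."""
    pairs = parse_qsl(urlsplit(url).query, keep_blank_values=True)
    for key, value in pairs:
        if key == name:
            # Note: parse_qsl percent-decodes keys and values
            return f"{key}={value}"
    return None


# The result is the same wherever gid sits in the query string
first = find_param("https://...?gid=lostchapter&pid=2", "gid")
last = find_param("https://...?pid=2&demo=2&gid=lostchapter", "gid")
print(first)  # gid=lostchapter
print(last)   # gid=lostchapter
```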

anmol_gorakshakar · Accepted Answer · 2022-12-11 05:01:13Z

1

Method 1: Using urlparse

from urllib.parse import urlparse
p = urlparse('https://.../?pid=2&gid=lostchapter&lang=en_GB&practice=1&channel=desktop&demo=2')
param: list[str] = [i for i in p.query.split('&') if i.startswith('gid=')]
print(param)

Output: ['gid=lostchapter']

Method 2: Using Regex

import re
param: str = re.search(r'gid=[^&]*', 'https://.../?pid=2&gid=lostchapter&lang=en_GB&practice=1&channel=desktop&demo=2').group()

You can change the regex pattern to match the expected values more strictly. Currently it extracts any value, stopping at the next '&' or at the end of the string.

answered Dec 11, 2022 at 5:01

anmol_gorakshakar


Comments

Tim Biegeleisen · Accepted Answer · 2022-12-11 05:43:24Z

1

We can try doing a regex replacement:

import re

url = "https://...?pid=2&gid=lostchapter&lang=en_GB&practice=1&channel=desktop&demo=2"
output = re.sub(r'(?<=[?&])gid=lostchapter&?', '', url)
print(output)  # https://...?pid=2&lang=en_GB&practice=1&channel=desktop&demo=2

For a more generic replacement, match on the following regex pattern:

(?<=[?&])gid=\w+&?
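
As a rough sketch, the generic pattern can be used both to pull the parameter out as a stand-alone string and to remove it from the URL. The variable names here are illustrative, and `\w+` only covers values made of letters, digits, and underscores, so values containing hyphens or percent-encoding would need a wider character class.

```python
import re

url = "https://...?pid=2&gid=lostchapter&lang=en_GB&practice=1&channel=desktop&demo=2"
pattern = re.compile(r'(?<=[?&])gid=\w+&?')

# Extract the matched parameter, dropping the optional trailing "&"
match = pattern.search(url)
extracted = match.group().rstrip('&') if match else None

# Remove the parameter from the URL
stripped = pattern.sub('', url)

print(extracted)  # gid=lostchapter
print(stripped)   # https://...?pid=2&lang=en_GB&practice=1&channel=desktop&demo=2
```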

edited Dec 11, 2022 at 5:43

answered Dec 11, 2022 at 4:36

Tim Biegeleisen


1 Comment

haduki Over a year ago

And how do I obtain the gid=lostchapter? I want to strip it out and make it a stand-alone string.

MUSTANGBOSS8055 · Accepted Answer · 2022-12-11 22:38:04Z

1

Using string slicing (I'm assuming there will be an '&' after gid=lostchapter)

url = r'https://...?pid=2&gid=lostchapter&lang=en_GB&practice=1&channel=desktop&demo=2'
start = url.find('gid')                 # index where "gid" begins
end = start + url[start:].find('&')     # index of the first "&" after "gid"
url = url[start:end]                    # keep only the text between the two
print(url)

output

gid=lostchapter

What I'm trying to do here is:

find the index of the first occurrence of "gid"
find the first "&" after "gid"
take the slice of the URL from "gid" up to (but not including) that "&"
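
The slicing approach assumes an '&' follows the parameter. A possible variant that also copes with the parameter being last in the URL, or missing entirely, is sketched below. The function name `slice_param` is made up for illustration, and plain substring search can still be fooled by a longer parameter name that ends in "gid" or by a URL fragment, so this is not a substitute for a real URL parser.

```python
def slice_param(url, name):
    """Return 'name=value' using plain string slicing, or None if not found."""
    start = url.find(name + "=")
    if start == -1:
        return None          # parameter is not present
    end = url.find("&", start)
    if end == -1:
        end = len(url)       # parameter is the last one in the URL
    return url[start:end]


middle = slice_param("https://...?pid=2&gid=lostchapter&lang=en_GB", "gid")
at_end = slice_param("https://...?pid=2&gid=lostchapter", "gid")
print(middle)  # gid=lostchapter
print(at_end)  # gid=lostchapter
```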

edited Dec 11, 2022 at 22:38

answered Dec 11, 2022 at 4:51

MUSTANGBOSS8055

