
I'm processing some URLs and I'd like to strip out a part of each one. That part changes dynamically, so I don't know it in advance. An example URL is:

https://...?pid=2&gid=lostchapter&lang=en_GB&practice=1&channel=desktop&demo=2

And I'd like to strip the gid=lostchapter part without any of the rest.

How do I do that?

2
  • What do you mean by "strip", exactly? Remove it from the URL? Extract it? Do you want to reconstruct the URL without it present? Commented Dec 11, 2022 at 4:43
  • I want to extract it as a stand-alone string and use it elsewhere Commented Dec 11, 2022 at 4:43

5 Answers

1

You can use urllib to convert the query string into a Python dict and access the desired item:

In [1]: from urllib import parse

In [2]: s = "https://...?pid=2&gid=lostchapter&lang=en_GB&practice=1&channel=desktop&demo=2"

In [3]: q = parse.parse_qs(parse.urlsplit(s).query)

In [4]: q
Out[4]:
{'pid': ['2'],
 'gid': ['lostchapter'],
 'lang': ['en_GB'],
 'practice': ['1'],
 'channel': ['desktop'],
 'demo': ['2']}

In [5]: q["gid"]
Out[5]: ['lostchapter']

2 Comments

How can I get it as a simple string? I mean like lostchapter
q["gid"][0] will pull the value out of the list. parse_qs always turns a query string into Dict[str, List[str]] because a query string parameter can appear more than once.
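
To make the point about repeated parameters concrete, here is a minimal sketch. The helper name `first_param` and the short query string with a repeated `gid` are made up for illustration; only `parse_qs` and `urlsplit` come from the answer above.

```python
from urllib.parse import parse_qs, urlsplit


def first_param(url, name, default=None):
    """Return the first value of query parameter `name`, or `default` if absent."""
    values = parse_qs(urlsplit(url).query).get(name)
    return values[0] if values else default


url = "https://...?pid=2&gid=lostchapter&lang=en_GB&practice=1&channel=desktop&demo=2"
gid = first_param(url, "gid")
print(gid)  # lostchapter

# Hypothetical query string: a parameter may appear more than once,
# which is why parse_qs returns a list for every key.
repeated = parse_qs("gid=lostchapter&gid=other")
print(repeated["gid"])  # ['lostchapter', 'other']
```

Returning a default instead of raising KeyError is just one possible design choice; use q["gid"][0] directly if a missing parameter should be an error.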
1

Here is a simple way to strip it, assuming gid is always the second parameter in the query string:

urls = "https://...?pid=2&gid=lostchapter&lang=en_GB&practice=1&channel=desktop&demo=2"

# Import the `urlparse` and `urlunparse` functions
from urllib.parse import urlparse, urlunparse

# Parse the URL
url = urlparse(urls)

# Convert the parsed result back into a URL string
url = urlunparse(url)

# Keep only the query string (everything after "?")
url = url.split("?")[1]
# Take the second "&"-separated parameter (this relies on its position)
url = url.split("&")[1]
# Print the extracted parameter
print(url) # Prints "gid=lostchapter"

1 Comment

This is a brittle solution. What if the desired query parameter is not at that position in the query string?
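
One way to address the brittleness raised in this comment is to look the parameter up by name instead of by position. This is only a sketch: `param_pair` is a hypothetical helper, and the reordered URL is the question's example with gid moved to the end.

```python
from urllib.parse import parse_qsl, urlsplit


def param_pair(url, name):
    """Return 'name=value' for the first occurrence of `name`, or None."""
    # parse_qsl yields (key, value) tuples in the order they appear.
    for key, value in parse_qsl(urlsplit(url).query):
        if key == name:
            return f"{key}={value}"
    return None


# Same parameters as the question's URL, but with gid moved to the end.
reordered = "https://...?pid=2&lang=en_GB&practice=1&channel=desktop&demo=2&gid=lostchapter"
print(param_pair(reordered, "gid"))  # gid=lostchapter
```

Note that parse_qsl percent-decodes values, so the returned string holds the decoded value rather than the raw text from the URL.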
1

Method 1: Using urlparse

from urllib.parse import urlparse
p = urlparse('https://.../?pid=2&gid=lostchapter&lang=en_GB&practice=1&channel=desktop&demo=2')
# Keep only the "key=value" pairs whose key is gid
param: list[str] = [i for i in p.query.split('&') if i.startswith('gid=')]
print(param[0])

Output: gid=lostchapter

Method 2: Using Regex

import re
# [^&]* stops at the next "&"; a greedy .*& would run on to the last "&" in the URL
param: str = re.search(r'gid=[^&]*', 'https://.../?pid=2&gid=lostchapter&lang=en_GB&practice=1&channel=desktop&demo=2').group()

You can tighten the regex pattern to match only the values you expect. As written, it extracts any value up to the next "&".
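
As a rough sketch of the regex approach for an arbitrary parameter name, something like the following could work. `find_param` is a hypothetical helper, and the lookbehind is borrowed from the regex answer further down rather than from this one.

```python
import re


def find_param(url, name):
    """Return 'name=value' via regex, or None if the parameter is absent."""
    # (?<=[?&]) requires '?' or '&' right before the name, so a parameter
    # such as 'xgid=' is not mistaken for 'gid='.
    # [^&#]* stops at the next '&' or at a '#fragment', so this also works
    # when the parameter is the last one in the query string.
    match = re.search(rf"(?<=[?&]){re.escape(name)}=[^&#]*", url)
    return match.group() if match else None


url = "https://.../?pid=2&gid=lostchapter&lang=en_GB&practice=1&channel=desktop&demo=2"
print(find_param(url, "gid"))  # gid=lostchapter
```

Unlike the urllib-based answers, this returns the raw text from the URL without percent-decoding it.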

1

We can try doing a regex replacement:

import re

url = "https://...?pid=2&gid=lostchapter&lang=en_GB&practice=1&channel=desktop&demo=2"
# Remove the gid parameter together with the "&" that follows it, if any
output = re.sub(r'(?<=[?&])gid=lostchapter&?', '', url)
print(output)  # https://...?pid=2&lang=en_GB&practice=1&channel=desktop&demo=2

For a more generic replacement, match on the following regex pattern:

(?<=[?&])gid=\w+&?

1 Comment

And how do I obtain the gid=lostchapter ? I want to strip it and make it a stand-alone string
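
For what this comment asks (getting gid=lostchapter as its own string while also producing the URL without it), a non-regex alternative is to parse the query, split the pairs, and rebuild the URL. This is a sketch under assumptions: `pop_param` is a made-up name, and it swaps the regex for urllib.parse functions.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def pop_param(url, name):
    """Return ('name=value' or None, url_without_that_parameter)."""
    parts = urlsplit(url)
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    # Separate the pairs we want to extract from the ones we keep.
    removed = [f"{k}={v}" for k, v in pairs if k == name]
    kept = [(k, v) for k, v in pairs if k != name]
    # Rebuild the URL with only the remaining parameters.
    new_url = urlunsplit(parts._replace(query=urlencode(kept)))
    return (removed[0] if removed else None), new_url


url = "https://...?pid=2&gid=lostchapter&lang=en_GB&practice=1&channel=desktop&demo=2"
pair, rest = pop_param(url, "gid")
print(pair)  # gid=lostchapter
print(rest)  # https://...?pid=2&lang=en_GB&practice=1&channel=desktop&demo=2
```

Because the query is decoded and re-encoded, values containing special characters may come back encoded slightly differently from the original URL.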
1

Using string slicing (I'm assuming there will be an '&' after gid=lostchapter)

url = r'https://...?pid=2&gid=lostchapter&lang=en_GB&practice=1&channel=desktop&demo=2'
start = url.find('gid')       # index where "gid" begins
end = url.find('&', start)    # index of the first "&" after "gid"
param = url[start:end]        # slice from "gid" up to, but not including, that "&"
print(param)

output

gid=lostchapter

What I'm trying to do here is:

  • find the index of the first occurrence of "gid"
  • find the first "&" after that index
  • take the slice of the URL between those two positions
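
The steps above can be sketched as a small function that also copes with gid being the last parameter (no trailing "&"). `slice_param` is a hypothetical name, and matching on "?gid=" or "&gid=" is an added assumption so that a parameter whose name merely ends in "gid" is not picked up.

```python
def slice_param(url, name):
    """Return 'name=value' using only str.find and slicing, or None."""
    key = name + "="
    # Look for the key right after '?' first, then after '&'.
    start = url.find("?" + key)
    if start == -1:
        start = url.find("&" + key)
    if start == -1:
        return None
    start += 1  # skip the '?' or '&' delimiter itself
    end = url.find("&", start)
    # No '&' afterwards means the parameter runs to the end of the string.
    return url[start:] if end == -1 else url[start:end]


url = "https://...?pid=2&gid=lostchapter&lang=en_GB&practice=1&channel=desktop&demo=2"
print(slice_param(url, "gid"))  # gid=lostchapter
```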
