split URL python [duplicate]

Question

I have a URL https://muk05119.us-east-1.snowflakecomputing.com and I want to retrieve only muk05119.us-east-1 from this.

Instead of splitting the string and retrieving the above, what is the best way to accomplish this?

Your example is clear by itself, but it's unclear what rule underlies it. Do you want the first two parts of the domain? All but the last two parts of the domain? Do you want everything before a main domain name and the top level domain (e.g. before .google.com but also before .australia.gov.au)? Or some other rule still? – Grismar, Commented Aug 3, 2022 at 6:37
The best way is to slice the string with my_url_string[8:26]. If you want a more dynamic way to extract subdomains, that's another story. – NicoCaldo, Commented Aug 3, 2022 at 6:40
It's the Snowflake login URL, so it will always have the same form as above: https://username.aws_region.snowflakecomputing.com. I want to get only username.aws_region. (The length of username can differ here.) – Mukul Kumar, Commented Aug 3, 2022 at 6:40
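Given the rule stated in the comments (the host is always username.aws_region.snowflakecomputing.com and only the username length varies), a fixed slice like [8:26] breaks as soon as the username changes length. A minimal sketch, assuming the suffix really is always ".snowflakecomputing.com", is to strip that known suffix from the parsed hostname. The helper name username_and_region is hypothetical.

```python
from urllib.parse import urlparse

# Assumption: every login URL ends with this fixed Snowflake domain.
SNOWFLAKE_SUFFIX = ".snowflakecomputing.com"


def username_and_region(url: str) -> str:
    """Hypothetical helper: return 'username.aws_region' from a Snowflake login URL."""
    host = urlparse(url).hostname or ""
    if not host.endswith(SNOWFLAKE_SUFFIX):
        raise ValueError(f"not a Snowflake login URL: {url!r}")
    # Drop the known suffix; this works for any username length.
    return host[: -len(SNOWFLAKE_SUFFIX)]


print(username_and_region("https://muk05119.us-east-1.snowflakecomputing.com"))
# muk05119.us-east-1
print(username_and_region("https://someuser.eu-west-2.snowflakecomputing.com"))
# someuser.eu-west-2
```

Raising on an unexpected host is a design choice; returning None would work equally well if callers prefer to check the result.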

Grismar · Accepted Answer · 2022-08-03 06:44:06Z

1

Your example is clear by itself, but it's unclear what rule underlies it. Do you want the first two parts of the domain? All but the last two parts of the domain? Do you want everything before a main domain name and the top level domain (e.g. before .google.com but also before .australia.gov.au)? Or some other rule still?

The first two parts:

from urllib.parse import urlparse

url = 'https://muk05119.us-east-1.snowflakecomputing.com'
netloc = urlparse(url).netloc

# slice up to (not including) the second '.'
print(netloc[:netloc.index('.', netloc.index('.')+1)])

Or:

print('.'.join(netloc.split('.')[:2]))

All but the last two parts:

print('.'.join(netloc.split('.')[:-2]))

For everything before the main and top-level domain, have a look at https://pypi.org/project/publicsuffixlist/ and use that with some of the above.
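To illustrate the "everything before the main domain and public suffix" rule without relying on a third-party API, here is a rough sketch that uses a tiny, hypothetical hard-coded suffix set (TOY_PUBLIC_SUFFIXES) in place of the real Public Suffix List. It ignores the list's wildcard and exception rules, so for real use the publicsuffixlist package (or a similar one) should supply the suffix data.

```python
from urllib.parse import urlparse

# Hypothetical, tiny stand-in for the real Public Suffix List.
TOY_PUBLIC_SUFFIXES = {"com", "au", "gov.au"}


def subdomain_part(url: str, suffixes=TOY_PUBLIC_SUFFIXES) -> str:
    """Return everything before the main domain name and its public suffix."""
    labels = (urlparse(url).hostname or "").split(".")
    # Scan from the left so the longest matching suffix is found first.
    for i in range(len(labels)):
        if ".".join(labels[i:]) in suffixes:
            # labels[i - 1] is the main domain name; keep only what precedes it.
            return ".".join(labels[: max(i - 1, 0)])
    return ""


print(subdomain_part("https://muk05119.us-east-1.snowflakecomputing.com"))  # muk05119.us-east-1
print(subdomain_part("https://www.australia.gov.au"))  # www
```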

answered Aug 3, 2022 at 6:44

Grismar

pu2x · Accepted Answer · 2022-08-03 06:44:13Z

1

You can use the built-in urllib.parse module to extract the hostname.

But you still have to split the string to extract the subdomain.

from urllib.parse import urlparse

URL = "https://muk05119.us-east-1.snowflakecomputing.com"
parsed = urlparse(URL)

host = parsed.netloc  # => muk05119.us-east-1.snowflakecomputing.com
subdomain = '.'.join(host.split('.')[:2])
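If the rule is "all but the last two parts" rather than "the first two parts", str.rsplit with maxsplit=2 splits only at the last two dots, so the left piece keeps any number of leading labels without a separate join. This is an alternative sketch, not part of the answer above.

```python
from urllib.parse import urlparse

host = urlparse("https://muk05119.us-east-1.snowflakecomputing.com").netloc

# Split at most twice, starting from the right: ['muk05119.us-east-1', 'snowflakecomputing', 'com']
prefix = host.rsplit(".", 2)[0]
print(prefix)  # muk05119.us-east-1
```

One difference to keep in mind: for a host with only two labels, such as "example.com", rsplit(".", 2)[0] returns "example", whereas '.'.join(host.split('.')[:-2]) returns an empty string.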

answered Aug 3, 2022 at 6:44

pu2x

NicoCaldo · Accepted Answer · 2022-08-03 09:12:12Z

0

You can use urlparse:

from urllib.parse import urlparse

url = urlparse('https://muk05119.us-east-1.snowflakecomputing.com')
labels = url.hostname.split('.')  # split the hostname once and reuse the labels
subdomain = labels[0] + '.' + labels[1]

Here url.hostname.split('.')[x] is the x-th dot-separated label of the hostname. In your case you need the first two labels, so indices 0 and 1.

Documentation
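This answer uses .hostname while the other two use .netloc. They give the same result for the URL in the question, but they differ when a URL carries credentials, an explicit port, or uppercase letters: netloc keeps all of that verbatim, while hostname drops the userinfo and port and lowercases the host. The URL below is a made-up example to show the difference.

```python
from urllib.parse import urlparse

# Hypothetical URL with credentials, mixed case and an explicit port.
parts = urlparse("https://user@MUK05119.us-east-1.snowflakecomputing.com:443/console")

print(parts.netloc)    # user@MUK05119.us-east-1.snowflakecomputing.com:443
print(parts.hostname)  # muk05119.us-east-1.snowflakecomputing.com
print(parts.port)      # 443

# Taking the first two labels from hostname avoids the 'user@' prefix.
print('.'.join(parts.hostname.split('.')[:2]))  # muk05119.us-east-1
```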

edited Aug 3, 2022 at 9:12

answered Aug 3, 2022 at 6:45

NicoCaldo
