How to remove .com from a URL in Python?

Question

I want to remove the domain from a URL. For example, if the user entered www.google.com, I only need www.google.

How can I do this in Python? Thanks.

What if the user entered www.google.com.au or www.google.co.uk?
– mhawke, Commented Jul 29, 2016 at 11:53
@Aryan, please edit your question to add more details, as it seems too broad. Narrow it down to your requirements!
– Iron Fist, Commented Jul 29, 2016 at 11:58
I think it's worth adding that when you set out to manipulate URLs, figuring out what the actual requirements are in all possible cases is usually much harder than writing the code.
– Steve Jessop, Commented Jul 29, 2016 at 12:01
@mhawke The question says to remove the domain name, so the answer would still be www.google. In any case, to avoid complexity, we can look for dots starting from the left and stop at the second one.
– Learner, Commented Jul 29, 2016 at 12:05
@Learner: yes, but the question title asks only how to remove .com. .com is the top-level domain... but google.com can also be considered the domain, so that would leave just www, which I suppose is not what is intended. The question needs clarification.
– mhawke, Commented Jul 29, 2016 at 13:26

holdenweb · Accepted Answer · 2016-07-29 11:52:01Z

3

This is a very general question. But the narrowest answer would be as follows (assuming url holds the URL in question):

if url.endswith(".com"):
    url = url[:-4]

If you want to remove the last period and everything to the right of it the code would be a little more complicated:

pos = url.rfind('.') # find rightmost dot
if pos >= 0:         # found one
    url = url[:pos]
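
On Python 3.9 or later, the first snippet can also be written with `str.removesuffix`, which removes the suffix only when the string actually ends with it. A minimal sketch wrapping both variants as functions; the names `strip_dot_com` and `strip_last_label` are made up for illustration:

```python
def strip_dot_com(url: str) -> str:
    """Remove a trailing '.com' if present (requires Python 3.9+)."""
    return url.removesuffix(".com")


def strip_last_label(url: str) -> str:
    """Remove the rightmost dot and everything after it, if there is a dot."""
    pos = url.rfind(".")
    return url[:pos] if pos >= 0 else url


print(strip_dot_com("www.google.com"))       # www.google
print(strip_dot_com("www.google.org"))       # www.google.org (unchanged)
print(strip_last_label("www.google.co.uk"))  # www.google.co
```

Note that the second function still leaves `.co` behind for `www.google.co.uk`, which is the multi-component suffix problem raised in the comments.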

answered Jul 29, 2016 at 11:52

holdenweb


3 Comments

Moses Koledoye Over a year ago

url.rsplit('.', 1)[0] will split on the rightmost dot and return the first item

Steve Jessop Over a year ago

@MosesKoledoye: or url.rpartition('.')[0].
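
The two one-liners above give the same result when the string contains a dot, but they differ when it does not: `rsplit` returns the whole string, while `rpartition` returns an empty head. A small sketch of the difference (the sample hostnames are only illustrative):

```python
with_dot = "www.google.com"
no_dot = "localhost"

# With a dot present, both drop the last dot and what follows it.
print(with_dot.rsplit(".", 1)[0])       # www.google
print(with_dot.rpartition(".")[0])      # www.google

# Without a dot, rsplit keeps the string; rpartition yields ''.
print(repr(no_dot.rsplit(".", 1)[0]))   # 'localhost'
print(repr(no_dot.rpartition(".")[0]))  # ''
```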

holdenweb Over a year ago

As @SteveJessop points out, the eTLD (effective top-level domain) might be made up of multiple components. For example, .co.uk is the UK equivalent of the originally American and now global .com domain.

Learner · Accepted Answer · 2016-07-29 12:03:11Z

2

To sidestep the question of what counts as the domain name, you can scan for dots from the left-hand side and stop at the second one.

t = 'www.google.com'
a = t.split('.')[1]    # second dot-separated label: 'google'
pos = t.find(a)        # index where that label starts
t = t[:pos+len(a)]     # keep everything up to the end of that label

# t is now 'www.google'

answered Jul 29, 2016 at 12:03

Learner


2 Comments

AdrienW Over a year ago

Does not work with meta.codereview.stackexchange.com. No irony intended.

Mast Over a year ago

This fails on anything with more than one . in it.

Graham · Accepted Answer · 2017-09-27 05:43:09Z

0

If you want to remove 4 characters at the end, slice it

url = 'www.google.com'
cut_url = url[:-4]
# output : 'www.google'

More advanced answer

If you have a list of all the possible domains:

domains = ['com', 'uk', 'fr', 'net', 'co', 'nz']  # and so on...
while True:
    domain = url.split('.')[-1]
    if domain in domains:
        url = '.'.join(url.split('.')[:-1])
    else:
        break

Or if, for example, you have a domains list where .co and .uk are not separated:

domains = ['.com', '.co.uk', '.fr', '.net', '.co.nz']  # and so on...
for domain in domains:
    if url.endswith(domain):
        cut_url = url[:-len(domain)]
        break
else:  # there is no indentation mistake here.
       # else after for will be executed if for did not break
    print('no known domain found')
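
One caveat with a suffix list is ordering: if the list holds both `.uk` and `.co.uk`, the shorter entry may match first and leave `.co` behind. A hedged sketch that tries longer suffixes first; the list is a tiny illustrative sample rather than a complete set, and `strip_known_suffix` is a hypothetical helper name:

```python
KNOWN_SUFFIXES = [".com", ".uk", ".co.uk", ".fr", ".net", ".co.nz"]  # sample only


def strip_known_suffix(url, suffixes=KNOWN_SUFFIXES):
    """Return url without its longest matching suffix, or unchanged if none match."""
    for suffix in sorted(suffixes, key=len, reverse=True):
        if url.endswith(suffix):
            return url[:-len(suffix)]
    return url


print(strip_known_suffix("www.google.co.uk"))  # www.google
print(strip_known_suffix("www.google.com"))    # www.google
print(strip_known_suffix("www.mysite.io"))     # www.mysite.io (no known suffix)
```

Returning the input unchanged when nothing matches is a design choice for this sketch; raising an error or returning `None` would be equally reasonable, depending on the caller.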

edited Sep 27, 2017 at 5:43

Graham


answered Jul 29, 2016 at 11:50

AdrienW


6 Comments

Iron Fist Over a year ago

What about www.mysite.io or www.mysite.om? .. etc

Steve Jessop Over a year ago

Then maybe '.'.join(mystr.split('.')[:-1]). But what about .co.uk? The problem is under-specified, since the questioner says "remove the domain", whereas .com and google.com and www.google.com are all domains of different sorts. A complete solution might require using the Mozilla Public Suffix List, depending on the actual problem.

AdrienW Over a year ago

Was going to edit to include these cases, but I first wanted to provide a simple answer for OP's question as it was asked

Compadre Over a year ago

str is really bad name for variable

HolyDanna Over a year ago

@SteveJessop would you rather add all the possible domain names manually, or try to use how the URL is written to get the domain name out of it?


Community · Accepted Answer · 2020-06-20 09:12:55Z

-1

What you need here is rstrip function.

Try this code:

url = 'www.google.com'
url2 = 'www.google'

new_url = url.rstrip('.com')
print (new_url)

new_url2 = url2.rstrip('.com')
print (new_url2)

rstrip will only strip if the string is present, in this case ".com". If not, it will just leave it. rstrip is for stripping 'right-most' matched string and lstrip is the opposite of this. Check these docs. Also check strip and lstrip functions.

UPDATE

As @SteveJessop pointed out, the above example is NOT the right solution, so I'm submitting another one. Although it is similar to another answer here, it first checks whether the string ends with '.com'.

url = 'www.foo.com'
if url.endswith('.com'):
    url = url[:-4]
    print (url)

edited Jun 20, 2020 at 9:12

CommunityBot


answered Jul 29, 2016 at 12:16

i333


3 Comments

Steve Jessop Over a year ago

Well, except that 'www.foo.com'.rstrip('.com') is www.f

holdenweb Over a year ago

In case readers are having trouble working out why this is so, the argument to rstrip specifies a set of characters, any and all of which will be removed from the right-hand end of the string.
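
To make that concrete, here is a short demonstration of the character-set behaviour, next to a suffix check that removes only the literal `.com`:

```python
url = "www.foo.com"

# rstrip treats '.com' as the set {'.', 'c', 'o', 'm'} and keeps stripping
# from the right until it meets a character outside that set.
print(url.rstrip(".com"))   # www.f

# Checking the suffix explicitly removes exactly the four characters '.com'.
if url.endswith(".com"):
    url = url[:-len(".com")]
print(url)                  # www.foo
```

The `www.google.com` example in the answer only appears to work because the `e` in `google` is not in the stripped set.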

i333 Over a year ago

@SteveJessop , thanks for pointing it out :) appreciate that! What do you think about the updated solution? even though it's related to another user's answer, i believe this should help the asker
