0

I want to remove the domain in a URL. For example, the user entered www.google.com, but I only need www.google.

How can I do this in Python? Thanks.

  • What if the user entered www.google.com.au or www.google.co.uk? Commented Jul 29, 2016 at 11:53
  • @Aryan, please edit your question to add more details, as it seems too broad. Narrow it down to your requirements! Commented Jul 29, 2016 at 11:58
  • I think it's worth adding that when you set out to manipulate urls, figuring out what the actual requirements are in all possible cases is usually much harder than writing the code. Commented Jul 29, 2016 at 12:01
  • @mhawke The question says to remove the domain name, so the answer would still be www.google. In any case, to avoid complexity, we can look for dots starting from the left and stop at the second one. Commented Jul 29, 2016 at 12:05
  • @Learner: yes, but the question title asks only how to remove .com. .com is the top level domain... but google.com can also be considered the domain, so that would leave just www, which I suppose is not what is intended. The question needs clarification. Commented Jul 29, 2016 at 13:26

4 Answers

3

This is a very general question. But the narrowest answer would be as follows (assuming url holds the URL in question):

if url.endswith(".com"):
    url = url[:-4]

If you want to remove the last period and everything to the right of it, the code is a little more involved:

pos = url.rfind('.') # find rightmost dot
if pos >= 0:         # found one
    url = url[:pos]

3 Comments

url.rsplit('.', 1)[0] will split on the rightmost dot and return the first item
@MosesKoledoye: or url.rpartition('.')[0].
As @SteveJessop points out, the eTLD (effective top-level domain) might be made up of multiple components. For example, .co.uk is the UK equivalent of the originally American and now global .com domain.
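
A small sketch comparing the three ways of cutting at the rightmost dot mentioned in this answer and its comments (rfind plus a slice, rsplit, and rpartition). The function names and sample inputs are made up for illustration. The three agree whenever a dot is present, but rpartition returns an empty string when there is no dot at all, and none of them knows about multi-part suffixes such as .co.uk.

```python
def cut_with_rfind(url):
    # Slice at the rightmost dot; leave the string alone if there is none.
    pos = url.rfind('.')
    return url[:pos] if pos >= 0 else url

def cut_with_rsplit(url):
    # rsplit returns the whole string as the only item when no dot is found.
    return url.rsplit('.', 1)[0]

def cut_with_rpartition(url):
    # rpartition returns ('', '', url) when no dot is found, so [0] is ''.
    return url.rpartition('.')[0]

for sample in ('www.google.com', 'www.google.co.uk', 'localhost'):
    print(sample, '->',
          repr(cut_with_rfind(sample)),
          repr(cut_with_rsplit(sample)),
          repr(cut_with_rpartition(sample)))
```

For www.google.co.uk all three return www.google.co, which is probably not what the asker wants. That case needs a list of known suffixes.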
2

To avoid having to deal with the domain name at all, you can scan for dots from the left-hand side and keep everything before the second dot.

t = 'www.google.com'
# Keep only the first two dot-separated labels, i.e. everything left of the second dot.
t = '.'.join(t.split('.')[:2])
print(t)  # www.google

2 Comments

Does not work with meta.codereview.stackexchange.com. No irony intended.
This fails on anything with more than one . in it.
0

If you just want to remove the last 4 characters, slice the string:

url = 'www.google.com'
cut_url = url[:-4]  # slice the url variable, not the built-in str
# output : 'www.google'

More advanced answer

If you have a list of all the possible domain endings:

domains = ['com', 'uk', 'fr', 'net', 'co', 'nz']  # and so on...
while True:
    domain = url.split('.')[-1]
    if domain in domains:
        url = '.'.join(url.split('.')[:-1])
    else:
        break

Or if, for example, you have a domains list where .co and .uk are not separated:

domains = ['.com', '.co.uk', '.fr', '.net', '.co.nz']  # and so on...
for domain in domains:
    if url.endswith(domain):
        cut_url = url[:-len(domain)]
        break
else:  # there is no indentation mistake here.
       # else after for will be executed if for did not break
    print('no known domain found')

6 Comments

What about www.mysite.io or www.mysite.om? .. etc
Then maybe '.'.join(mystr.split('.')[:-1]). But what about .co.uk? The problem is under-specified, since the questioner says "remove the domain", whereas .com and google.com and www.google.com are all domains of different sorts. A complete solution might require using the Mozilla Public Suffix List, depending on the actual problem.
Was going to edit to include these cases, but I first wanted to provide a simple answer for OP's question as it was asked
str is really bad name for variable
@SteveJessop would you rather add all the possible domain names manually, or try to use how the URL is written to get the domain name out of it?
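
To make the suffix-list idea from these comments concrete, here is a minimal sketch that strips the longest known suffix, working label by label. KNOWN_SUFFIXES and strip_suffix are hypothetical names, and the set is a tiny hand-written stand-in: a real list such as the Public Suffix List is far larger and has extra rules that this sketch ignores.

```python
# Hypothetical, hand-written stand-in for a real suffix list.
KNOWN_SUFFIXES = {'com', 'net', 'fr', 'io', 'uk', 'co.uk',
                  'nz', 'co.nz', 'au', 'com.au'}

def strip_suffix(host, suffixes=KNOWN_SUFFIXES):
    """Return host without its longest known suffix, or host unchanged."""
    labels = host.split('.')
    # Dropping one leading label at a time tries the longest candidate
    # first, so 'co.uk' is matched before plain 'uk'.
    for i in range(1, len(labels)):
        candidate = '.'.join(labels[i:])
        if candidate in suffixes:
            return '.'.join(labels[:i])
    return host

for host in ('www.google.com', 'www.google.co.uk',
             'www.mysite.io', 'www.mysite.om'):
    print(host, '->', strip_suffix(host))
```

Unknown endings such as .om are left untouched here rather than guessed at. Whether that is the right behaviour depends on the actual requirements.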
-1

What you need here is rstrip function.

Try this code:

url = 'www.google.com'
url2 = 'www.google'

new_url = url.rstrip('.com')
print (new_url)

new_url2 = url2.rstrip('.com')
print (new_url2)

rstrip will only strip if the string is present, in this case ".com". If not, it will just leave it. rstrip is for stripping 'right-most' matched string and lstrip is the opposite of this. Check these docs. Also check strip and lstrip functions.

UPDATE

As @SteveJessop pointed out, the above example is NOT the right solution, so I'm submitting another one. Though it's related to another answer here, it first checks whether the string ends with '.com'.

url = 'www.foo.com'
if url.endswith('.com'):
    url = url[:-4]
    print (url)

3 Comments

Well, except that 'www.foo.com'.rstrip('.com') is www.f
In case readers are having trouble working out why this is so, the argument to rstrip specifies a set of characters, any and all of which will be removed from the right-hand end of the string.
@SteveJessop, thanks for pointing it out :) I appreciate that! What do you think about the updated solution? Even though it's related to another user's answer, I believe this should help the asker.
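
A short sketch of the pitfall discussed in these comments, next to a suffix-aware alternative. remove_suffix is a hypothetical helper written for this example and is not part of the standard library.

```python
url = 'www.foo.com'

# rstrip treats '.com' as the character set {'.', 'c', 'o', 'm'} and keeps
# removing trailing characters for as long as they belong to that set.
print(url.rstrip('.com'))  # www.f

def remove_suffix(text, suffix):
    """Drop suffix from the end of text only if it is actually there."""
    if suffix and text.endswith(suffix):
        return text[:-len(suffix)]
    return text

print(remove_suffix(url, '.com'))           # www.foo
print(remove_suffix('www.google', '.com'))  # www.google
```

On Python 3.9 and later, the built-in str.removesuffix('.com') does the same job as the helper.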
