I want to remove the domain from a URL. For example, the user entered www.google.com, but I only need www.google.
How can I do this in Python? Thanks.
This is a very general question. But the narrowest answer would be as follows (assuming url holds the URL in question):
if url.endswith(".com"):
    url = url[:-4]
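On Python 3.9 and later, str.removesuffix does this check-and-slice in a single call. A minimal sketch that falls back to the endswith version on older interpreters (the function name drop_com is hypothetical):

```python
def drop_com(url: str) -> str:
    """Return url without a trailing '.com'; leave it unchanged otherwise."""
    suffix = ".com"
    if hasattr(str, "removesuffix"):  # Python 3.9+
        return url.removesuffix(suffix)
    # Older Pythons: the same endswith() check and slice as above
    if url.endswith(suffix):
        return url[:-len(suffix)]
    return url


print(drop_com("www.google.com"))  # www.google
print(drop_com("www.google.org"))  # www.google.org
```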
If you want to remove the last period and everything to the right of it the code would be a little more complicated:
pos = url.rfind('.')  # find rightmost dot
if pos >= 0:  # found one
    url = url[:pos]
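The same "drop the last dot and everything after it" step can also be written with str.rsplit or str.rpartition. A small comparison sketch (the three function names are hypothetical): all three agree when the string contains a dot, but they differ when it contains none, because rpartition then returns an empty first part.

```python
def cut_rfind(url: str) -> str:
    pos = url.rfind(".")  # -1 if there is no dot
    return url[:pos] if pos >= 0 else url


def cut_rsplit(url: str) -> str:
    return url.rsplit(".", 1)[0]  # at most one split, counted from the right


def cut_rpartition(url: str) -> str:
    return url.rpartition(".")[0]  # gives ('', '', url) when no dot is found


for host in ["www.google.com", "localhost"]:
    print(repr(cut_rfind(host)), repr(cut_rsplit(host)), repr(cut_rpartition(host)))
# 'www.google' 'www.google' 'www.google'
# 'localhost' 'localhost' ''
```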
url.rsplit('.', 1)[0] will split on the rightmost dot and return the first item.
Or url.rpartition('.')[0].
.co.uk is the UK equivalent of the originally American and now global .com domain.
To solve this without having to deal with the domain name, you can look for the dots from the left-hand side and stop at the second dot.
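t = 'www.google.com'
a = t.split('.')[1]  # the label between the first and second dots: 'google'
pos = t.find(a)  # index where that label starts
t = t[:pos + len(a)]  # keep everything up to the end of that label
# output: 'www.google'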
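A shorter way to say "stop at the second dot" is to split on dots and rejoin the first two labels. This sketch (keep_first_two_labels is a hypothetical name) also avoids a pitfall of the find() version: t.find(a) returns the first occurrence of the label text anywhere in the string, so for 'www.w.com' it matches the 'w' at index 0 and yields 'w' instead of 'www.w'. Like the original, it assumes every host starts with exactly one prefix label such as www.

```python
def keep_first_two_labels(host: str) -> str:
    """Keep everything before the second dot, e.g. 'www.google.co.uk' -> 'www.google'.

    Assumes a '<prefix>.<name>.<suffix...>' layout; a host with only two
    labels, such as 'google.com', is returned unchanged.
    """
    return ".".join(host.split(".")[:2])


print(keep_first_two_labels("www.google.com"))    # www.google
print(keep_first_two_labels("www.google.co.uk"))  # www.google
print(keep_first_two_labels("www.w.com"))         # www.w
```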
If you want to remove the last 4 characters, slice the string:
url = 'www.google.com'
cut_url = url[:-4]
# output: 'www.google'
More advanced answer
If you have a list, domains, of all the possible domain suffixes:
domains = ['com', 'uk', 'fr', 'net', 'co', 'nz']  # and so on...
while True:
    domain = url.split('.')[-1]
    if domain in domains:
        url = '.'.join(url.split('.')[:-1])
    else:
        break
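The loop above re-splits the string on every pass. A sketch of the same idea that splits once and pops labels off a list (strip_known_suffixes is a hypothetical name). One deliberate difference: it always keeps at least one label, whereas the loop above turns 'co.uk' into an empty string.

```python
DOMAINS = {"com", "uk", "fr", "net", "co", "nz"}  # and so on...


def strip_known_suffixes(url: str, domains=DOMAINS) -> str:
    """Drop trailing labels while they appear in domains, keeping at least one label."""
    labels = url.split(".")
    while len(labels) > 1 and labels[-1] in domains:
        labels.pop()
    return ".".join(labels)


print(strip_known_suffixes("www.google.co.uk"))  # www.google
print(strip_known_suffixes("www.google.com"))    # www.google
print(strip_known_suffixes("www.google.io"))     # www.google.io ('io' is not in the list)
print(strip_known_suffixes("www.co.com"))        # www (the name 'co' is also in the list)
```

The last line shows the main weakness of this approach: a site name that happens to match an entry in the list is stripped as well.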
Or if, for example, you have a domains list where .co and .uk are not separated:
domains = ['.com', '.co.uk', '.fr', '.net', '.co.nz']  # and so on...
for domain in domains:
    if url.endswith(domain):
        cut_url = url[:-len(domain)]
        break
else:  # there is no indentation mistake here:
    # an else after a for loop runs only if the loop did not break
    print('no known domain found')
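The result of that loop depends on list order: if the list also contained '.uk' ahead of '.co.uk' (a hypothetical addition), 'www.google.co.uk' would be cut to 'www.google.co'. A sketch that tries the longest suffixes first avoids this (cut_known_domain is a hypothetical name, and returning None stands in for the print in the else branch):

```python
from typing import Iterable, Optional


def cut_known_domain(url: str, domains: Iterable[str]) -> Optional[str]:
    """Remove the longest matching suffix; return None if none matches."""
    # Longest first, so '.co.uk' is tried before a bare '.uk'
    for domain in sorted(domains, key=len, reverse=True):
        if url.endswith(domain):
            return url[:-len(domain)]
    return None  # no known domain found


DOMAINS = [".uk", ".com", ".co.uk", ".fr", ".net", ".co.nz"]
print(cut_known_domain("www.google.co.uk", DOMAINS))  # www.google
print(cut_known_domain("www.google.com", DOMAINS))    # www.google
print(cut_known_domain("www.google.io", DOMAINS))     # None
```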
What about www.mysite.io or www.mysite.om, etc.?
'.'.join(mystr.split('.')[:-1]). But what about .co.uk?
The problem is under-specified, since the questioner says "remove the domain", whereas .com and google.com and www.google.com are all domains of different sorts. A complete solution might require using the Mozilla Public Suffix List, depending on the actual problem.
str is a really bad name for a variable (it shadows the built-in str).
What you need here is the rstrip function.
Try this code:
url = 'www.google.com'
url2 = 'www.google'
new_url = url.rstrip('.com')
print(new_url)
new_url2 = url2.rstrip('.com')
print(new_url2)
rstrip will only strip if the string is present, in this case ".com"; if not, it leaves the string alone. rstrip strips from the right-hand end, and lstrip is its left-hand counterpart. Check these docs.
Also check the strip and lstrip functions.
As @SteveJessop pointed out, the above example is NOT the right solution, so I'm submitting another one. Though it's related to another answer here, it first checks whether the string ends with '.com'.
url = 'www.foo.com'
if url.endswith('.com'):
    url = url[:-4]
print(url)
'www.foo.com'.rstrip('.com') is 'www.f'. The argument to rstrip specifies a set of characters, any and all of which will be removed from the right-hand end of the string.
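A quick way to see the character-set behaviour: the order of the characters in the argument makes no difference, and stripping continues past the ".com" into "foo". An exact-suffix check, as in the corrected answer, behaves as intended:

```python
url = "www.foo.com"

# rstrip treats its argument as a set of characters, not a suffix:
# it keeps removing trailing '.', 'c', 'o' and 'm' in any order.
stripped = url.rstrip(".com")
print(stripped)                        # www.f
print(url.rstrip("moc.") == stripped)  # True: character order is irrelevant

# Removing an exact suffix needs an explicit check (or str.removesuffix on 3.9+)
exact = url[:-len(".com")] if url.endswith(".com") else url
print(exact)                           # www.foo
```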
What about www.google.com.au or www.google.co.uk?
www.google. In any case, to avoid complexity, we can look for dots starting from the left and stop at the second one.
.com is the top-level domain... but google.com can also be considered the domain, so that would leave just www, which I suppose is not what is intended. The question needs clarification.
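If "the domain" means the public suffix (.com, .co.uk, .com.au), the Public Suffix List approach is to match the longest known suffix and keep everything to its left. A minimal sketch, assuming a tiny hard-coded suffix set (PUBLIC_SUFFIXES and split_host are hypothetical names; the real list has thousands of entries plus wildcard and exception rules that this ignores, and third-party packages such as tldextract wrap the full list):

```python
# A tiny hand-picked subset of public suffixes, for illustration only;
# a real solution would load the full Mozilla Public Suffix List.
PUBLIC_SUFFIXES = {"com", "org", "io", "uk", "co.uk", "au", "com.au"}


def split_host(host: str):
    """Split host into (name_part, public_suffix) using the longest known suffix.

    Returns (host, "") when no known suffix matches. Wildcard and exception
    rules from the real list are ignored here.
    """
    labels = host.split(".")
    # Starting at i = 1 tries the longest candidate first and always
    # leaves at least one label in the name part.
    for i in range(1, len(labels)):
        candidate = ".".join(labels[i:])
        if candidate.lower() in PUBLIC_SUFFIXES:
            return ".".join(labels[:i]), candidate
    return host, ""


print(split_host("www.google.com"))     # ('www.google', 'com')
print(split_host("www.google.co.uk"))   # ('www.google', 'co.uk')
print(split_host("www.google.com.au"))  # ('www.google', 'com.au')
print(split_host("intranet.local"))     # ('intranet.local', '')
```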