How do I use a regex to remove http and/or www, so that http://www.domain.com/ becomes domain.com?
Assume x is any kind of TLD or ccTLD.
Input example:
www.domain.x
Output:
domain.x
Don't use regex; use urlparse to get the netloc (Python 3 shown; in Python 2 the import is from urlparse import urlparse):
>>> x = 'http://www.domain.com/'
>>> from urllib.parse import urlparse
>>> o = urlparse(x)
>>> o
ParseResult(scheme='http', netloc='www.domain.com', path='/', params='', query='', fragment='')
and then
>>> o.netloc
'www.domain.com'
>>> if o.netloc.startswith('www.'): print(o.netloc[4:])
...
domain.com
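One caveat with the urlparse approach: the question's input example (www.domain.x) has no scheme, and urlparse() then puts the whole string in path and leaves netloc empty. A minimal sketch that covers both cases, assuming Python 3; the helper name bare_domain is made up for illustration:

```python
from urllib.parse import urlparse

def bare_domain(url):
    """Return the host of `url` without a leading 'www.' (hypothetical helper)."""
    # urlparse() only recognises a network location after '//',
    # so prepend it for scheme-less input such as 'www.domain.x'.
    if "://" not in url:
        url = "//" + url
    # .hostname is lower-cased and excludes any port; it is None if no host was found.
    host = urlparse(url).hostname or ""
    return host[4:] if host.startswith("www.") else host

print(bare_domain("http://www.domain.com/"))  # domain.com
print(bare_domain("www.domain.x"))            # domain.x
```

This only strips a single leading www. label; other subdomains are left in place.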
o.netloc.startswith('www.') would be more appropriate than 'www' in o.netloc.
If you really want to use regular expressions instead of urlparse() or splitting the string:
>>> import re
>>> domain = 'http://www.example.com/'
>>> re.match(r'(?:\w*://)?(?:.*\.)?([a-zA-Z0-9-]*\.[a-zA-Z]{1,}).*', domain).groups()[0]
'example.com'
The regular expression might be a bit simplistic, but it works. It doesn't replace anything, but I think extracting the domain is easier.
To support domains like 'co.uk', one can do the following:
>>> domain = 'http://www.google.co.uk/'
>>> p = re.compile(r'(?:\w*://)?(?:.*?\.)?(?:([a-zA-Z0-9-]*)\.)?([a-zA-Z0-9-]*\.[a-zA-Z]{1,}).*')
>>> p.match(domain).groups()
('google', 'co.uk')
So you have to check the result for suffixes like 'co.uk' and join the two groups again in that case. Normal domains should work OK. I could not make it work when there are multiple subdomains.
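The check-and-join step, including the multiple-subdomain case, is probably easier on the split host name than inside the regex. A rough sketch, assuming the host has already been extracted; the function name and the tiny suffix set are made up for illustration, and real code would need a complete list such as the Public Suffix List:

```python
# Hypothetical, deliberately incomplete set of two-label suffixes.
TWO_LABEL_SUFFIXES = {"co.uk", "org.uk", "com.au"}

def registered_domain(host):
    """Return the last two labels of `host`, or three for a known two-label suffix."""
    labels = host.lower().rstrip(".").split(".")
    if len(labels) >= 3 and ".".join(labels[-2:]) in TWO_LABEL_SUFFIXES:
        return ".".join(labels[-3:])
    return ".".join(labels[-2:])

print(registered_domain("a.b.google.co.uk"))  # google.co.uk
print(registered_domain("www.example.com"))   # example.com
```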
One-liner without regular expressions or fancy modules:
>>> domain = 'http://www.example.com/'
>>> '.'.join(domain.replace('http://','').split('/')[0].split('.')[-2:])
'example.com'
Here is one way to do it:
>>> import re
>>> str1 = 'http://www.domain.x/'
>>> p1 = re.compile(r'^(?:https?://)?(?:www\.)?|/$')  # optional scheme, optional www., trailing slash
>>> out = p1.sub('', str1)
>>> out
'domain.x'
>>> ' spacious '.lstrip()
'spacious '
>>> 'www.example.com'.lstrip('cmowz.')
'example.com'
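Keep in mind that lstrip() treats its argument as a set of characters rather than a prefix, so it can eat into the domain itself. A small sketch of the difference; the helper strip_www is made up for illustration:

```python
def strip_www(host):
    """Remove one leading 'www.' label if present (hypothetical helper)."""
    prefix = "www."
    return host[len(prefix):] if host.startswith(prefix) else host

# lstrip() keeps removing leading characters while they appear in the given set,
# so it does not stop at the end of 'www.'.
print("www.coolsite.com".lstrip("cmowz."))  # lsite.com
print(strip_www("www.coolsite.com"))        # coolsite.com
```

On Python 3.9 and later, str.removeprefix('www.') does the same job as the helper.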