Python Regex to Extract Domain from Text

Question

I have the following regex:

r'(?:[a-zA-Z0-9](?:[a-zA-Z0-9\-]{,61}[a-zA-Z0-9])?\.)+[a-zA-Z]{2,6}'

When I apply this to a text string with, let's say, "this is www.website1.com and this is website2.com", I get:

['www.website1.com', 'website2.com']

How can I modify the regex to exclude the 'www', so that I get 'website1.com' and 'website2.com'? I'm missing something pretty basic...

Possible duplicate of Extract all domains from text

– tripleee, Commented Nov 6, 2018 at 7:32

user3483203 · Accepted Answer · 2018-03-08 06:30:13Z

Try this one (thanks @SunDeep for the update):

\s(?:www\.)?(\w+\.com)

Explanation

\s matches any whitespace character

(?:www\.)? is a non-capturing group that matches a literal www. zero or one times

(\w+\.com) captures one or more word characters followed by a literal .com

And in action:

import re

s = 'this is www.website1.com and this is website2.com'

matches = re.findall(r'\s(?:www\.)?(\w+\.com)', s)
print(matches)

Output:

['website1.com', 'website2.com']
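One caveat: because the pattern starts with \s, a domain at the very beginning of the string is skipped. Here is a minimal sketch of one possible workaround, assuming a negative lookbehind (?<!\S) is acceptable in place of \s. This variation and the sample string are illustrative and not part of the original answer.

```python
import re

# Assumption: (?<!\S) means "not preceded by a non-whitespace character",
# so it succeeds both after whitespace and at the start of the string.
pattern = re.compile(r'(?<!\S)(?:www\.)?(\w+\.com)')

# Hypothetical sample text where the first domain has no leading whitespace.
s = 'www.website1.com and this is website2.com'

matches = pattern.findall(s)
print(matches)  # ['website1.com', 'website2.com']
```

The lookbehind consumes no characters, so the captured group is unchanged.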

A couple of notes about this. First, matching every valid domain name is very difficult, so while I chose \w+ for this example (which also accepts underscores and rejects hyphens), I could have chosen something stricter like: [a-zA-Z0-9][a-zA-Z0-9-]{1,61}[a-zA-Z0-9]\.[a-zA-Z]{2,}.
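To make that concrete, here is a rough sketch that plugs the stricter label pattern into the same \s(?:www\.)? wrapper. The names label and strict_pattern and the second sample string are made up for illustration, and this is still far from a complete domain validator.

```python
import re

# The stricter label pattern from the note above: starts and ends with an
# alphanumeric character and may contain hyphens in between.
label = r'[a-zA-Z0-9][a-zA-Z0-9-]{1,61}[a-zA-Z0-9]'

# Assumption: same wrapper as before (leading whitespace, optional "www."),
# followed by a top-level domain of two or more letters.
strict_pattern = re.compile(r'\s(?:www\.)?(' + label + r'\.[a-zA-Z]{2,})')

s = 'this is www.website1.com and this is website2.com'
print(strict_pattern.findall(s))  # ['website1.com', 'website2.com']

# Unlike \w+, this pattern accepts hyphenated names (hypothetical example).
hyphenated = strict_pattern.findall('visit www.my-site.org today')
print(hyphenated)  # ['my-site.org']
```

Note that {1,61} requires labels of at least three characters; writing {0,61} instead would also admit two-character labels.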

This answer has a lot of helpful info about matching domains: What is a regular expression which will match a valid domain name without a subdomain?

Next, I only look for .com domains. To match other top-level domains, you could adjust the regular expression to something like:

\s(?:www\.)?(\w+\.(?:com|org|net))
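One thing to watch for when listing several top-level domains: re.findall returns tuples as soon as the pattern has more than one capturing group. The sketch below compares a capturing (com|org|net) with a non-capturing (?:com|org|net). The sample string is made up for illustration.

```python
import re

# Hypothetical sample text with three different top-level domains.
s = 'see www.website1.com, website2.org and website3.net'

# Two capturing groups: findall returns one tuple per match.
with_tuples = re.findall(r'\s(?:www\.)?(\w+\.(com|org|net))', s)
print(with_tuples)
# [('website1.com', 'com'), ('website2.org', 'org'), ('website3.net', 'net')]

# Making the inner group non-capturing keeps the result a flat list of strings.
flat = re.findall(r'\s(?:www\.)?(\w+\.(?:com|org|net))', s)
print(flat)
# ['website1.com', 'website2.org', 'website3.net']
```

If you want to keep the capturing form, taking the first element of each tuple gives the same flat list.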

Vikas Periyadath · Accepted Answer · 2018-03-08 10:03:21Z

Here is an attempt:

import re
s = "www.website1.com"
# Group 1 optionally consumes a leading "www."; group 2 lazily captures the rest.
k = re.findall(r'(www\.)?(.*?)$', s, re.DOTALL)[0][1]
print(k)

Output:

website1.com

If s = "website1.com", the output is the same:

website1.com
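If the input is already a single host name, as in this answer, a simpler alternative might be to strip the prefix with re.sub. This is only a sketch, and strip_www is a hypothetical helper name, not something from the answer above.

```python
import re

def strip_www(host):
    """Remove a leading 'www.' from a host name, if present (hypothetical helper)."""
    # ^ anchors the match to the start, so only a leading prefix is removed.
    return re.sub(r'^www\.', '', host)

print(strip_www('www.website1.com'))  # website1.com
print(strip_www('website1.com'))      # website1.com
```

This only handles one host at a time. For free text containing several domains, the findall approach in the other answer is a better fit.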

edited Mar 8, 2018 at 10:03

answered Mar 8, 2018 at 6:19

