Extracting part of a URL using RegEx

Question

I am trying to use RegEx to extract a particular part of some URLs that come in different variations. Here is the generic format:

http://www.blackpages.com/cityName-StateName/mip/part-I-want-to-extract/randomCharacters

Sometimes the "mip" part is missing, and the URL looks like this:

http://www.blackpages.com/cityName-StateName/part-I-want-to-extract/randomCharacters

I started writing the following RE:

re.compile(r"blackpages\.com/.*")

The .* matches any character. Now, how do I stop when I encounter a "/" and extract everything that follows, up to the next "/"? This would give me the part I want to extract.

Rakesh, any more concerns? Please feel free to drop a line below my answer.
– Wiktor Stribiżew, Commented Apr 25, 2017 at 6:40

Graham · Accepted Answer · 2017-09-24 14:56:55Z

You need to use a negated character class:

re.compile(r"blackpages\.com/([^/]*)")
                              ^^^^^

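The [^/]* matches zero or more characters other than /, as many as possible (greedily), so the match stops right before the next /. The parentheses around it form a capturing group that holds just that segment.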

If you expect at least one character after the /, use the + quantifier (one or more occurrences) instead of *.
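To illustrate the difference between the two quantifiers, here is a small sketch. The sample URL with an empty path segment (a doubled slash) is made up for illustration and does not come from the question.

```python
import re

# Hypothetical URL with an empty first path segment (note the "//").
url = "http://www.blackpages.com//mip/part"

star = re.search(r"blackpages\.com/([^/]*)", url)
plus = re.search(r"blackpages\.com/([^/]+)", url)

# * accepts zero characters, so it matches and captures an empty string.
print(repr(star.group(1)))  # ''

# + needs at least one non-slash character right after the slash,
# so there is no match at all for this URL.
print(plus)  # None
```

With *, the caller has to check for an empty capture; with +, a missing segment shows up as a failed match instead.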

Python code:

import re

# Capture everything after "blackpages.com/" up to the next "/".
rx = r"blackpages\.com/([^/]*)"
ss = [
    "http://www.blackpages.com/cityName-StateName/mip/part-I-want-to-extract/randomCharacters",
    "http://www.blackpages.com/cityName-StateName/part-I-want-to-extract/randomCharacters",
]
for s in ss:
    m = re.search(rx, s)
    if m:
        # group(1) holds only the text matched inside the parentheses.
        print(m.group(1))

Output:

cityName-StateName
cityName-StateName
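
The pattern above captures the first path segment after the domain. To reach the segment the question asks for, one possible extension is to skip the city-state segment and an optional "mip/" segment before capturing. This is a sketch based on the two URL shapes in the question, not part of the original answer, and the function name extract_part is hypothetical.

```python
import re

# Skip the first path segment, then an optional "mip/" segment,
# and capture the next segment (one or more non-slash characters).
rx = re.compile(r"blackpages\.com/[^/]+/(?:mip/)?([^/]+)")

def extract_part(url):
    """Return the captured path segment, or None if the URL does not match."""
    m = rx.search(url)
    return m.group(1) if m else None

urls = [
    "http://www.blackpages.com/cityName-StateName/mip/part-I-want-to-extract/randomCharacters",
    "http://www.blackpages.com/cityName-StateName/part-I-want-to-extract/randomCharacters",
]
for u in urls:
    print(extract_part(u))  # part-I-want-to-extract
```

The (?:mip/)? group is non-capturing, so group(1) always refers to the wanted segment. One caveat: if the wanted segment were itself literally "mip", this pattern would skip it and capture the following segment instead.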

edited Sep 24, 2017 at 14:56

Graham

answered Apr 24, 2017 at 22:28

Wiktor Stribiżew

1 Comment

Rahul Over a year ago

Shouldn't you be using capturing groups with that to extract only that part?
