
I'm trying to combine if/else logic inside a regular expression: if a certain pattern exists in the string, capture one thing; if not, capture another.

The string is 'https://www.searchpage.com/searchcompany.aspx?companyId=41490234&page=0&leftlink=true' and I want to extract the stuff around the '?'.

So if '?' is detected inside the string, the regular expression should capture everything after the '?' mark; if not, it should capture the whole string from the beginning.

I used '(.*\?.*)?(\?.*&.*)|(^&.*)', but it didn't work.

Any suggestions?

Thanks!

4
  • If you can guarantee that there won't be any other question marks later, you could use something like r".*?\??([^?]+)". Commented Feb 19, 2015 at 22:18
  • Thanks for the reply. But this still captures the 'search..' part. I only want that part captured when no question mark is detected. Commented Feb 19, 2015 at 22:20
  • 3
    Why not use urlparse? It allows you to get all the parts of the URL. Commented Feb 19, 2015 at 22:21
  • possible duplicate of Best way to parse a URL query string Commented Feb 20, 2015 at 9:40

3 Answers

5

Use urlparse:

>>> import urlparse  # Python 2 module; in Python 3 it is urllib.parse
>>> parse_result = urlparse.urlparse('https://www.searchpage.com/searchcompany.aspx?companyId=41490234&page=0&leftlink=true')

>>> parse_result
ParseResult(scheme='https', netloc='www.searchpage.com', 
path='/searchcompany.aspx', params='', 
query='companyId=41490234&page=0&leftlink=true', fragment='')

>>> urlparse.parse_qs(parse_result.query)
{'leftlink': ['true'], 'page': ['0'], 'companyId': ['41490234']}

The last result is a dictionary that maps each query key to a list of its values.
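For readers on Python 3, the same idea can be sketched with urllib.parse, which is where these functions live there. This is only an illustrative sketch using the URL from the question; the variable names are my own.

```python
from urllib.parse import urlparse, parse_qs

url = "https://www.searchpage.com/searchcompany.aspx?companyId=41490234&page=0&leftlink=true"

# Split the URL into scheme, netloc, path, query, and so on
parse_result = urlparse(url)

# Everything after the '?' (and before any '#fragment')
query = parse_result.query

# Turn the query string into a dict; values are lists because keys may repeat
params = parse_qs(query)

print(parse_result.path)       # /searchcompany.aspx
print(params["companyId"][0])  # 41490234
```

Note that parse_qs can also be called directly on a bare query string such as 'companyId=4343&type=0&page=11'.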


4

Regex might not be the best solution to this problem. Why not just

my_url.split("?",1)

if that is truly all you wish to do

or as others have suggested

from urlparse import urlparse  # Python 2; in Python 3: from urllib.parse import urlparse
print(urlparse(my_url))
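The split suggestion above can be sketched as a small helper. The function name after_question_mark is hypothetical, chosen here for illustration. Taking the last piece of the split gives "everything after the first '?'" when one is present and the whole string otherwise, which is the behaviour the question asks for.

```python
def after_question_mark(text):
    """Return the part after the first '?', or the whole string if there is none."""
    # split("?", 1) yields at most two pieces; the last one is the part we want
    return text.split("?", 1)[-1]


url = "https://www.searchpage.com/searchcompany.aspx?companyId=41490234&page=0&leftlink=true"
print(after_question_mark(url))                              # companyId=41490234&page=0&leftlink=true
print(after_question_mark("/company/Analytics/GetService"))  # /company/Analytics/GetService
```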

1 Comment

Because I want to parse and extract parts not only from the URL but also from the query and the path. So there's a URL string as above, but also a path string like '/company/Analytics/GetService' and a query string like 'companyId=4343&type=0&page=11'.
2

This regex:

(^[^?]*$|(?<=\?).*)

captures:

  • ^[^?]*$ everything, if there's no ?, or
  • (?<=\?).* everything after the ?, if there is one
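As a quick check, the pattern can be tried with Python's re module. This is a minimal sketch: the helper name extract is made up for illustration, and it uses re.search so the lookbehind branch can match partway through the string.

```python
import re

# Either a whole string containing no '?', or everything after the first '?'
PATTERN = re.compile(r"(^[^?]*$|(?<=\?).*)")


def extract(text):
    """Return the captured part, or None if nothing matched (defensive guard)."""
    match = PATTERN.search(text)
    return match.group(1) if match else None


url = "https://www.searchpage.com/searchcompany.aspx?companyId=41490234&page=0&leftlink=true"
print(extract(url))                              # companyId=41490234&page=0&leftlink=true
print(extract("/company/Analytics/GetService"))  # /company/Analytics/GetService
```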

However, you should look into urllib.parse (Python 3) or urlparse (Python 2) if you're working with URLs.

1 Comment

yes some famous saying about regular expressions comes to mind here (+1)
