0

Please help with my regex problem

Here is my string

source="http://www.amazon.com/ref=s9_hps_bw_g200_t2?pf_rd_m=ATVPDKIKX0DER&pf_rd_i=3421"
source_resource="pf_rd_m=ATVPDKIKX0DER"

The source_resource value occurs inside source and may be followed by either & or . (for example).

So far,

regex = re.compile("pf_rd_m=ATVPDKIKX0DER+[&.]")
regex.findall(source)
[u'pf_rd_m=ATVPDKIKX0DER&']

I have hard-coded the text here. Rather than using the literal text, how can I use the source_resource variable, followed by & or ., to find the match?

  • I think I found it, but I'll wait for other experts to share their ideas: pattern = re.compile(source_resource + '[&.]') Commented Jun 12, 2013 at 7:55
  • Yes, that is correct. You can escape the . with a \ to be explicit: outside a character class an unescaped dot matches any character, not just a dot, although inside [...] it is already literal. Commented Jun 12, 2013 at 8:02
  • @Sathy Does source_resource stay the same throughout the program? Commented Jun 12, 2013 at 8:08
  • @Jared It is set inside an if-elif-else block, which produces a different source_resource for each condition that is satisfied. Commented Jun 12, 2013 at 8:55

3 Answers

3

If the goal is to extract the pf_rd_m value (which it apparently is, since you are using regex.findall), then I'm not sure regular expressions are the easiest solution here:

>>> import urlparse
>>> qs = urlparse.urlparse(source).query
>>> urlparse.parse_qs(qs)
{'pf_rd_m': ['ATVPDKIKX0DER'], 'pf_rd_i': ['3421']}
>>> urlparse.parse_qs(qs)['pf_rd_m']
['ATVPDKIKX0DER']
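
The session above uses the Python 2 urlparse module. On Python 3 the same functions live in urllib.parse, so a minimal sketch of the equivalent lookup, assuming the source string from the question, might look like this:

```python
from urllib.parse import urlparse, parse_qs

# URL copied from the question.
source = "http://www.amazon.com/ref=s9_hps_bw_g200_t2?pf_rd_m=ATVPDKIKX0DER&pf_rd_i=3421"

# Take the part after '?' and parse it into a dict mapping names to lists of values.
query = urlparse(source).query
params = parse_qs(query)

# Each value is a list because a name may repeat in a query string.
pf_rd_m = params["pf_rd_m"][0]
print(pf_rd_m)  # ATVPDKIKX0DER
```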

1 Comment

Thanks @icecrime. This is what I have done, since I think pf_rd_m might change in the future: source_resource = source.split('?')[1].split('&')[0]. I always want the first item of the query string, whatever it is. Let me know if there is a better option.
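
One possible variation on the split approach in the comment above, sketched for Python 3: let urlsplit isolate the query string, so a URL without a ? yields None instead of raising IndexError. The helper name first_query_item is hypothetical and not part of the original thread.

```python
from urllib.parse import urlsplit

def first_query_item(url):
    """Return the first raw 'name=value' item of the query string, or None.

    Hypothetical helper for illustration only.
    """
    query = urlsplit(url).query
    if not query:
        return None
    # Keep only the text before the first '&'.
    return query.split("&", 1)[0]

source = "http://www.amazon.com/ref=s9_hps_bw_g200_t2?pf_rd_m=ATVPDKIKX0DER&pf_rd_i=3421"
source_resource = first_query_item(source)
print(source_resource)  # pf_rd_m=ATVPDKIKX0DER
```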
2

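You can also escape the . to make the intent explicit. Inside a character class such as [&.] the dot is already literal, so the backslash is optional, but use a raw string if you add it:

pattern = re.compile(source_resource + r'[&\.]')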
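
Since source_resource changes at runtime, it may be worth passing it through re.escape so that any regex metacharacters it happens to contain are matched literally. A minimal sketch, where find_resource is a hypothetical helper name and the second URL is made up for illustration:

```python
import re

def find_resource(source, source_resource):
    """Find source_resource followed by '&' or '.' (hypothetical helper)."""
    # re.escape makes every character of the variable part match literally.
    pattern = re.compile(re.escape(source_resource) + r"[&.]")
    return pattern.findall(source)

source = "http://www.amazon.com/ref=s9_hps_bw_g200_t2?pf_rd_m=ATVPDKIKX0DER&pf_rd_i=3421"
print(find_resource(source, "pf_rd_m=ATVPDKIKX0DER"))  # ['pf_rd_m=ATVPDKIKX0DER&']

# Without escaping, the '.' in "a.b" would also match the 'x' in "axb&".
print(find_resource("http://example.com/?axb&c=1", "a.b"))  # []
```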

1

You can just build the string for the regular expression like a normal string, utilizing all string-formatting options available in Python:

import re

source_and = "http://rads.stackoverflow.com/amzn/click/B0030DI8NA/pf_rd_m=ATVPDKIKX0DER&"
source_dot = "http://rads.stackoverflow.com/amzn/click/B0030DI8NA/pf_rd_m=ATVPDKIKX0DER."
source_resource = "pf_rd_m=ATVPDKIKX0DER"

# Build the pattern by concatenating the variable with the character class.
regex_string = source_resource + r"[&\.]"
regex = re.compile(regex_string)

print(regex.findall(source_and))  # ['pf_rd_m=ATVPDKIKX0DER&']
print(regex.findall(source_dot))  # ['pf_rd_m=ATVPDKIKX0DER.']

I hope this is what you mean.

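Note that I modified your regular expression. I dropped the +, which applied only to the preceding R (it meant "one or more R"); I assumed the string occurs only once, which makes it unnecessary. I also escaped the ., a special symbol that matches any character outside a character class; inside [...] it is already literal, so the backslash there is optional.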
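
To make those two points concrete, here is a small sketch. The test strings are made up for illustration; only the first pattern comes from the question.

```python
import re

# In the question's pattern, "R+" means "one or more R",
# so it also matches a run of repeated R characters.
original = re.compile("pf_rd_m=ATVPDKIKX0DER+[&.]")
repeated = original.findall("pf_rd_m=ATVPDKIKX0DERRR&")
print(repeated)  # ['pf_rd_m=ATVPDKIKX0DERRR&']

# Inside a character class the dot is literal whether or not it is escaped.
plain = re.findall(r"[&.]", "a&b.cxd")
escaped = re.findall(r"[&\.]", "a&b.cxd")
print(plain)    # ['&', '.']
print(escaped)  # ['&', '.']
```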
