4

I want to check whether a string (a product name) contains the word beta. I am not very good at writing regexes. For example:

"Crome beta"
"Crome_beta"
"Crome beta2"
"Crome_betaversion"
"Crome 3beta"
"CromeBeta2.3"
"Beta Crome 4"

I want to raise an error saying that this is not a valid product name but a product version. I wrote a regex that catches the strings above:

import logging
import re
parse_beta = re.compile("(beta)", re.I)
if parse_beta.search(product_name):
    logging.error('Invalid product name')

But if the product name contains a word that merely has beta as a substring, like "tibetans product", the regex above still matches beta and raises the error. I want to handle this case. Can anyone suggest a regex?

Thanks a lot.

1
  • "beta" in product_name.lower() Commented Oct 3, 2011 at 10:39

4 Answers

2

Try ((?<![a-z])beta|cromebeta) with the re.I flag you already use: the word beta not preceded by a letter, or the full word cromebeta.

I'll add a quote from http://docs.python.org/library/re.html to explain the first part.

(?<!...) Matches if the current position in the string is not preceded by a match for .... This is called a negative lookbehind assertion. Similar to positive lookbehind assertions, the contained pattern must only match strings of some fixed length. Patterns which start with negative lookbehind assertions may match at the beginning of the string being searched.
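To see how the lookbehind behaves on the names from the question, here is a minimal sketch. It assumes the pattern is compiled with re.I, as in the question's own code; the names list and the results dict are only there for illustration.

```python
import re

# Pattern from this answer; re.I is an assumption carried over from the question.
pattern = re.compile(r"((?<![a-z])beta|cromebeta)", re.I)

names = [
    "Crome beta",
    "Crome_beta",
    "Crome beta2",
    "Crome_betaversion",
    "Crome 3beta",
    "CromeBeta2.3",
    "Beta Crome 4",
    "tibetans product",
]

# Map each name to whether the pattern finds a beta marker in it.
results = {name: bool(pattern.search(name)) for name in names}

for name in names:
    print("{!r}: {}".format(name, "invalid (beta)" if results[name] else "ok"))
```

Note that under re.I the class [a-z] also matches uppercase letters, so "CromeBeta2.3" fails the lookbehind and is only caught by the cromebeta alternative. "tibetans product" is rejected by both alternatives.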


5 Comments

But the data can also look like "CromeBeta2.3".
@Shashi Then you must learn to make good examples in the question. I'm quite inflexible on this. My answer follows the rules set in the question.
@Shashi If you think your question was wrongly written, then you can correct it. I'm not so inflexible on changing my reply based on corrected questions :-) Errare Humanum Est
Actually, I missed some combinations in the examples I gave; I have added them now. Thanks for your suggestion.
@Shashi I hope the sixth example isn't about this
0

We should cover all the cases of beta version names, where the regexp should give a match.

So we start writing the pattern with the first example of beta "Crome beta":

' [Bb]eta'

We use [Bb] to match either B or b right after the space.

The second example "Crome_beta" adds _ as a separator:

'[ _][Bb]eta'

The third ("Crome beta2") and fourth ("Crome_betaversion") examples are already covered by this regexp.

The fifth example "Crome 3beta" forces us to change the pattern this way:

'[ _]\d*[Bb]eta'

where \d is shorthand for [0-9] and * allows zero or more repetitions of \d.
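As a quick check of the three steps so far, the sketch below runs each intermediate pattern against the first five examples from the question. The names steps, examples and matched are only illustrative.

```python
import re

# The three patterns built so far, in the order they were introduced.
steps = [r" [Bb]eta", r"[ _][Bb]eta", r"[ _]\d*[Bb]eta"]

examples = [
    "Crome beta",
    "Crome_beta",
    "Crome beta2",
    "Crome_betaversion",
    "Crome 3beta",
]

# For each pattern, collect the examples it matches.
matched = {p: [e for e in examples if re.search(p, e)] for p in steps}

for p in steps:
    print("{:<16} matches {} of {}".format(p, len(matched[p]), len(examples)))
```

The first pattern catches 2 of the 5 names, the second 4, and the third all 5.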

The sixth example "CromeBeta2.3" shows that Beta may have no preceding _ or space and simply start with a capital letter. We cover it with the | alternation, which works like the or operator in Python:

'[ _]\d*[Bb]eta|Beta'

The seventh example "Beta Crome 4" is matched by the previous regexp (since it starts with Beta). But the name could also be "beta Chrome 4", so we change the pattern this way:

'[ _]\d*[Bb]eta|Beta|^beta '

We don't use ^[Bb]eta since Beta is already covered.

Also, I should mention, we can't use re.I since we have to differentiate between beta and Beta in the regex.
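A small sketch of why re.I does not fit here: with the flag on, the bare Beta alternative also matches a lowercase beta inside any word, which brings back the "tibetans" problem from the question. The variable names are only for illustration.

```python
import re

PATTERN = r"[ _]\d*[Bb]eta|Beta|^beta "

case_sensitive = re.compile(PATTERN)
case_insensitive = re.compile(PATTERN, re.I)

name = "tibetans product"

# Without re.I no alternative applies to this name.
print(bool(case_sensitive.search(name)))    # False

# With re.I the "Beta" alternative matches the "beta" inside "tibetans".
print(bool(case_insensitive.search(name)))  # True
```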

So, the test code is (for Python 2.7):

from __future__ import print_function
import re, sys

match_tests = [
    "Crome beta",
    "Chrome Beta",
    "Crome_beta",
    "Crome beta2",
    "Crome_betaversion",
    "Crome 3beta",
    "Crome 3Beta",
    "CromeBeta2.3",
    "Beta Crome 4",
    "beta Chrome ",
    "Cromebeta2.3",  # no match
    "betamax",  # no match
    "Betamax",  # matches via the Beta alternative
]

compiled = re.compile(r'[ _]\d*[Bb]eta|Beta|^beta ')
for test in match_tests:
    search_result = compiled.search(test)
    if search_result is not None:
        print("{}: OK".format(test))
    else:
        print("{}: No match".format(test), file=sys.stderr)

I don't see any need to use a negative lookbehind. Also, you wrapped beta in a capturing group, (beta), with parentheses. There is no need for that either; it only adds a little overhead to the regexp.


0

From the examples you gave, it seems you actually have two concepts in the product name string: product and version, separated by whitespace or an underscore. Use a regex that splits the two concepts, and search for the word beta only in the version part.
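One possible reading of this idea, as a rough sketch: split the name on whitespace and underscores and flag it if any token starts with beta once leading digits are dropped. The helper has_beta_token is hypothetical, and treating every token as a possible version part is an assumption made for the example.

```python
import re

def has_beta_token(product_name):
    """Hypothetical helper: True if any space/underscore-separated token
    starts with 'beta' (case-insensitively) after dropping leading digits."""
    tokens = re.split(r"[\s_]+", product_name.strip())
    return any(
        token.lstrip("0123456789").lower().startswith("beta")
        for token in tokens
    )

for name in ["Crome beta", "Crome_betaversion", "Crome 3beta",
             "Beta Crome 4", "tibetans product", "CromeBeta2.3"]:
    print("{!r}: {}".format(name, has_beta_token(name)))
```

This rejects "tibetans product", but it also misses names with no separator at all, such as "CromeBeta2.3", so it only helps when product and version really are delimited.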

1 Comment

No, actually the string above is only the product name, but sometimes the product version gets inserted into the product name string, so I need a regex to check for that.
0
"[Bb]eta(\d+|$|version)|^[Bb]eta "

Test with grep:

kent$  cat a
Crome beta
Crome_beta
Crome beta2
Crome_betaversion
Crome 3beta
CromeBeta2.3
tibetans product
Beta Crome 4


kent$  grep -P "[Bb]eta(\d+|$|version)|^[Bb]eta " a
Crome beta
Crome_beta
Crome beta2
Crome_betaversion
Crome 3beta
CromeBeta2.3
Beta Crome 4
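The same check can be sketched in Python with re.search, assuming the lines of file a are held in a list. The names lines and kept are only illustrative.

```python
import re

# Same pattern as in the grep command above.
pattern = re.compile(r"[Bb]eta(\d+|$|version)|^[Bb]eta ")

# Contents of file "a", one product name per entry.
lines = [
    "Crome beta",
    "Crome_beta",
    "Crome beta2",
    "Crome_betaversion",
    "Crome 3beta",
    "CromeBeta2.3",
    "tibetans product",
    "Beta Crome 4",
]

# Keep only the lines the pattern matches, like grep does.
kept = [line for line in lines if pattern.search(line)]

for line in kept:
    print(line)
```

As with grep, every line except "tibetans product" is kept.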
