I want to edit the code below so that it catches all strings that END with "_C" followed by any letters, any digits, or nothing.

Here is my list

import re

name_list = ['chrome_PM',
             'chrome_P',
             'chromerocker_C',
             'chromebike_P1',
             'chromecar_CMale',
             'chromeone_C1254',
             'Lukate_Aids_Consumer_P']

for name in name_list:
    counts_tail = re.compile('_C[\da-zA-Z_]*$')
    if counts_tail.search(name):
        print name

output:

chromerocker_C
chromecar_CMale
chromeone_C1254
Lukate_Aids_Consumer_P

expected output:

chromerocker_C
chromecar_CMale
chromeone_C1254

'Lukate_Aids_Consumer_P' should not be included because it doesn't END with '_C' plus letters or digits. How can I edit my code to fix this bug?

Thanks

  • A Pythonic way: matching = [s for s in name_list if "_C" in s] Commented Dec 29, 2014 at 12:16
  • What python are you using? I ran your script in python 2.7.6 and the output was only these two: chromecar_CMale chromeone_C1254 Commented Dec 29, 2014 at 12:17
  • I can't reproduce your code. I get chromecar_CMale and chromeone_C1254 only with your current code. Commented Dec 29, 2014 at 12:17
  • strange, I'm using python 2.7 Commented Dec 29, 2014 at 12:18
  • Sorry guys, I was using an older regex, try it now, it should work. Commented Dec 29, 2014 at 12:26
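
The substring test suggested in the first comment is looser than what the question asks for: "_C" in s is true wherever _C appears, not only in the last underscore-separated piece. A minimal sketch of the difference, assuming Python 3 syntax and a character class without _; the names COUNTS_TAIL, by_substring and by_regex are illustrative and not from the thread:

```python
import re

name_list = ['chrome_PM', 'chrome_P', 'chromerocker_C', 'chromebike_P1',
             'chromecar_CMale', 'chromeone_C1254', 'Lukate_Aids_Consumer_P']

# Compile once, outside any loop.
COUNTS_TAIL = re.compile(r'_C[\da-zA-Z]*$')

# Substring test: true for any "_C" anywhere in the name.
by_substring = [s for s in name_list if "_C" in s]

# Anchored regex: "_C" plus letters/digits must run to the end of the string.
by_regex = [s for s in name_list if COUNTS_TAIL.search(s)]

print(by_substring)  # includes 'Lukate_Aids_Consumer_P'
print(by_regex)      # does not
```

The substring version still lets Lukate_Aids_Consumer_P through because of the _C in _Consumer, so it does not replace the anchored pattern.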

5 Answers

1

You just need to remove the _ from the last character class.

counts_tail = re.compile('_C[\da-zA-Z_]*$')
                                     ^
                                     |

So the correct form would be,

import re

name_list = ['chrome_PM',
             'chrome_P',
             'chromerocker_C',
             'chromebike_P1',
             'chromecar_CMale',
             'chromeone_C1254',
             'Lukate_Aids_Consumer_P']

for name in name_list:
    counts_tail = re.compile('_C[\da-zA-Z]*$')
    if counts_tail.search(name):
        print name

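Because the character class contains _, the pattern matches the substring _Consumer_P in Lukate_Aids_Consumer_P: the match starts at the _C of _Consumer, and the class then consumes onsumer_P all the way to the end of the string.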
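
To see exactly which text each pattern matches, one option is to print the match object's group(). This is a small sketch in Python 3 syntax; the variable names with_underscore, without_underscore and matched_text are illustrative only:

```python
import re

with_underscore = re.compile(r'_C[\da-zA-Z_]*$')    # pattern from the question
without_underscore = re.compile(r'_C[\da-zA-Z]*$')  # pattern from this answer

name = 'Lukate_Aids_Consumer_P'

m = with_underscore.search(name)
matched_text = m.group() if m else None
print(matched_text)                      # _Consumer_P

# Without "_" in the class the tail cannot cross the last underscore,
# so there is no match at all.
print(without_underscore.search(name))   # None
```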

Comments

0

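You can check whether the last occurrence of _ is followed by C:

for name in name_list:
    index = name.rfind('_')
    # index != -1 skips names with no underscore; slicing avoids an
    # IndexError when the name ends with '_'
    if index != -1 and name[index+1:index+2] == 'C':
        print name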

Remember you can use this only if your string does not contain characters other than digits, letters and _. Otherwise you can use this regex:

'_C(\d|[a-zA-Z])*$'

It means _C followed by zero or more occurrences of \d|[a-zA-Z] (a digit or a letter), followed by $ (end of the string).
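
The caveat about other characters can be made concrete by running both approaches side by side. The sketch below uses Python 3 syntax; the helper name tail_starts_with_c and the extra test names 'chrome_' and 'chrome_C-1' are made up for illustration:

```python
import re

def tail_starts_with_c(name):
    """Return True if the text after the last '_' starts with 'C'."""
    index = name.rfind('_')
    if index == -1:          # no underscore at all
        return False
    # Slicing avoids an IndexError when the name ends with '_'.
    return name[index + 1:index + 2] == 'C'

counts_tail = re.compile(r'_C(\d|[a-zA-Z])*$')

for name in ['chromerocker_C', 'Lukate_Aids_Consumer_P', 'chrome_', 'chrome_C-1']:
    print(name, tail_starts_with_c(name), bool(counts_tail.search(name)))
```

The two checks agree on the first three names and disagree on 'chrome_C-1': the rfind check accepts it, while the regex rejects it because - is neither a digit nor a letter.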

Comments

0

Use re.compile('_C[^\W_]*$')

You could have simply used re.compile('_C\w*$'), but \w also matches _, which is not wanted here. So the best way is to use re.compile('_C[^\W_]*$'): [^\W_] matches any word character except _, that is, only letters and digits.

Demo:

>>> import re
>>> name_list = ['chrome_PM',
...              'chrome_P',
...              'chromerocker_C',
...              'chromebike_P1',
...              'chromecar_CMale',
...              'chromeone_C1254',
...              'Lukate_Aids_Consumer_P']

>>> for name in name_list:
...     counts_tail = re.compile('_C[^\W_]*$')
...     if counts_tail.search(name):
...         print name
... 
chromerocker_C
chromecar_CMale
chromeone_C1254
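
A short sketch of how \w and [^\W_] differ on the problem string. It assumes Python 3, where \w on str patterns also matches non-ASCII letters unless re.ASCII is passed; the thread itself uses Python 2.7, and the name 'chrome_C\u00e9' is a made-up example:

```python
import re

name = 'Lukate_Aids_Consumer_P'

# \w includes "_", so the tail can run across the last underscore.
matches_w = bool(re.search(r'_C\w*$', name))
print(matches_w)            # True (unwanted match)

# [^\W_] is "word character, but not underscore".
matches_class = bool(re.search(r'_C[^\W_]*$', name))
print(matches_class)        # False

# Assumption: Python 3 str patterns. A non-ASCII letter counts as a word
# character by default, and stops counting when re.ASCII is set.
accented = 'chrome_C\u00e9'
unicode_match = bool(re.search(r'_C[^\W_]*$', accented))
ascii_match = bool(re.search(r'_C[^\W_]*$', accented, re.ASCII))
print(unicode_match, ascii_match)   # True False
```

If only ASCII letters and digits should be accepted, the explicit class [\da-zA-Z] combined with re.ASCII, or [0-9a-zA-Z], may be the safer spelling.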

Comments

0

import re

name_list = ['chrome_PM',
             'chrome_P',
             'chromerocker_C',
             'chromebike_P1',
             'chromecar_CMale',
             'chromeone_C1254',
             'Lukate_Aids_Consumer_P']

for name in name_list:
    counts_tail = re.compile('_C[\da-zA-Z]*$')
    if counts_tail.search(name):
        print name

Comments

0

Code

import re

name_list = ['chrome_PM',
             'chrome_P',
             'chromerocker_C',
             'chromebike_P1',
             'chromecar_CMale',
             'chromeone_C1254',
             'Lukate_Aids_Consumer_P']

for name in name_list:
    counts_tail = re.compile('_C(\d|[a-zA-Z])*$') # Added * 
                                                  # Unnecessary use of + in both \d and [a-zA-Z] (thanks to @Ashwani Dausodia)
    if counts_tail.search(name):
        print name

Output

chromerocker_C
chromecar_CMale
chromeone_C1254

3 Comments

@Wazzy, thanks this works but I have edited my original regex, I accidentally posted an older version. Could you try with the new version. thanks!
For the love of all that is holy, please don't write regex like (x+)*.
Unnecessary use of + in both \d and [a-zA-Z].
