Python regex: match last instance of a character followed by a certain pattern

Question

I want to edit the code below so that it catches every string that ENDS with "_C" followed by any number of letters or digits, or by nothing at all.

Here is my list

import re

name_list = ['chrome_PM',
             'chrome_P',
             'chromerocker_C',
             'chromebike_P1',
             'chromecar_CMale',
             'chromeone_C1254',
             'Lukate_Aids_Consumer_P']

# Compile the pattern once, outside the loop
counts_tail = re.compile(r'_C[\da-zA-Z_]*$')

for name in name_list:
    if counts_tail.search(name):
        print(name)

output:

chromerocker_C
chromecar_CMale
chromeone_C1254
Lukate_Aids_Consumer_P

expected output:

chromerocker_C
chromecar_CMale
chromeone_C1254

'Lukate_Aids_Consumer_P' should not be included, because its last underscore is followed by 'P', not 'C'. How can I change my code to fix this bug?

Thanks

A Pythonic way: matching = [s for s in name_list if "_C" in s]
– user4375224, Commented Dec 29, 2014 at 12:16
Which Python version are you using? I ran your script in Python 2.7.6 and the output was only these two: chromecar_CMale and chromeone_C1254.
– Collierre, Commented Dec 29, 2014 at 12:17
I can't reproduce your result. With your current code I get only chromecar_CMale and chromeone_C1254.
– Jerry, Commented Dec 29, 2014 at 12:17
Sorry guys, I had posted an older regex. Try it now, it should reproduce the problem.
– Boosted_d16, Commented Dec 29, 2014 at 12:26

Avinash Raj · Accepted Answer · 2014-12-29 12:40:09Z

1

You just need to remove the _ from the last character class.

counts_tail = re.compile('_C[\da-zA-Z_]*$')
                                     ^
                                     |

So the correct form would be:

import re

name_list = ['chrome_PM',
             'chrome_P',
             'chromerocker_C',
             'chromebike_P1',
             'chromecar_CMale',
             'chromeone_C1254',
             'Lukate_Aids_Consumer_P']

# No "_" in the character class, so the tail after _C cannot cross another underscore
counts_tail = re.compile(r'_C[\da-zA-Z]*$')

for name in name_list:
    if counts_tail.search(name):
        print(name)

Because the character class contains _, the pattern matches the substring _Consumer_P in Lukate_Aids_Consumer_P: after _C, the class consumes onsumer_P all the way to the end of the string.
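
To see this directly, here is a small sketch that prints what each pattern actually matches in the problem string. The variable names buggy and fixed are only labels for this illustration.

```python
import re

buggy = re.compile(r'_C[\da-zA-Z_]*$')  # character class still contains "_"
fixed = re.compile(r'_C[\da-zA-Z]*$')   # "_" removed from the character class

name = 'Lukate_Aids_Consumer_P'

buggy_match = buggy.search(name)
fixed_match = fixed.search(name)

print(buggy_match.group())  # _Consumer_P
print(fixed_match)          # None
```

The buggy pattern anchors at the first _C it can complete, which is the one in _Consumer, and the class then swallows the final _P as well.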

answered Dec 29, 2014 at 12:40

Ashwani · Answer · 2014-12-29 12:28:50Z

0

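Check whether the last occurrence of _ is followed by C:

for name in name_list:
    index = name.rfind('_')
    # rfind returns -1 when there is no "_"; slicing avoids an IndexError
    # when "_" is the last character of the string
    if index != -1 and name[index + 1:index + 2] == 'C':
        print(name)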

Note that this only works if your strings contain nothing but digits, letters and _, because it does not check what comes after the C. Otherwise, use this regex:

'_C(\d|[a-zA-Z])*$'

It means _C followed by zero or more occurrences of \d|[a-zA-Z] (a digit or a letter), followed by $ (end of the string).
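
A rough sketch of where the two approaches disagree. The helper name last_underscore_then_c and the sample 'chromecar_C-Male' are made up for this illustration and are not from the question.

```python
import re

def last_underscore_then_c(name):
    """Hypothetical helper: is the last "_" immediately followed by "C"?"""
    index = name.rfind('_')
    return index != -1 and name[index + 1:index + 2] == 'C'

tail = re.compile(r'_C(\d|[a-zA-Z])*$')

# 'chromecar_C-Male' is an invented sample containing a character
# other than a letter, a digit or "_"
samples = ['chromecar_CMale', 'chromecar_C-Male', 'Lukate_Aids_Consumer_P']

for name in samples:
    print(name, last_underscore_then_c(name), bool(tail.search(name)))
```

The rfind check accepts 'chromecar_C-Male' because it only looks at the one character after the last underscore, while the regex rejects it because of the hyphen.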

edited Dec 29, 2014 at 12:28

answered Dec 29, 2014 at 12:20

Irshad Bhat · Answer · 2014-12-29 12:37:25Z

0

Use re.compile('_C[^\W_]*$')

You could have simply used re.compile('_C\w*$'), but \w also matches _, which is not wanted here. So the best way is to use re.compile('_C[^\W_]*$'): the class [^\W_] matches any word character except _, that is, only letters and digits.

Demo:

>>> import re
>>> name_list = ['chrome_PM',
...              'chrome_P',
...              'chromerocker_C',
...              'chromebike_P1',
...              'chromecar_CMale',
...              'chromeone_C1254',
...              'Lukate_Aids_Consumer_P']
>>> counts_tail = re.compile(r'_C[^\W_]*$')
>>> for name in name_list:
...     if counts_tail.search(name):
...         print(name)
...
chromerocker_C
chromecar_CMale
chromeone_C1254
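
One difference from [\da-zA-Z] worth knowing, sketched here under the assumption that you run Python 3: str patterns are Unicode-aware by default, so [^\W_] also accepts non-ASCII letters and digits unless you pass re.ASCII. The sample name is invented for the illustration.

```python
import re

unicode_tail = re.compile(r'_C[^\W_]*$')          # Unicode-aware by default on Python 3
ascii_tail = re.compile(r'_C[^\W_]*$', re.ASCII)  # limit \W to ASCII semantics
explicit_tail = re.compile(r'_C[\da-zA-Z]*$')     # the explicit ASCII class

name = 'chrome_Caf\u00e9'  # invented sample ending in a non-ASCII letter

print(bool(unicode_tail.search(name)))   # True
print(bool(ascii_tail.search(name)))     # False
print(bool(explicit_tail.search(name)))  # False
```

If your names are plain ASCII, as in the question, all three patterns behave the same.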

edited Dec 29, 2014 at 12:37

answered Dec 29, 2014 at 12:27

vks · Answer · 2014-12-29 12:39:40Z

0

import re

name_list = ['chrome_PM',
             'chrome_P',
             'chromerocker_C',
             'chromebike_P1',
             'chromecar_CMale',
             'chromeone_C1254',
             'Lukate_Aids_Consumer_P']

# Only digits and letters may follow _C up to the end of the string
counts_tail = re.compile(r'_C[\da-zA-Z]*$')

for name in name_list:
    if counts_tail.search(name):
        print(name)

answered Dec 29, 2014 at 12:39

Wasim Karani · Answer · 2014-12-30 05:07:15Z

0

Code

import re

name_list = ['chrome_PM',
             'chrome_P',
             'chromerocker_C',
             'chromebike_P1',
             'chromecar_CMale',
             'chromeone_C1254',
             'Lukate_Aids_Consumer_P']

# The * repeats the whole group, so a + inside \d or [a-zA-Z] would be
# unnecessary (thanks to @Ashwani Dausodia)
counts_tail = re.compile(r'_C(\d|[a-zA-Z])*$')

for name in name_list:
    if counts_tail.search(name):
        print(name)

Output

chromerocker_C
chromecar_CMale
chromeone_C1254

edited Dec 30, 2014 at 5:07

answered Dec 29, 2014 at 12:20

3 Comments

Boosted_d16 Over a year ago

@Wazzy, thanks, this works, but I have edited my original regex; I accidentally posted an older version. Could you try with the new version? Thanks!

Aran-Fey Over a year ago

For the love of all that is holy, please don't write regex like (x+)*.

Ashwani Over a year ago

Unnecessary use of + in both \d and [a-zA-Z].
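
For context on the two comments above: a pattern of the shape (x+)* nests one quantifier inside another, and a backtracking engine such as Python's re can spend a lot of time retrying the same characters when the match fails. The sketch below assumes a nested form like _C(\d+|[a-zA-Z]+)*$ purely as an illustration of that shape, and checks that the simpler forms select the same names from the question's list.

```python
import re

name_list = ['chrome_PM',
             'chrome_P',
             'chromerocker_C',
             'chromebike_P1',
             'chromecar_CMale',
             'chromeone_C1254',
             'Lukate_Aids_Consumer_P']

# Illustrative patterns; "nested" has the (x+)* shape the comment warns about
patterns = [
    ('nested', re.compile(r'_C(\d+|[a-zA-Z]+)*$')),
    ('grouped', re.compile(r'_C(\d|[a-zA-Z])*$')),
    ('flat', re.compile(r'_C[\da-zA-Z]*$')),
]

results = {
    label: [name for name in name_list if pattern.search(name)]
    for label, pattern in patterns
}

for label, matched in results.items():
    print(label, matched)
```

All three select the same names here, so the flat character class is the simplest choice and avoids the nested repetition entirely.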
