python regex with variable input

Question

I have a text file (say test.txt), e.g.

a  ......
aa ......
a+a .....
aa+ .....
a+  .....
aaa .....
.........

Now I would like to find the line number of a particular string, e.g. 'a', 'aa+', etc. I've tried to find an exact match of the input string using a regex.

name='a'

import re
p = re.compile(r'\b'+re.escape(name)+ r'\b')

with open('test.txt') as inpfile:
    for num, line in enumerate(inpfile):
        if p.search(line):
            print(num)

The program should print "0" only, but it's printing 0, 2, 4.

My expected output is

name='a'

output: 0

name='aa'

output: 1

name='aa+'

output: 3 and so on...

I understand that the regular expression I used above is not correct. It would be helpful if you could share comments or suggestions on how to build the regular expression so that it gives the desired output for all the patterns.

Thanks.

kindall · Accepted Answer · 2013-07-23 15:10:41Z

1

Why would it not print 2 and 4? a+a and a+ both contain a surrounded by word boundaries, exactly as you have specified with \b. Perhaps you want to match the start and end of the line instead? E.g.

name='a'

import re
p = re.compile('^'+re.escape(name)+ '$')

with open('test.txt') as inpfile:
    for num, line in enumerate(inpfile):
        if p.search(line.rstrip('\n')):  # '\n', not r'\n': strip the newline character itself
            print(num)

But if you're looking to match the line exactly, why go to the trouble of using a regular expression?

name='a'

with open('test.txt') as inpfile:
    for num, line in enumerate(inpfile):
        if name == line.rstrip('\n'):  # '\n', not r'\n': strip the newline character itself
            print(num)


3 Comments

rana Over a year ago

Thanks for the explanation. Actually, there is more content on each line of the text file; I just edited the text file to avoid confusion. However, I tried your suggestion, but it's not working for all the patterns. Further comments would be helpful.

kindall Over a year ago

"It's not working for all the patterns." What patterns is it not working for, and what happens instead?

rana Over a year ago

Thanks, it's working for all the patterns. Sorry for my previous comment; I made a mistake while trying your suggestion. Thanks again for your help.
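
Since the real file reportedly has more text after the name on each line, one option is to require whitespace or a line edge on both sides of the name, using lookarounds instead of `\b`. This is only a sketch and does not come from the answer above: the helper name `find_lines` and the `sample` list are hypothetical, it assumes the name appears as a whitespace-delimited field, and it uses Python 3 `print()` syntax.

```python
import re

def find_lines(lines, name):
    """Return 0-based numbers of lines containing `name` as a whole whitespace-delimited field."""
    # (?<!\S): not preceded by a non-space character (start of line or whitespace)
    # (?!\S):  not followed by a non-space character (end of line or whitespace)
    p = re.compile(r'(?<!\S)' + re.escape(name) + r'(?!\S)')
    return [num for num, line in enumerate(lines) if p.search(line)]

# Hypothetical stand-in for the contents of test.txt.
sample = [
    'a  ......\n',
    'aa ......\n',
    'a+a .....\n',
    'aa+ .....\n',
    'a+  .....\n',
    'aaa .....\n',
    '.........\n',
]

print(find_lines(sample, 'a'))    # [0]
print(find_lines(sample, 'aa'))   # [1]
print(find_lines(sample, 'aa+'))  # [3]
```

With `p.search` the field may appear anywhere on the line. If the name must be the first field, `p.match` would restrict the test to the start of the line.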

jpmuc · Accepted Answer · 2013-07-23 15:10:53Z

1

The problem lies in what your regular expression actually means. In lay terms, you are matching the expression:

"word boundary" followed by an 'a' followed by another "word boundary"

and that is why it is matching lines 0 (a), 2 (a+a) and so on. Here, spaces, the start and end of the line, and '+' all mark the end of a word.
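
To see the word-boundary behavior concretely, here is a minimal sketch that applies the question's pattern to the first field of each sample line. The `samples` list is made up for illustration, and the code uses Python 3 `print()` syntax rather than the Python 2 style used in the question.

```python
import re

name = 'a'
# Same pattern as in the question: the escaped name between word boundaries.
p = re.compile(r'\b' + re.escape(name) + r'\b')

# Illustrative stand-ins for the first field of each line in test.txt.
samples = ['a', 'aa', 'a+a', 'aa+', 'a+', 'aaa']

# '+' is not a word character, so \b matches on either side of it.
matched = [num for num, text in enumerate(samples) if p.search(text)]
print(matched)  # [0, 2, 4]
```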


zhangyangyu · Accepted Answer · 2013-07-23 15:11:44Z

0

You should not use \b. It will also match a+a and a+. I think you may want ^a$.
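
A literal `^a$` only covers one name. For variable input, the name needs to go through `re.escape` before it is placed between the anchors; otherwise a name such as `aa+` is read as a regex quantifier. A small sketch of the difference follows (the `lines` list and the variable names are illustrative, and the code uses Python 3 syntax):

```python
import re

name = 'aa+'
lines = ['aa', 'aa+', 'aaa']

# Unescaped: '+' means "one or more of the preceding 'a'".
unescaped = re.compile('^' + name + '$')
# Escaped: '+' is matched as a literal plus sign.
escaped = re.compile('^' + re.escape(name) + '$')

unescaped_hits = [s for s in lines if unescaped.search(s)]
escaped_hits = [s for s in lines if escaped.search(s)]

print(unescaped_hits)  # ['aa', 'aaa']
print(escaped_hits)    # ['aa+']
```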
