0

I have a text file (say, test.txt), e.g.:

a  ......
aa ......
a+a .....
aa+ .....
a+  .....
aaa .....
.........

Now I would like to find the line number of a particular string, e.g. 'a', 'aa+', etc. I've tried to find an exact match of the input string using a regex:

import re

name = 'a'
p = re.compile(r'\b' + re.escape(name) + r'\b')

with open('test.txt') as inpfile:
    for num, line in enumerate(inpfile):
        if p.search(line):
            print(num)

The program should print only "0", but it's printing 0, 2, 4.

My expected output is

name='a'

output: 0

name='aa'

output: 1

name='aa+'

output: 3 and so on...

I understand that the regular expression I used above is not correct, but it would be helpful if you could share comments or suggestions on how to compile the regular expression so that it gives the desired output for all the patterns.

Thanks.

3 Answers

1

Why would it not print 2 and 4? 'a+a' and 'a+' both contain an 'a' surrounded by word boundaries, exactly as you specified with \b. Perhaps you want to match the start and end of the line instead? E.g.

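import re

name = 'a'
p = re.compile('^' + re.escape(name) + '$')

with open('test.txt') as inpfile:
    for num, line in enumerate(inpfile):
        # Strip the trailing newline: '\n', not the raw string r'\n',
        # which would strip backslashes and 'n' characters instead.
        if p.search(line.rstrip('\n')):
            print(num)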

But if you're looking to match the line exactly, why go to the trouble of using a regular expression?

name = 'a'

with open('test.txt') as inpfile:
    for num, line in enumerate(inpfile):
        # Compare the whole line, minus its trailing newline, to the name.
        if name == line.rstrip('\n'):
            print(num)
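If each line carries extra text after the name, a whole-line comparison will no longer match. Assuming the name is always the first whitespace-separated field on its line, a non-regex alternative is to compare just that field. This is a minimal sketch; the sample list and the helper name first_field_matches are hypothetical stand-ins, not part of the original code:

```python
# Hypothetical stand-in for the lines of test.txt (each ends with a newline).
sample = ["a  ......\n", "aa ......\n", "a+a .....\n", "aa+ .....\n",
          "a+  .....\n", "aaa .....\n", ".........\n"]


def first_field_matches(name, lines):
    """Return the numbers of lines whose first whitespace-separated field equals name."""
    result = []
    for num, line in enumerate(lines):
        # split() with no argument splits on runs of whitespace and drops the newline.
        fields = line.split()
        # Skip blank lines, then compare only the first field.
        if fields and fields[0] == name:
            result.append(num)
    return result


print(first_field_matches("a", sample))    # [0]
print(first_field_matches("aa", sample))   # [1]
print(first_field_matches("aa+", sample))  # [3]
```

With the real file, pass the open file object instead of sample (e.g. inside with open('test.txt') as inpfile:), since iterating over it yields lines the same way.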

3 Comments

Thanks for the explanation. Actually, there is more content on each line of the text file; I simplified the text file to avoid confusion. However, I tried your suggestion, but it's not working for all the patterns. Further comments would be helpful.
"It's not working for all the patterns." What patterns is it not working for, and what happens instead?
Thanks, it's working for all the patterns. Sorry for my previous comment; I made a mistake while trying your suggestion. Thanks again for your help.
1

The problem is the exact meaning of your regular expression. In lay terms, you are matching:

"word boundary" followed by an 'a' followed by another "word boundary"

and that is why it matches lines 0 (a), 2 (a+a), and so on. \b matches at any position between a word character (letter, digit, or underscore) and a non-word character or the start/end of the string, so spaces, '+', and the start and end of the line all mark the edge of a word.
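A quick way to see this is to run the pattern against the sample lines directly. This sketch uses the lines from the question as an in-memory list (a stand-in assumed here in place of test.txt) and collects the indexes where \ba\b finds a match:

```python
import re

# Sample lines from the question, used in place of test.txt for illustration.
lines = ["a  ......", "aa ......", "a+a .....", "aa+ .....",
         "a+  .....", "aaa .....", "........."]

pattern = re.compile(r"\b" + re.escape("a") + r"\b")

# ' ' and '+' are non-word characters, so a lone 'a' next to them
# (or next to the start of the line) sits between two word boundaries.
matches = [num for num, line in enumerate(lines) if pattern.search(line)]
print(matches)  # [0, 2, 4]
```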


0

You should not use \b here: it will also match 'a+a' and 'a+'. I think you may want ^a$ instead.
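If each line carries more text after the name (as the "......" placeholders suggest), ^a$ will no longer match. Assuming the names are separated from the rest of the line by whitespace, one option is to replace \b with lookarounds that forbid a non-whitespace character immediately before or after the name. A minimal sketch; find_lines is a hypothetical helper name:

```python
import re


def find_lines(name, lines):
    """Return indexes of lines containing name as a whitespace-delimited token."""
    # (?<!\S): no non-whitespace character just before (start of line is fine).
    # (?!\S):  no non-whitespace character just after (end of line is fine).
    pattern = re.compile(r"(?<!\S)" + re.escape(name) + r"(?!\S)")
    return [num for num, line in enumerate(lines) if pattern.search(line)]


lines = ["a  ......", "aa ......", "a+a .....", "aa+ .....",
         "a+  .....", "aaa .....", "........."]

print(find_lines("a", lines))    # [0]
print(find_lines("aa", lines))   # [1]
print(find_lines("aa+", lines))  # [3]
```

With the real file you can pass the open file object directly; the trailing newline on each line counts as whitespace, so (?!\S) still succeeds at the end of a token.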

