import re

count = 0
with open("pro.txt") as f:
    for line in f:
        # count the line if the pattern matches at least once anywhere in it
        matches = re.findall(r'(^|[^\w\-])able#1(?=([^\w\-]|$))', line)
        if len(matches) > 0:
            count += 1
print(count)

I am counting words in a text file, but my program also counts some words I don't need. I use a regular expression (re), but it isn't helping. This is my text file:

0         6          9     able#1
0         11         34    unable#1
9         12         22    able#1
0         6          9     able#1-able#1
0         11         34    unable#1*able#1

I don't want my program to count words like -able#1, able#1-able#1, or unable#1*able#1; it should only count able#1.
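To make the problem concrete, here is a minimal sketch that runs the question's own pattern over the five sample lines. The lines are held in a list rather than read from pro.txt, and lines_matched is a hypothetical helper. It prints the two able#1 lines plus the unable#1*able#1 line: the * in front of the second able#1 satisfies [^\w\-], so that line is counted, while the able#1-able#1 line is already rejected.

```python
import re

# The question's five sample lines, kept in memory here instead of being
# read from pro.txt (an assumption so the sketch runs on its own).
SAMPLE_LINES = [
    "0         6          9     able#1\n",
    "0         11         34    unable#1\n",
    "9         12         22    able#1\n",
    "0         6          9     able#1-able#1\n",
    "0         11         34    unable#1*able#1\n",
]

# The pattern from the question, unchanged apart from the raw-string prefix.
QUESTION_PATTERN = re.compile(r'(^|[^\w\-])able#1(?=([^\w\-]|$))')


def lines_matched(pattern, lines):
    """Hypothetical helper: return the lines where `pattern` matches at least once."""
    return [line.rstrip("\n") for line in lines if pattern.search(line)]


for line in lines_matched(QUESTION_PATTERN, SAMPLE_LINES):
    print(line)
```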

  • @nhahtdh But I also have to find the numbers next to that word in this text file. Commented Feb 25, 2013 at 15:54
  • How about just removing everything after the first occurrence of the hash symbol? Commented Feb 25, 2013 at 16:08
  • Why not apply the regex after removing everything past the first occurrence of the '#' symbol (plus maybe one more character)? Commented Feb 25, 2013 at 16:10
  • Yes, but if I match the word label, then I also have to find the numbers next to it, like 0 6 9 in the example above. Commented Feb 25, 2013 at 16:14
  • @Angel: For each line, cut your data into 4 parts, 0, 6, 9, able#1 (this can be done with split, with a limit on the number of parts), and check the last item to decide whether to keep the data. Commented Feb 25, 2013 at 16:38
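The split approach from the last comment can be sketched as follows. find_word_rows and its targets parameter are hypothetical names, and the sketch assumes every data line has three integer fields followed by the word, as in the sample. Because it compares the whole fourth field instead of searching inside it, able#1-able#1 and unable#1*able#1 are rejected, and the numbers come back with each match.

```python
def find_word_rows(lines, targets=("able#1",)):
    """Hypothetical helper: return the three leading numbers of every line
    whose fourth whitespace-separated field is exactly one of `targets`."""
    rows = []
    for line in lines:
        # split(None, 3) splits on runs of whitespace at most 3 times, e.g.
        # ['0', '6', '9', 'able#1\n']; the last field keeps trailing
        # whitespace, so it is stripped before the comparison.
        parts = line.split(None, 3)
        if len(parts) == 4 and parts[3].strip() in targets:
            # Assumes the first three fields are integers, as in the sample.
            rows.append((int(parts[0]), int(parts[1]), int(parts[2])))
    return rows


sample = [
    "0         6          9     able#1\n",
    "0         11         34    unable#1\n",
    "9         12         22    able#1\n",
    "0         6          9     able#1-able#1\n",
    "0         11         34    unable#1*able#1\n",
]

rows = find_word_rows(sample)
print(len(rows), rows)  # 2 [(0, 6, 9), (9, 12, 22)]
```

To run it on the real file, pass the open file object, e.g. `with open("pro.txt") as f: rows = find_word_rows(f)`. If -able#1 and _able#1 should also count, as a later comment asks, they could be added to targets.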

3 Answers

You can use the regex \sable#1\s*$, which requires a whitespace character immediately before able and allows only whitespace (zero or more characters, nothing else) between able#1 and the end of the line.

import re
regex = re.compile(r'\sable#1\s*$')
count = 0
with open("pro.txt") as file:
    for line in file:
        if regex.search(line):
            count += 1
print(count)

You could also count with sum() and a generator expression like this:

with open("pro.txt") as file:
    count = sum(1 for line in file if regex.search(line))
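A minimal, self-contained check of this answer's pattern against the question's sample data, using io.StringIO as a stand-in for the open pro.txt file. The second pattern, with_numbers, is a hypothetical variant (not part of the answer) that also captures the three numbers in front of able#1, which the asker requests in the comments; it assumes each number is a plain run of digits.

```python
import io
import re

# The answer's pattern.
regex = re.compile(r'\sable#1\s*$')
# Hypothetical variant: also capture the three numbers before the word
# (assumes each number is a plain run of digits).
with_numbers = re.compile(r'^(\d+)\s+(\d+)\s+(\d+)\s+able#1\s*$')

SAMPLE = (
    "0         6          9     able#1\n"
    "0         11         34    unable#1\n"
    "9         12         22    able#1\n"
    "0         6          9     able#1-able#1\n"
    "0         11         34    unable#1*able#1\n"
)

# io.StringIO stands in for open("pro.txt"); iterating over it yields lines.
count = sum(1 for line in io.StringIO(SAMPLE) if regex.search(line))

matches = (with_numbers.match(line) for line in io.StringIO(SAMPLE))
numbers = [m.groups() for m in matches if m]

print(count)    # 2
print(numbers)  # [('0', '6', '9'), ('9', '12', '22')]
```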

[^\W]*able#1\W

The [^\W]* expression matches zero or more characters that are not non-word characters (the same set as \w: letters, digits, and underscore), so it will not care about the "un" in "unable"!
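A quick check of that claim, using the unable#1 line from the sample data: because [^\W] is the same character set as \w, the leading [^\W]* simply absorbs "un" and the pattern still matches.

```python
import re

# [^\W] means "not a non-word character", which is the same set as \w.
pattern = re.compile(r'[^\W]*able#1\W')

m = pattern.search("0         11         34    unable#1\n")
print(repr(m.group()))  # 'unable#1\n'
```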

I would write the regexp like this:

if re.search(r'\s+[-_]*able#\S*', lines):

\s+ matches one or more whitespace characters, [-_]* allows any number of leading hyphens or underscores, and \S* matches zero or more non-whitespace characters.

EDIT: altered for a late requirement change, to also match "_able#" and "-able#".
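A minimal sketch that runs this edited pattern over the question's sample lines plus two hypothetical lines containing -able#1 and _able#1 (not in the original file). It accepts both new forms, but since \S* consumes everything up to the next whitespace, it also accepts the able#1-able#1 line, which is the problem the asker reports in the comments below.

```python
import re

# The answer's edited pattern.
pattern = re.compile(r'\s+[-_]*able#\S*')

SAMPLE_LINES = [
    "0         6          9     able#1\n",
    "0         11         34    unable#1\n",
    "9         12         22    able#1\n",
    "0         6          9     able#1-able#1\n",
    "0         11         34    unable#1*able#1\n",
    # Two hypothetical lines, not from pro.txt, for the -able#1 / _able#1 case.
    "1         2          3     -able#1\n",
    "1         2          3     _able#1\n",
]

matched = [line.rstrip("\n") for line in SAMPLE_LINES if pattern.search(line)]
for line in matched:
    print(line)
```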

4 Comments

You might like to look at this answer, stackoverflow.com/questions/14808943/…, which tries to explain how to debug regexps.
I tried this one, but it also counts able#1-able#1. I want to count only able#1, -able#1, or _able#1.
I changed my regex, but a small problem still remains. I think your link was not related to my work.
Angel, the link is a short answer I wrote on how to effectively test regexps in Python's immediate evaluation mode; you might find it useful.

If you're only interested in counting complete words, you could do this:

re.findall(r'(?:\W|\A)able#1(?=\W|\Z)', line)

The (?:\W|\A) matches either the start of the string (here, the line) or a single non-word character, that is, anything other than a letter, digit, or underscore.

Likewise, (?=\W|\Z) is a look-ahead assertion for either the end of the string or a non-word character; being a look-ahead, it checks that character without consuming it.
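A minimal sketch that applies this pattern to each of the question's sample lines (held in a list rather than read from pro.txt) and prints the number of matches per line. Because - and * are themselves non-word characters, the able#1-able#1 line yields two matches and the unable#1*able#1 line yields one, so this pattern on its own does not exclude those lines.

```python
import re

# The answer's pattern; compiled once and reused for every line.
pattern = re.compile(r'(?:\W|\A)able#1(?=\W|\Z)')

# The question's sample lines, held in memory instead of read from pro.txt.
SAMPLE_LINES = [
    "0         6          9     able#1\n",
    "0         11         34    unable#1\n",
    "9         12         22    able#1\n",
    "0         6          9     able#1-able#1\n",
    "0         11         34    unable#1*able#1\n",
]

# findall returns every non-overlapping match on the line.
counts = [len(pattern.findall(line)) for line in SAMPLE_LINES]
for n, line in zip(counts, SAMPLE_LINES):
    print(n, line.rstrip("\n"))
```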
