Regular expression using python is not working

Question

import re
count = 0
file = open("pro.txt").readlines()
for lines in file:
    word = len(re.findall(r'(^|[^\w\-])able#1(?=([^\w\-]|$))', lines))
    if word > 0:
        count = count + 1
print(count)

I am counting the lines in a text file that contain a particular word, but my program also counts some words that I do not want. I am using a regular expression, but it is not giving me the result I need. This is my text file:

0         6          9     able#1
0         11         34    unable#1
9         12         22    able#1
0         6          9     able#1-able#1
0         11         34    unable#1*able#1

I don't want my program to count words such as -able#1, able#1-able#1, or unable#1*able#1; it should only count able#1.

@nhahtdh but I also have to find the numbers listed against that word in this text file — Rocket
– Rocket, Commented Feb 25, 2013 at 15:54
how about just removing everything after the first occurrence of the hash symbol ? — yurib
– yurib, Commented Feb 25, 2013 at 16:08
why not apply the regex after removing everything past the first occurrence of the '#' symbol (plus one more character maybe)? — yurib
– yurib, Commented Feb 25, 2013 at 16:10
Yes, but if I match the word label, then I also have to find the numbers against it, like 0 6 9 in the example above — Rocket
– Rocket, Commented Feb 25, 2013 at 16:14
@Angel: For each line, cut up your data into 4 parts 0, 6, 9, able#1 (can be done with split, with limit on number of parts), and check the last item to decide to keep the data or not. — nhahtdh
– nhahtdh, Commented Feb 25, 2013 at 16:38
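
The split-based idea from the last comment could look roughly like the sketch below. It assumes every line has exactly three numeric columns followed by the word, as in the sample from the question; the helper name matching_rows and the inlined SAMPLE string are made up for illustration and stand in for reading pro.txt.

```python
# Sample lines copied from the question (stand-in for the contents of pro.txt).
SAMPLE = """\
0         6          9     able#1
0         11         34    unable#1
9         12         22    able#1
0         6          9     able#1-able#1
0         11         34    unable#1*able#1
"""

def matching_rows(lines, target="able#1"):
    """Return the three numeric columns of every line whose last field equals target.

    matching_rows is a hypothetical helper, not part of any library.
    """
    rows = []
    for line in lines:
        # Split on runs of whitespace into at most 4 parts: 3 numbers + the word.
        parts = line.split(None, 3)
        if len(parts) == 4 and parts[3].strip() == target:
            rows.append(tuple(int(p) for p in parts[:3]))
    return rows

rows = matching_rows(SAMPLE.splitlines())
print(len(rows), rows)
```

Because the last field is compared as a whole string, no regular expression is needed, and the numbers belonging to each matching word are available at the same time.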

Janne Karila · Accepted Answer · 2013-02-26 08:29:31Z

1

You can use the regex \sable#1\s*$, which requires one whitespace character before able and allows zero or more whitespace characters (and nothing else) at the end of the line.

import re
regex = re.compile(r'\sable#1\s*$')
count = 0
with open("pro.txt") as file:
    for line in file:
        if regex.search(line):
            count += 1
print(count)

You could also count with sum() and a generator expression like this:

with open("pro.txt") as file:
    count = sum(1 for line in file if regex.search(line))
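
As a quick check, the same pattern can be tried against the sample lines from the question without touching the file. This is only a sketch; the list name sample is an assumption and simply holds the five lines shown in the question.

```python
import re

regex = re.compile(r'\sable#1\s*$')

# The five sample lines from the question, with a trailing newline as readlines() would give.
sample = [
    "0         6          9     able#1\n",
    "0         11         34    unable#1\n",
    "9         12         22    able#1\n",
    "0         6          9     able#1-able#1\n",
    "0         11         34    unable#1*able#1\n",
]

# Keep only the lines where able#1 stands alone at the end of the line.
matched = [line for line in sample if regex.search(line)]
count = len(matched)
print(count)
```

Only the first and third lines are kept: in the other three, able#1 is either not preceded by whitespace or not followed by the end of the line.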

edited Feb 26, 2013 at 8:29

answered Feb 26, 2013 at 8:24

Janne Karila


Vorsprung · Accepted Answer · 2013-02-25 16:33:50Z

0

[^\W]*able#1\W

The [^\W]* expression matches zero or more characters that are not non-word characters, i.e. word characters (letters, digits, underscore), so it will not care about the "un" in "unable"!

I would write the regexp like this

if re.search(r'\s+[-_]*able#\S*', lines):

\s+ is any non-zero amount of whitespace; \S* is any amount of non-whitespace, including zero.

EDIT: altered for late requirement change to match "_able#" and "-able#"
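
To see what this pattern accepts, it can be run against the sample lines from the question. The sketch below is only illustrative; the names pattern, sample and hits are assumptions introduced here.

```python
import re

pattern = re.compile(r'\s+[-_]*able#\S*')

# The five sample lines from the question.
sample = [
    "0         6          9     able#1",
    "0         11         34    unable#1",
    "9         12         22    able#1",
    "0         6          9     able#1-able#1",
    "0         11         34    unable#1*able#1",
]

# True where the pattern finds whitespace, optional - or _, then able#.
hits = [bool(pattern.search(line)) for line in sample]
print(hits)
```

The trailing \S* consumes whatever non-whitespace follows able#, so a line ending in able#1-able#1 is accepted as well, while the two lines whose last field starts with "un" are rejected.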

edited Feb 25, 2013 at 16:33

answered Feb 25, 2013 at 16:11

Vorsprung


4 Comments

Vorsprung Over a year ago

You might like to look at this answer stackoverflow.com/questions/14808943/… which tries to explain how to debug regexp

Rocket Over a year ago

I tried this one, but it also counts able#1-able#1; I want to count just able#1, -able#1 or _able#1

Rocket Over a year ago

I changed my regex, but a small problem still remains. I think your link was not related to my work

Vorsprung Over a year ago

Angel, the link is a short answer I wrote on how to effectively test regexps in Python's immediate evaluation mode; you might find it useful

Ja͢ck · Accepted Answer · 2013-02-26 09:07:02Z

0

If you're only interested in counting complete words, you could do this:

re.findall('(?:\W|\A)able#1(?=\W|\Z)', line)

The (?:\W|\A) will match either the beginning of the string (here, each line) or a non-word character, i.e. anything outside [0-9A-Za-z_].

Likewise, (?=\W|\Z) is a look-ahead assertion for either the end of the string or a non-word character.
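
A small sketch of how these boundaries behave on the words from the question follows. The names pattern, tokens and counts are assumptions made for the example; note that "-" and "*" are themselves non-word characters, so they satisfy \W.

```python
import re

pattern = re.compile(r'(?:\W|\A)able#1(?=\W|\Z)')

# The distinct last-column values from the question's sample file.
tokens = ["able#1", "unable#1", "able#1-able#1", "unable#1*able#1"]

# Number of matches findall reports for each token.
counts = {token: len(pattern.findall(token)) for token in tokens}
print(counts)
```

The letter "n" in unable#1 blocks a match, but because a hyphen or asterisk counts as a word boundary here, the compound forms still produce matches; this pattern fits when such separators should delimit words.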

answered Feb 26, 2013 at 9:07

Ja͢ck
