I want to read a Word HTML file and grab any words that contain the letters of a name, but not print them if the words are longer than the name.

# compiling the regular expression:
keyword = re.compile(r"^[(rR)|(yY)|(aA)|(nN)]{5}$/")

if keyword.search (line):
    print line,

I am grabbing the words with this, but I don't seem to be limiting the size properly.

2 Answers

3

It seems you are looking for keyword.match() instead of keyword.search(). You should read the part of the Python documentation that discusses the difference between match and search.
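To make the difference concrete, here is a minimal sketch. The unanchored four-letter pattern and the sample strings are made up for illustration and are not from the question; print() is called with a single argument so the lines should also work on Python 2.

```python
import re

# Unanchored pattern, used only to make the difference visible.
pattern = re.compile(r"[rRyYaAnN]{4}")

at_start = pattern.match("Ryan went home")   # match() tries position 0 only
not_at_start = pattern.match("hello Ryan")   # None: 'h' is not in the set
anywhere = pattern.search("hello Ryan")      # search() scans forward and finds 'Ryan'

print(at_start.group())
print(not_at_start)
print(anywhere.group())

# With ^ and $ in the pattern (and no re.MULTILINE), both calls give the same result here.
anchored = re.compile(r"^[rRyYaAnN]{4}$")
anchored_match = anchored.match("hello Ryan")
anchored_search = anchored.search("hello Ryan")
print(anchored_match)
print(anchored_search)
```

Note that once the pattern starts with ^, search() can only succeed at the start of the string anyway, so the choice between the two matters mostly for unanchored patterns.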

Also, your regular expression seems completely off: [ and ] delimit a set of characters, so you can't put groups inside them and build logic around those groups. As written, your expression will also match (, ) and |. You may try the following:

keyword = re.compile(r"^[rRyYaAnN]{5}$")
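As an alternative sketch, the re.IGNORECASE flag avoids listing both cases of every letter. The sample words below are invented for illustration; the {5} count is kept from the question, so only words of exactly five characters pass.

```python
import re

# Same character set as above, with case handled by the flag.
keyword = re.compile(r"^[ryan]{5}$", re.IGNORECASE)

words = ("aRRyn", "Nanny", "ryan", "rayann", "ar)yn")
results = dict((word, bool(keyword.search(word))) for word in words)

for word in words:
    print(word.ljust(10) + str(results[word]))
```

Here 'aRRyn' and 'Nanny' pass, 'ryan' and 'rayann' fail on length, and 'ar)yn' fails because ')' is no longer part of the set.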

1

Your RE "^[(rR)|(yY)|(aA)|(nN)]{5}$/" will never match any string at all, I think, because of the '/' character after the '$'.

See the results of the RE without this '/':

import re

pat = re.compile("^[(rR)|(yY)|(aA)|(nN)]{5}$")

for ch in ('arrrN','Aar)N','()|Ny','NNNNN',
           'marrrN','12Aar)NUUU','NNNNN!'):
    print ch.ljust(15),pat.search(ch)

result

arrrN           <_sre.SRE_Match object at 0x011C8EC8>
Aar)N           <_sre.SRE_Match object at 0x011C8EC8>
()|Ny           <_sre.SRE_Match object at 0x011C8EC8>
NNNNN           <_sre.SRE_Match object at 0x011C8EC8>
marrrN          None
12Aar)NUUU      None
NNNNN!          None

My advice: think of [.....] in a RE as representing ONE character at ONE position. Every character between the brackets is one of the options for that single character.

Moreover, as Adrien Plisson said, a lot of special characters lose their special meaning between brackets [......]. Hence '(', ')' and '|' don't define groups or OR there; they just stand for themselves, as further options alongside the letters 'aArRyYnN'.
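A small sketch of that point, contrasting a bracketed set with a real alternation. Both patterns are shortened, hypothetical variants of the one in the question, reduced to a single character so the effect is easy to see.

```python
import re

# Inside brackets: one character out of ( r R ) | y Y
char_class = re.compile(r"^[(rR)|(yY)]$")
# Outside brackets: a real OR between two one-character sets
alternation = re.compile(r"^(?:[rR]|[yY])$")

samples = ("r", "Y", "(", "|")
class_hits = [bool(char_class.search(ch)) for ch in samples]
alt_hits = [bool(alternation.search(ch)) for ch in samples]

for ch, in_class, in_alt in zip(samples, class_hits, alt_hits):
    print(ch + " " + str(in_class) + " " + str(in_alt))
```

The bracketed version accepts '(' and '|' as ordinary characters, while the alternation accepts only the letters.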

"^[rRyYaAnN]{1,5}$" will match only strings such as 'r', 'ar', 'YNa', 'YYnA', 'Nanny'.

If you want to match the same words anywhere in a text, you will need "[rRyYaAnN]{1,5}"
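One caveat worth checking: without anchors, the pattern also matches fragments inside longer words. A possible refinement, not part of the answer above, is to add \b word boundaries. The sample sentence is made up for illustration.

```python
import re

text = "Ryan and Nanny ran to the library on a rainy day"

# Unanchored: picks up runs of the letters anywhere, even inside other words.
loose = re.findall(r"[rRyYaAnN]{1,5}", text)
# With \b on both sides: only whole words made entirely of the letters.
bounded = re.findall(r"\b[rRyYaAnN]{1,5}\b", text)

print(loose)    # ['Ryan', 'an', 'Nanny', 'ran', 'rary', 'n', 'a', 'ra', 'ny', 'ay']
print(bounded)  # ['Ryan', 'Nanny', 'ran', 'a']
```

The boundaries also enforce the length limit, because a word longer than five characters cannot fit between two \b positions.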
