For my own project, I have a .txt file containing 200k English words. I have a class called WordCross (a game) that searches for words, taking certain letters as parameters. Suppose I have the letters A X D E L P. I want to return a list of English words with these letters. I want to use a regex and add the words that match to a "hits" list, but I can't think of a way to build this regex from the parameters.

Here is my current code:

import re
class WordCross:
    def __init__(self, a,b,c,d,e,f):
        file = open("english3.txt", "r")
        hits = []
        for words in file:
            if words.lower() == re.search("a", words):
                hits.append(words)
        hits.sort()
        print(hits)

test = WordCross("A", "B", "C", "D", "E", "F")

Any help will be appreciated! Kind regards, Douwe

  • perhaps if re.search(f'[{a}{b}{c}{d}{e}{f}]', words) is not None:? Commented Jun 12, 2020 at 14:18
  • Does this answer your question? How to use a variable inside a regular expression? Commented Jun 12, 2020 at 14:19
  • @MauriceMeyer I did take a look at that code, but it only contains a single variable, not multiple. It is therefore unclear to me how to do this with multiple variables. Commented Jun 12, 2020 at 14:25
  • @Nick this does seem to work; however, it also accepts strings that contain letters not given as parameters. Commented Jun 12, 2020 at 14:35

4 Answers

If you want to return only the words made up entirely of the letters passed into the constructor, you need to use re.match and add an end-of-line anchor to the regex. The negative lookahead at the start of the regex rejects any word in which a letter is repeated. You can use the asterisk operator (*) in the parameter list to allow an arbitrary number of letters to be passed to the constructor (see the manual). In this demo I've simulated reading the file with a list of words from a string:

wordlist = '''
Founded in two thousand and eight Stack Overflow is the largest most trusted 
online community for anyone that codes to learn share their knowledge and 
build their careers More than fifty million unique visitors come to Stack Overflow
each month to help solve coding problems develop new skills and find job opportunities
'''.split()
wordlist = list(set(wordlist))

import re
class WordCross:
    def __init__(self, *letters):
        # file = open("english3.txt", "r")
        hits = []
        charset = f"[{''.join(letters)}]"
        regex = re.compile(rf"(?!.*({charset}).*\1){charset}+$", re.I)
        for word in wordlist:
            if regex.match(word) is not None:
                hits.append(word)
        hits.sort()
        print(hits)

test = WordCross("A", "C", "E", "H", "K", "T", "S")

Output:

['Stack', 'each', 'the']
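
To see what each part of that regex contributes, here is a minimal sketch that runs it on a few hand-picked words. The helper name build_regex and the sample words are assumptions made for illustration, and re.escape is an added precaution that the code above does not use:

```python
import re

def build_regex(letters):
    # Hypothetical helper: escape each letter so characters such as "-" or "]"
    # would stay literal inside the character class.
    charset = "[" + "".join(re.escape(ch) for ch in letters) + "]"
    # (?!.*(x).*\1) fails if any allowed character occurs twice;
    # charset+$ then requires the whole word to consist of allowed characters.
    return re.compile(rf"(?!.*({charset}).*\1){charset}+$", re.I)

regex = build_regex("ACEHKTS")
results = {word: regex.match(word) is not None
           for word in ["CET", "CETT", "each", "that", "code"]}
print(results)
# {'CET': True, 'CETT': False, 'each': True, 'that': False, 'code': False}
```

"CETT" and "that" are rejected by the lookahead because T is repeated, while "code" passes the lookahead but fails the anchored character class because of the letters o and d.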

3 Comments

Thanks, this works great! How would I change it if each letter can only be used once? E.g. suppose the letters are A C E H K T S. I want to be able to get the word CET but not CETT.
When I saw that come up in my demo answer I was wondering if you were going to ask that question. Give me a few minutes...
@Douwe I've updated the regex to include a negative lookahead to ensure no character is repeated

I'm not sure exactly which regular expression you want to use, but it is trivial to build an expression using simple string substitution. You can also alter your function to accept an arbitrary number of patterns to search for. Hope this helps a little.

import re
class WordCross:
    def __init__(self, *patterns):
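        list_of_patterns = "|".join(patterns)
        reg_exp = r"({0})".format(list_of_patterns)
        print(reg_exp)
        hits = []
        # A with block closes the file even if an error occurs.
        with open("english3.txt", "r") as word_file:
            for line in word_file:
                word = line.strip()  # drop the trailing newline
                # re.IGNORECASE so that "A" also matches a lowercase "a"
                if re.search(reg_exp, word, re.IGNORECASE):
                    hits.append(word)
        hits.sort()
        print(hits)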

test = WordCross("A", "B", "C", "D", "E", "F")
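
Note that an alternation combined with re.search accepts any word containing at least one of the letters. The sketch below contrasts that with a character class combined with fullmatch, which accepts only words built entirely from the letters. The sample words and variable names are made up for illustration, and re.escape is an extra precaution not present in the code above:

```python
import re

letters = ["A", "B", "C", "D", "E", "F"]
words = ["bead", "faced", "zebra", "sky"]  # illustrative sample, not from english3.txt

# Alternation with search: true as soon as one listed letter occurs anywhere.
any_letter = re.compile("|".join(map(re.escape, letters)), re.IGNORECASE)
contains_any = [w for w in words if any_letter.search(w)]

# Character class with fullmatch: true only if every character is a listed letter.
only_letters = re.compile("[" + "".join(map(re.escape, letters)) + "]+", re.IGNORECASE)
uses_only = [w for w in words if only_letters.fullmatch(w)]

print(contains_any)  # ['bead', 'faced', 'zebra']
print(uses_only)     # ['bead', 'faced']
```

Which of the two behaviours is wanted depends on the rules of the game; the second one does not yet limit how often a letter may be used.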


I'm assuming the words in your file are line-separated.

Code:

import re
from io import StringIO

source = '''
RegExr was created by gskinner.com, and is proudly hosted by Media Temple.
Edit the Expression & Text to see matches. Roll over matches or the expression for details. PCRE & JavaScript flavors of RegEx are supported. Validate your expression with Tests mode.
The side bar includes a Cheatsheet, full Reference, and Help. You can also Save & Share with the Community, and view patterns you create or favorite in My Patterns.
Explore results with the Tools below. Replace & List output custom results. Details lists capture groups. Explain describes your expression in plain English.
'''.split()  # assuming words are line-separated here.

file_simulation = StringIO('\n'.join(source))  # simulating file open


class WordCross:
    def __init__(self, *args):
        self.file = file_simulation
        self.hits = []

        for words in self.file:
            if re.search(f"[{''.join(args)}]", words.upper()):
                self.hits.append(words.strip())

        self.hits.sort()
        print(self.hits)


test = WordCross("A", "B", "C", "D", "E", "F")

Result:

['Cheatsheet,', 'Community,', ... 'view', 'was']

Process finished with exit code 0


A couple of suggestions:

  • I don't see anything meriting a class here. A simple function should suffice.

  • Don't use file as a variable name; it was the name of a builtin in Python 2 and still reads like one.

  • When using an open file handle in general it's better to do so within a with block.

Untested:

import re
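def WordCross(*patterns):
    # Join the patterns into one alternation, e.g. "A|B|C"
    pattern = "|".join(patterns)
    c_pattern = re.compile(pattern, re.IGNORECASE)
    with open("english3.txt") as fp:
        # strip() drops the trailing newline from each matching line
        hits = [line.strip() for line in fp if c_pattern.search(line)]
    print(sorted(hits))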
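
Since the code above is untested, here is a sketch of the same idea in a form that can be run without english3.txt. The name find_words, its path parameter, and the temporary stand-in file are assumptions made for illustration; the function returns the hits instead of printing them so the result can be checked:

```python
import os
import re
import tempfile

def find_words(path, *patterns):
    """Return the sorted lines of `path` that match any of `patterns`."""
    c_pattern = re.compile("|".join(patterns), re.IGNORECASE)
    with open(path) as fp:  # the with block closes the file automatically
        # strip() removes the trailing newline that each line carries
        return sorted(line.strip() for line in fp if c_pattern.search(line))

# Temporary stand-in for english3.txt so the sketch runs anywhere.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("sky\nzebra\nbead\n")

hits = find_words(tmp.name, "A", "B")
os.remove(tmp.name)  # clean up the stand-in file
print(hits)  # ['bead', 'zebra']
```

Like the function above, this keeps every word that contains at least one of the patterns; it does not restrict words to the given letters.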
