Python use variables inside regex

Question

For my own project, I have a .txt file containing 200k English words. I have a class called WordCross (a game) that should search for words made from certain letters, which are passed in as parameters. Suppose I have the letters A X D E L P: I want to return a list of English words that use these letters. I want to use a regex and add the words that match to a "hits" list, but I can't think of a way to build this regex from the variables.

Here is my current code:

import re
class WordCross:
    def __init__(self, a,b,c,d,e,f):
        file = open("english3.txt", "r")
        hits = []
        for words in file:
            if words.lower() == re.search("a", words):
                hits.append(words)
        hits.sort()
        print(hits)

test = WordCross("A", "B", "C", "D", "E", "F")

Any help will be appreciated! Kind regards, Douwe

Perhaps if re.search(f'[{a}{b}{c}{d}{e}{f}]', words) is not None:?
– Nick, Commented Jun 12, 2020 at 14:18
Does this answer your question? How to use a variable inside a regular expression?
– Maurice Meyer, Commented Jun 12, 2020 at 14:19
@MauriceMeyer I did take a look at that code, but it only contains a single variable, not multiple. Therefore it is unclear to me how to do this using multiple variables.
– Douwe, Commented Jun 12, 2020 at 14:25
@Nick this does seem to work; however, it also accepts strings that contain letters not given as parameters.
– Douwe, Commented Jun 12, 2020 at 14:35
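
To illustrate the behaviour Douwe describes in the last comment, here is a minimal sketch. The sample words and the helper names contains_any and only_uses are made up for illustration. re.search with a character class succeeds as soon as one listed letter appears anywhere in the word, whereas re.fullmatch with + requires every character to come from the class.

```python
import re

letters = ["A", "X", "D", "E", "L", "P"]  # the letters from the question
# re.escape keeps each letter literal inside the character class
charset = "[" + "".join(re.escape(ch) for ch in letters) + "]"

def contains_any(word):
    """True if at least one of the letters occurs somewhere in the word."""
    return re.search(charset, word, re.IGNORECASE) is not None

def only_uses(word):
    """True if the word consists solely of the given letters."""
    return re.fullmatch(charset + "+", word, re.IGNORECASE) is not None

# Hypothetical sample words, not taken from english3.txt
for word in ["plea", "apple", "table"]:
    print(word, contains_any(word), only_uses(word))
```

"table" passes the first check because it contains A, L and E, but fails the second because T and B are not among the letters.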

Nick · Accepted Answer · 2020-06-13 00:36:51Z

1

If you want to return only the words made up entirely of the letters passed into the constructor, you need to use re.match and add an end-of-string anchor ($) to the regex as well. The negative lookahead at the start of the regex additionally rejects any word in which a letter is repeated. You can use the asterisk operator (*) in the parameter list to allow an arbitrary number of letters to be passed to the constructor (see the manual). In this demo I've simulated reading the file with a list of words taken from a string:

wordlist = '''
Founded in two thousand and eight Stack Overflow is the largest most trusted 
online community for anyone that codes to learn share their knowledge and 
build their careers More than fifty million unique visitors come to Stack Overflow
each month to help solve coding problems develop new skills and find job opportunities
'''.split()
wordlist = list(set(wordlist))

import re
class WordCross:
    def __init__(self, *letters):
        # file = open("english3.txt", "r")
        hits = []
        charset = f"[{''.join(letters)}]"
        regex = re.compile(rf"(?!.*({charset}).*\1){charset}+$", re.I)
        for word in wordlist:
            if regex.match(word) is not None:
                hits.append(word)
        hits.sort()
        print(hits)

test = WordCross("A", "C", "E", "H", "K", "T", "S")

Output:

['Stack', 'each', 'the']
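
One thing to watch when swapping the demo list for the real file: each line read from a file ends in a newline. The following is a small sketch of how $ behaves in that case, using a simplified pattern without the lookahead and a made-up sample line. Stripping the line first is one possible way to handle it, not the only one.

```python
import re

pattern = re.compile(r"[acehkts]+$", re.IGNORECASE)

line = "each\n"  # a line as it comes from iterating over a text file

# `$` also matches just before a trailing newline, so this still succeeds...
print(pattern.match(line) is not None)      # True
# ...but the raw line that would be appended to hits still ends in "\n".
print(repr(line))                           # 'each\n'

# Stripping first and using fullmatch avoids relying on that `$` behaviour.
word = line.strip()
print(pattern.fullmatch(word) is not None)  # True
print(repr(word))                           # 'each'
```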

edited Jun 13, 2020 at 0:36

answered Jun 12, 2020 at 22:55

Nick


3 Comments

Douwe Over a year ago

Thanks, this works great! How would I change it if each letter can only be used once? E.g., suppose the letters are A C E H K T S. I want to be able to match the word CET but not CETT.

Nick Over a year ago

When I saw that come up in my demo answer I was wondering if you were going to ask that question. Give me a few minutes...

Nick Over a year ago

@Douwe I've updated the regex to include a negative lookahead to ensure no character is repeated
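
For readers wondering how the negative lookahead keeps repeated letters out, here is a sketch of the same regex structure spelled out with re.VERBOSE. The letter set is the one from the demo call; the test words are made up for illustration.

```python
import re

charset = "[ACEHKTS]"

# Same structure as the regex in the answer, split over lines with comments.
regex = re.compile(
    rf"""
    (?!              # negative lookahead: fail if the following can match
        .*           #   anything,
        ({charset})  #   then one allowed letter, captured as group 1,
        .*           #   anything,
        \1           #   then that same letter again
    )
    {charset}+       # one or more allowed letters
    $                # up to the end of the word
    """,
    re.IGNORECASE | re.VERBOSE,
)

for word in ["cet", "cett", "Stack", "that", "chair"]:
    print(word, regex.match(word) is not None)
```

"cett" and "that" are rejected because T occurs twice, and "chair" is rejected because I and R are not in the letter set. Since re.IGNORECASE also applies to the backreference, "Tt" counts as a repeat.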

adamkgray · Answer · 2020-06-12 14:52:47Z

0

I'm not sure exactly what regular expression you want to use, but it is trivial to build an expression using simple string substitution. You can alter your function to accept an arbitrary number of patterns to search for as well. Hope this helps a little.

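import re
class WordCross:
    def __init__(self, *patterns):
        # Join the patterns into one alternation, e.g. "(A|B|C|D|E|F)"
        list_of_patterns = "|".join(patterns)
        reg_exp = r"({0})".format(list_of_patterns)
        print(reg_exp)
        hits = []
        # The with block closes the file even if an error occurs
        with open("english3.txt", "r") as word_file:
            for word in word_file:
                # Matches if any one pattern occurs anywhere in the line.
                # Note: case-sensitive; pass re.IGNORECASE if needed.
                if re.search(reg_exp, word):
                    hits.append(word.strip())
        hits.sort()
        print(hits)

test = WordCross("A", "B", "C", "D", "E", "F")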
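
One caveat with plain string substitution is that the patterns are interpreted as regex syntax. The sketch below, with a made-up pattern list that includes ".", shows how re.escape keeps each pattern literal. For single letters, an alternation like A|B behaves the same as the character class [AB].

```python
import re

patterns = ["A", "B", "."]  # "." is included to show why escaping matters

# Unescaped, "." is a wildcard, so the alternation matches any character.
unescaped = re.compile("|".join(patterns))
# Escaped, every pattern is treated as literal text.
escaped = re.compile("|".join(re.escape(p) for p in patterns))

print(unescaped.pattern)  # A|B|.
print(escaped.pattern)    # A|B|\.

print(unescaped.search("xyz") is not None)  # True: "." matches "x"
print(escaped.search("xyz") is not None)    # False: no "A", "B" or literal "."
print(escaped.search("x.z") is not None)    # True: literal "." found
```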

answered Jun 12, 2020 at 14:52

adamkgray


jupiterbjy · Answer · 2020-06-12 14:58:35Z

I'm assuming the words in your file are line-separated.

Code:

import re
from io import StringIO

source = '''
RegExr was created by gskinner.com, and is proudly hosted by Media Temple.
Edit the Expression & Text to see matches. Roll over matches or the expression for details. PCRE & JavaScript flavors of RegEx are supported. Validate your expression with Tests mode.
The side bar includes a Cheatsheet, full Reference, and Help. You can also Save & Share with the Community, and view patterns you create or favorite in My Patterns.
Explore results with the Tools below. Replace & List output custom results. Details lists capture groups. Explain describes your expression in plain English.
'''.split()  # assuming words are line-separated here.

file_simulation = StringIO('\n'.join(source))  # simulating file open


class WordCross:
    def __init__(self, *args):
        self.file = file_simulation
        self.hits = []

        for words in self.file:
            if re.search(f"[{''.join(args)}]", words.upper()):
                self.hits.append(words.strip())

        self.hits.sort()
        print(self.hits)


test = WordCross("A", "B", "C", "D", "E", "F")

Result:

['Cheatsheet,', 'Community,', ... 'view', 'was']

Process finished with exit code 0

Rory Browne · Answer · 2020-06-13 00:30:39Z

0

A couple of suggestions:

I don't see anything meriting a class here. A simple function should suffice.
Don't use file as a variable name; it was the name of a builtin in Python 2, and shadowing builtin names is best avoided.
When working with an open file handle, it's generally better to do so within a with block.

Untested:

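import re
def WordCross(*patterns):
    # Build one alternation from the patterns and compile it once
    pattern = "|".join(patterns)
    c_pattern = re.compile(pattern, re.IGNORECASE)
    with open("english3.txt") as fp:
        # strip() removes the trailing newline from each matching line
        hits = [line.strip() for line in fp if c_pattern.search(line)]
    print(sorted(hits))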
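
Combining these suggestions with the "only these letters" requirement from the question, a function version might look like the sketch below. The name find_words, the path parameter and the temporary sample file are assumptions for illustration; the sample file stands in for english3.txt. Repeated letters are allowed here, since the negative lookahead is left out.

```python
import re
import tempfile
from pathlib import Path

def find_words(path, *letters):
    """Return the sorted words in `path` made up only of the given letters."""
    charset = "[" + "".join(re.escape(ch) for ch in letters) + "]"
    pattern = re.compile(charset + "+", re.IGNORECASE)
    with open(path, encoding="utf-8") as fp:
        # strip() drops the trailing newline before matching and storing
        words = (line.strip() for line in fp)
        return sorted(w for w in words if pattern.fullmatch(w))

# Demo with a small temporary word list standing in for english3.txt.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "words.txt"
    sample.write_text("each\nthe\nstack\nthat\nzebra\n", encoding="utf-8")
    print(find_words(sample, "A", "C", "E", "H", "K", "T", "S"))
    # ['each', 'stack', 'that', 'the']
```

Returning the list instead of printing it leaves it up to the caller what to do with the hits.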

answered Jun 13, 2020 at 0:30

Rory Browne

