I have been staring at this problem for hours and still don't know what regex to use to solve it.
Problem:
Given the following input strings, find all possible output words 5 characters or longer.
- qwertyuytresdftyuioknn
- gijakjthoijerjidsdfnokg
Your program should find all possible words (5+ characters) that can be derived from the strings supplied. Use http://norvig.com/ngrams/enable1.txt as your search dictionary. The order of the output words doesn't matter.
Expected output (one line per input string):
- queen question
- gaeing garring gathering gating geeing gieing going goring
Assumptions about the input strings:
- QWERTY keyboard
- Lowercase a-z only, no whitespace or punctuation
- The first and last characters of the input string will always match the first and last characters of the desired output word.
- Don't assume users take the most efficient path between letters
- Every letter of the output word will appear in the input string
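Read together with the expected outputs, these rules suggest that a word qualifies when its letters appear in the input in left-to-right order (a subsequence), with one extra allowance: a doubled letter can reuse a single key. For example, "queen" is expected from qwertyuytresdftyuioknn although only one e follows the u. Here is a minimal plain-Python sketch of that check. It assumes this interpretation of the rules, and the function name `matches` is hypothetical:

```python
def matches(word: str, path: str, min_len: int = 5) -> bool:
    """Return True if `word` can be typed along `path`.

    Assumption (inferred from the expected outputs): the word's letters must
    appear in `path` in order, and a doubled letter may reuse the same key.
    """
    if len(word) < min_len:
        return False
    # The first and last letters must line up with the ends of the path.
    if word[0] != path[0] or word[-1] != path[-1]:
        return False
    pos = 0  # index in `path` of the most recently matched letter
    for ch in word:
        if path[pos] == ch:
            continue  # same letter as the previous one: stay on this key
        pos = path.find(ch, pos + 1)  # next occurrence to the right
        if pos == -1:
            return False
    return True
```

For example, filtering ["queen", "question", "quiet", "quern"] with `matches(w, "qwertyuytresdftyuioknn")` keeps only "queen" and "question".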
Attempted solution:
First I downloaded the words from that webpage and stored them in a file on my computer ('words.txt'):
```python
import requests

# Download the ENABLE word list and save it locally as words.txt.
res = requests.get('http://norvig.com/ngrams/enable1.txt')
res.raise_for_status()
with open('words.txt', 'wb') as fp:
    for chunk in res.iter_content(100000):
        fp.write(chunk)
```
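Instead of searching the file as one big string, you can read words.txt back into a list, one word per line, and drop words shorter than five letters up front. The sketch below assumes the file holds one lowercase word per line. The helper names `filter_words` and `load_words` are hypothetical:

```python
from typing import Iterable, List


def filter_words(lines: Iterable[str], min_len: int = 5) -> List[str]:
    """Strip line endings and keep only words with at least `min_len` letters."""
    return [w for w in (line.strip() for line in lines) if len(w) >= min_len]


def load_words(path: str = "words.txt", min_len: int = 5) -> List[str]:
    """Read the downloaded word list (one word per line) into a list."""
    with open(path) as fp:
        return filter_words(fp, min_len)
```

Typical use is `words = load_words("words.txt")`. Each candidate can then be checked on its own, which avoids matching substrings that span parts of different words.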
I'm then trying to match the words I need using a regular expression. The problem is that I don't know how to write the pattern I pass to re.compile() to achieve this.
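```python
import re

input_str = 'qwertyuytresdftyuioknn'  # example input
with open('words.txt') as fp:
    string = fp.read()
# Wrong: \w{3,} accepts any letters, not letters taken in order from input_str.
regex = re.compile(input_str[0] + r'\w{3,}' + input_str[-1])  # need help here
print(regex.findall(string))
```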
Obviously this is wrong: I need to match letters from my input string going from left to right, not arbitrary letters, which is what \w{3,} does. Any help with this would be greatly appreciated.
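One way to express "letters from the input, in order" as a regex is to build the pattern from the input string itself instead of using \w. Each interior letter c becomes c*, meaning skip it, use it once, or repeat it. The repeat case also covers doubled letters such as the ee in "queen". The first and last letters become c+, so every match starts and ends like the input. With re.MULTILINE, ^ and $ anchor at line boundaries, so findall over the whole contents of words.txt returns complete words only. This is a sketch under that reading of the rules, and `build_pattern` and `find_words` are hypothetical names:

```python
import re


def build_pattern(path: str):
    """Compile a regex that matches whole lines typeable along `path`.

    Interior letters become `c*` (optional, repeatable); the first and last
    letters are required, so matches start and end like `path`.
    """
    first, middle, last = path[0], path[1:-1], path[-1]
    body = "".join(re.escape(c) + "*" for c in middle)
    regex = "^" + re.escape(first) + "+" + body + re.escape(last) + "+$"
    return re.compile(regex, re.MULTILINE)


def find_words(path: str, text: str, min_len: int = 5) -> list:
    """Return the words (one per line in `text`) that match, at least min_len long."""
    return [w for w in build_pattern(path).findall(text) if len(w) >= min_len]
```

Typical use is `with open('words.txt') as fp: print(find_words('qwertyuytresdftyuioknn', fp.read()))`. The length filter runs after matching because folding it into the pattern would need an extra lookahead.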