I'm building a Markov text poetry generator. The main method, which is attached below, performs a depth-first search through the Markov chain, which is implemented as a NetworkX digraph of Word objects, which have methods for determining whether e.g. a Word rhymes with another Word instance.
At each level of the search, I filter the successor nodes to those that match user-supplied constraints - for example, "first word in sequence must rhyme with last word in sequence", so lines of poetry with different poetic devices can be built from the Markov chain.
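For context, the Constraint and Predicate objects used by the generator are roughly this shape (a simplified sketch with the same attribute names as my code; the real classes live in the full source):

```python
from functools import partial


class Constraint:
    """A relation that must hold between the words at the given
    sequence indices, e.g. Constraint(Word.rhymeswith, [0, 3])."""
    def __init__(self, method, indices):
        self.method = method      # binary predicate on two words
        self.indices = indices    # sequence positions it relates

class Predicate:
    """A constraint partially applied to an already-chosen word,
    to be checked against candidates at one future index."""
    def __init__(self, partial_fn, index):
        self.partial = partial_fn  # unary test for candidate words
        self.index = index         # level at which to apply it
```

When the word at a constraint's first index is chosen, the remaining indices are turned into Predicates by currying the constraint's method with `functools.partial`, which is exactly what `apply_constraints` in the code below does.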
The code works, but with a large corpus (27,412 unique words, 160,892 edges between words), certain combinations of constraints, especially rhyming ones, take a very long time to produce a satisfying sequence, even assuming one exists in the chain. Here's an example:
>>> vg = VerseGenerator(some_very_long_text)
>>> cs = [Constraint(Word.rhymeswith, [0, 3])]
>>> vg.build_sequence(4, [], cs)
[Word(among), Word(the), Word(white), Word(young)]  # 204 ms via IPython %timeit
>>> cs = [Constraint(Word.rhymeswith, [0, 9])]
>>> vg.build_sequence(10, [], cs)  # doesn't terminate within at least 10 minutes
Can anyone see a way to speed this up? I'm experimenting with Cython, but it will be a while before I'm up to speed with it. Perhaps there is some graph-theoretic property or algorithm I could use to preprocess the chain?
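One preprocessing idea I've been sketching, in case it helps frame an answer: bucket the words into rhyme classes once per corpus, so that "rhymes with word 0" becomes a set lookup instead of a pairwise `rhymeswith` call on every candidate at the final level. This is untested, and `rhyme_key` is a stand-in for however Word derives its rhyme signature internally:

```python
from collections import defaultdict


def rhyme_classes(words, rhyme_key):
    """Bucket words by a hashable rhyme signature so that every word
    rhyming with a given word can be found with a single dict lookup.
    `rhyme_key` is hypothetical here; in practice it would be whatever
    Word.rhymeswith compares (e.g. final stressed vowel + coda)."""
    classes = defaultdict(set)
    for word in words:
        classes[rhyme_key(word)].add(word)
    return classes

# Toy demonstration with plain strings and a crude last-two-letters key.
classes = rhyme_classes(['young', 'among', 'white'], lambda w: w[-2:])
```

The buckets are built once; at the final index the search could then intersect the candidate successors with the bucket of the first word, rather than testing each candidate individually.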
The code in its entirety is available on my Github. I'm using Python 3. I hope I've explained it clearly enough; if not, please point out what other information I should include.
def build_sequence(self, length, predicates, constraints, order='roulette'):
    '''Traverse the graph in depth-first order to build a sequence
    of the given length which satisfies the supplied predicates. If
    no matching sequence is found, return the partially constructed
    sequence. Begins with a randomly selected word from the graph.
    '''
    # If there are any multiple-word constraints that begin at
    # this level, create predicates from them so they can be
    # applied at future search levels.
    def apply_constraints(cands, path, level, preds, cons):
        cs = [c for c in cons if c.indices[0] == level]
        search_nodes = []
        for can in cands:
            new_preds = []
            for con in cs:
                curried = partial(con.method, can)
                sub = con.indices[1:]
                new_preds.extend([Predicate(curried, i) for i in sub])
            rec = {'word': can, 'parent': path, 'level': level,
                   'preds': preds + new_preds}
            search_nodes.append(rec)
        return search_nodes

    stack = []
    level = 0

    # Get candidate starting nodes.
    apply = [p.partial for p in predicates if p.index == level]
    cands = self.filter_words(self.chain.nodes(), apply)
    random.shuffle(cands)
    branches = apply_constraints(cands, None, level, predicates, constraints)
    stack.extend(branches)

    while stack and level < length:
        path = stack.pop()
        prev = path['word']
        level = path['level'] + 1
        preds = path['preds']
        # Filter the successors of the previous word to get candidates
        # for the next word in the sequence.
        apply = [p.partial for p in preds if p.index == level]
        succ = self.shuffled_successors(prev, order=order)
        cands = self.filter_words(succ, apply)
        branches = apply_constraints(cands, path, level, preds, constraints)
        stack.extend(branches)

    # Walk backward through the final search record to build the
    # resulting sentence.
    result = []
    for i in range(level):
        result.append(path['word'])
        path = path['parent']
    return list(reversed(result))