
I am trying to generate random text using letter frequencies that I have obtained. First, I succeeded with the following code:

for i in range(450):
    outcome=random.random()
    if 0<outcome<0.06775:
        sys.stdout.write('a')
    if 0.06775<outcome<0.07920:
        sys.stdout.write('b')
    if 0.07920<outcome<0.098:
        sys.stdout.write('c')
    ....

This continues up to the letter z and the space character. That gives me more than 50 lines of code, and I want to get the same result using an array.
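The chain of if statements above compares the draw against running totals of the letter frequencies. If the frequencies are available per character instead, itertools.accumulate (Python 3.2+) can build such a threshold list. Below is a minimal sketch. The three-entry frequency table is hypothetical and exists only for illustration:

```python
from itertools import accumulate

# Hypothetical per-character frequencies, for illustration only.
freqs = {'a': 0.5, 'b': 0.25, ' ': 0.25}

chars = ''.join(freqs)  # 'ab ' (dicts keep insertion order in Python 3.7+)
# Running totals with a leading 0: chars[i] owns [thresholds[i], thresholds[i + 1]).
thresholds = [0] + list(accumulate(freqs.values()))
# thresholds == [0, 0.5, 0.75, 1.0]
```

The values are powers of two so that the sums are exact in floating point. Real frequencies will usually end slightly away from 1.0, as the list in this question does.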

So far I have:

f_list = [0, 0.06775, 0.08242, 0.10199, 0.13522, 0.23703, 0.25514, 0.27324, 0.32793, 0.38483, 0.38577, 0.39278, 0.42999, 0.45023, 0.50728, 0.56756, 0.58256, 0.58391, 0.62924, 0.68509, 0.7616, 0.78481, 0.79229, 0.81161, 0.81251, 0.82718, 0.82773, 0.99998]
alphabet = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', ' ']

import random
import sys

for i in range(25):
    outcome=random.random()
    if f_list[i]<outcome<f_list[i+1]:
        sys.stdout.write('alphabet[i]')

But it doesn't work properly: the range now seems to relate to the array rather than to the number of iterations I want, and the output is blank.
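For comparison with the answers, here is a minimal sketch of what the loop above was presumably meant to do. It assumes 450 output characters are wanted. The outer loop counts characters, and an inner loop scans the intervals. It writes alphabet[i] without quotes, because the quoted 'alphabet[i]' writes that literal text. The helper name random_char is hypothetical:

```python
import random
import sys

f_list = [0, 0.06775, 0.08242, 0.10199, 0.13522, 0.23703, 0.25514,
          0.27324, 0.32793, 0.38483, 0.38577, 0.39278, 0.42999, 0.45023,
          0.50728, 0.56756, 0.58256, 0.58391, 0.62924, 0.68509, 0.7616,
          0.78481, 0.79229, 0.81161, 0.81251, 0.82718, 0.82773, 0.99998]
alphabet = 'abcdefghijklmnopqrstuvwxyz '


def random_char():
    # Scale the draw so it always lands below the last threshold (0.99998).
    outcome = random.random() * f_list[-1]
    # Linear scan over the 27 intervals [f_list[i], f_list[i + 1]).
    for i in range(len(alphabet)):
        if f_list[i] <= outcome < f_list[i + 1]:
            return alphabet[i]  # the character itself, not the text 'alphabet[i]'


# The outer loop controls how many characters are produced.
text = ''.join(random_char() for _ in range(450))
sys.stdout.write(text + '\n')
```

This keeps the linear scan from the question. The answers replace it with a binary search.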

Comment (Dec 21, 2011 at 11:48): the frequencies in the second code snippet are the correct ones.

2 Answers

import random
import sys
import bisect

f_list = [0, 0.06775, 0.08242, 0.10199, 0.13522, 0.23703, 0.25514, 0.27324, 0.32793, 0.38483, 0.38577, 0.39278, 0.42999, 0.45023, 0.50728, 0.56756, 0.58256, 0.58391, 0.62924, 0.68509, 0.7616, 0.78481, 0.79229, 0.81161, 0.81251, 0.82718, 0.82773, 0.99998]
alphabet = 'abcdefghijklmnopqrstuvwxyz '

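for i in range(450):
    # bisect(...) - 1 is the index of the interval the draw falls into;
    # scaling by f_list[-1] keeps the draw below the last threshold (0.99998).
    sys.stdout.write(alphabet[bisect.bisect(f_list, random.random() * f_list[-1]) - 1])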

does the trick and produces output such as:

l wefboethol gsplotfoh ua onpedefh dnolnairnioeiegehhecaworonnfmeuej dsiauhpbfttwcknal ateof ap cgbr sunnee leseaeeecltaiaur u oen vxntgsoio kdeniei ot df htr dcencrsrrfp bwelsuoaslrnr heh ee tpt oeejaldeatcto fi a u idimiadmgglral o m iaielbtnt es oe shlspudwdfrrsvol oo i tlwh d r i swhsnloai p swlooi wbe nn sshth nsawtnrqsud mtw diit pner r nitmah todf zcsehma hl e ros ctee toiouinn i hl hlonphioe nh gan ho heein itrgeylftn epaacrmanhe

alphabet can also be defined as a plain string. Indexing it returns single characters, just as indexing a list would.

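bisect.bisect(list, value) takes a sorted list and a value and returns the index at which the value would be inserted to keep the list sorted (to the right of any equal entries). Subtracting 1 gives the index of the interval the value falls into. More about bisect.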
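The following sketch shows what bisect.bisect(f_list, x) - 1 computes for a few sample values, chosen here purely for illustration. bisect.bisect is an alias for bisect_right, so a value equal to a threshold goes into the interval that starts there. The last lines show an edge case. f_list ends at 0.99998 rather than 1.0, so a draw in [0.99998, 1.0) gives index 27, one past the end of alphabet. Scaling the draw by f_list[-1] avoids that:

```python
import bisect

f_list = [0, 0.06775, 0.08242, 0.10199, 0.13522, 0.23703, 0.25514,
          0.27324, 0.32793, 0.38483, 0.38577, 0.39278, 0.42999, 0.45023,
          0.50728, 0.56756, 0.58256, 0.58391, 0.62924, 0.68509, 0.7616,
          0.78481, 0.79229, 0.81161, 0.81251, 0.82718, 0.82773, 0.99998]
alphabet = 'abcdefghijklmnopqrstuvwxyz '

# bisect.bisect(f_list, x) - 1 is the index i with f_list[i] <= x < f_list[i + 1].
picks = {x: alphabet[bisect.bisect(f_list, x) - 1] for x in (0.0, 0.05, 0.07, 0.9)}
# picks == {0.0: 'a', 0.05: 'a', 0.07: 'b', 0.9: ' '}

# A draw at or above the last threshold maps one past the end of alphabet.
overflow_index = bisect.bisect(f_list, 0.99999) - 1
# overflow_index == 27 == len(alphabet)
```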


4 Comments

+1 for using binary search. People still use too many linear-time algorithms.
@eumiro I need your help understanding the binary search part.
@DhruvPathak - you can find some examples on the linked documentation page.
@eumiro How do I store this random text into a name, say, "text1", in order to use it later in my program? Thank you!
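In reply to the last comment, one way to keep the text is to build it with ''.join instead of writing each character to stdout. This is a minimal sketch, and random_text and text1 are illustrative names:

```python
import bisect
import random

f_list = [0, 0.06775, 0.08242, 0.10199, 0.13522, 0.23703, 0.25514,
          0.27324, 0.32793, 0.38483, 0.38577, 0.39278, 0.42999, 0.45023,
          0.50728, 0.56756, 0.58256, 0.58391, 0.62924, 0.68509, 0.7616,
          0.78481, 0.79229, 0.81161, 0.81251, 0.82718, 0.82773, 0.99998]
alphabet = 'abcdefghijklmnopqrstuvwxyz '


def random_text(length):
    """Return `length` characters drawn with the frequencies in f_list."""
    return ''.join(
        # Scale by the last threshold so the index never runs past alphabet.
        alphabet[bisect.bisect(f_list, random.random() * f_list[-1]) - 1]
        for _ in range(length)
    )


text1 = random_text(450)  # keep the result instead of writing it to stdout
```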

eumiro's answer is perfect and much simpler than mine, but because I made the effort to adapt an older solution to a similar problem, I don't want it to go to waste.

I also still had the link to the discussion about weighted random generators from which I borrowed the "King of the hill" algorithm.

from string import ascii_lowercase
from random import random

class TextGenerator(object):
    def __init__(self, flist, textlength, charmap=ascii_lowercase + ' '):
        self.text_length = textlength
        self.chars = charmap
        self.weights = self._get_weight_list(flist)

    def create_new_weights(self, flist):
        self.weights = self._get_weight_list(flist)

    def get_weight(self, char):
        return self.weights[self.chars.index(char)]

    def change_weight(self, char, weight):
        self.weights[self.chars.index(char)] = weight

    def _get_weight_list(self, flist):
        # Each character's weight is the width of its interval in the
        # cumulative list: flist[i + 1] - flist[i].
        return [hi - lo for lo, hi in zip(flist, flist[1:])]

    def windex(self, weights=None):
        # Subtract weights from a random point in [0, total) until it drops
        # below zero; the index reached is the chosen character.
        weights = self.weights if weights is None else weights
        assert len(weights) == len(self.chars)
        rnd = random() * sum(weights)
        for i, w in enumerate(weights):
            rnd -= w
            if rnd < 0:
                return i
        return len(weights) - 1  # guard against floating-point round-off

    def create_text(self, flist=None):
        # Use a one-off weight list if flist is given, else the stored weights.
        weights = self._get_weight_list(flist) if flist else self.weights
        return u''.join(self.chars[self.windex(weights)]
                        for _ in range(self.text_length))

flist = [0, 0.06775, 0.08242, 0.10199, 0.13522, 0.23703, 0.25514, 0.27324, 0.32793, 0.38483, 0.38577, 0.39278, 0.42999, 0.45023, 0.50728, 0.56756, 0.58256, 0.58391, 0.62924, 0.68509, 0.7616, 0.78481, 0.79229, 0.81161, 0.81251, 0.82718, 0.82773, 0.99998]

texter = TextGenerator(flist, 1000)
print(texter.create_text())

# Give 'i' twice the current weight of 'e', then generate again.
texter.change_weight('i', texter.get_weight('e') * 2)
print(texter.create_text())
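On Python 3.6 and later, the standard library's random.choices handles this kind of weighted selection directly. Its cum_weights parameter takes running totals like f_list, with one entry per element of the population, so the leading 0 is dropped. A sketch assuming Python 3.6+:

```python
import random

f_list = [0, 0.06775, 0.08242, 0.10199, 0.13522, 0.23703, 0.25514,
          0.27324, 0.32793, 0.38483, 0.38577, 0.39278, 0.42999, 0.45023,
          0.50728, 0.56756, 0.58256, 0.58391, 0.62924, 0.68509, 0.7616,
          0.78481, 0.79229, 0.81161, 0.81251, 0.82718, 0.82773, 0.99998]
alphabet = 'abcdefghijklmnopqrstuvwxyz '

# One running total per character: f_list[1:] has 27 entries for 27 characters.
text = ''.join(random.choices(alphabet, cum_weights=f_list[1:], k=450))
print(text)
```

random.choices scales each draw to the last running total, so the list ending at 0.99998 instead of 1.0 needs no special handling. For per-character weights such as those in TextGenerator, the weights= parameter can be used instead of cum_weights=.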
