
I need a function to generate random integers. (assume Java long type for now, but this will be extended to BigInteger or BitSet later.)

The tricky part is a parameter P that specifies the (independent) probability of each bit in the result being 1.

If P = 0.5 then we can just use the standard random number generator. Some other values of P are also easy to implement. Here's an incomplete example:

Random random = new Random();

// ...

long nextLong(float p) {
    if      (p == 0.0f)   return 0L;
    else if (p == 1.0f)   return -1L;
    else if (p == 0.5f)   return random.nextLong();
    else if (p == 0.25f)  return nextLong(0.5f) & nextLong(0.5f);
    else if (p == 0.75f)  return nextLong(0.5f) | nextLong(0.5f);
    else if (p == 0.375f) return nextLong(0.5f) & nextLong(0.75f); // etc
    else {
      // What goes here??
      String message = String.format("P=%f not implemented yet!", p);
      throw new IllegalArgumentException(message);
    }
}

Is there a way to generalise this for any value of P between 0.0 and 1.0?

  • Out of curiosity, have you tested that the accepted answer works? How many iterations are necessary to be acceptably close to the given probability? Was the performance still adequate? Commented Jan 16, 2010 at 13:11
  • Yes, I capped it at 16 iterations (1 iteration == 1 bit of accuracy) and it generates about 17 million bits per second on my (fairly old) PC. I measured this with irrational probabilities (e.g. 0.1*PI) but it's much faster for round numbers, e.g. 120Mbits/sec for p=0.75. Commented Jan 16, 2010 at 14:04
  • 1
    Is there any particular reason to avoid bit shifting and multiplication? Commented Jan 18, 2010 at 0:03
  • @Svante, no. My first attempt was too slow and I assumed it was because of the bit shifting/multiplication, but actually it was just that I was generating too many random numbers (by rolling separately for each output bit.) Commented Jan 18, 2010 at 1:01

7 Answers


First, a little ugly math that you're already using in your code.

Let x and y be bits whose probabilities of being 1 are X = p(x=1) and Y = p(y=1) respectively. Then we have:

 p( x & y = 1) = X Y
 p( x | y = 1) = 1 - (1-X) (1-Y)
 p( x ^ y = 1) = X (1 - Y) + Y (1 - X)

Now if we let Y = 1/2 we get

P( x & y ) = X/2
P( x | y ) = (X+1)/2

Now set the RHS to the probability we want, and we have two cases we can solve for X:

X = 2 p        // if we use &
X = 2 p - 1    // if we use |

Next we assume we can use the same trick again to obtain X in terms of another variable Z... and we keep iterating until we've done "enough".

That's a bit abstract, so consider p = 0.375:

0.375 * 2 = 0.75  < 1.0 so our first operation is &
0.75 * 2 = 1.5 > 1.0 so our second operation is |
0.5 is something we know so we stop.

Thus we can get a variable with p = 0.375 as X1 & (X2 | X3).
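As a quick sanity check, here is a Monte Carlo estimate of the per-bit probability of X1 & (X2 | X3) (a minimal sketch; the class and method names are illustrative):

```java
import java.util.Random;

public class CompositionCheck {
    /** Estimate the per-bit probability of ones in X1 & (X2 | X3)
     *  by averaging pop-counts over many 64-bit samples. */
    public static double estimate(long seed, int samples) {
        Random r = new Random(seed);
        long ones = 0;
        for (int i = 0; i < samples; i++) {
            long x = r.nextLong() & (r.nextLong() | r.nextLong());
            ones += Long.bitCount(x);
        }
        return ones / (64.0 * samples);
    }

    public static void main(String[] args) {
        // Should print a value near 1/2 * (1 - 1/2 * 1/2) = 0.375
        System.out.println(estimate(42, 20000));
    }
}
```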

The problem is that for most values of p this will not terminate, e.g. p = 0.333:

0.333 *2 = 0.666 < 1.0 so our first operation is &
0.666 *2 = 1.333 > 1.0 so our second operation is |
0.333 *2 = 0.666 < 1.0 so our third operation is &
etc...

so p=0.333 can be generated by

X1 & ( X2 | (X3 & (X4 | ( ... ) ) ) )

Now I suspect that taking enough terms in the series will give you enough accuracy, and this can be written as a recursive function. However there might be a better way than that too... I think the order of the operations is related to the binary representation of p; I'm just not sure exactly how, and don't have time to think about it more deeply.

Anyway, here's some untested C++ code that does this. You should be able to Java-ify it easily.

unsigned int bitsWithProbabilityHelper( float p, float tol, int cur_depth, int max_depth );

unsigned int bitsWithProbability( float p )
{
   return bitsWithProbabilityHelper( p, 0.001f, 0, 10 );
}

unsigned int bitsWithProbabilityHelper( float p, float tol, int cur_depth, int max_depth )
{
   unsigned int X = randbits();
   if( cur_depth >= max_depth ) return X;
   if( p < 0.5f - tol )
   {
     return X & bitsWithProbabilityHelper( 2*p, tol, cur_depth+1, max_depth );
   }
   if( p > 0.5f + tol )
   {
     return X | bitsWithProbabilityHelper( 2*p - 1, tol, cur_depth+1, max_depth );
   }
   return X;
}
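Since the question asks for Java, here is a rough Java-ification of the same recursion (a sketch; the names are illustrative, and tol is doubled at each level as suggested in the comments, which bounds the truncation error):

```java
import java.util.Random;

public class ProbabilityBits {
    private final Random random = new Random();

    /** Each bit of the result is 1 with probability approximately p.
     *  Assumes 0 < p < 1; the exact shortcuts for 0, 0.5 and 1 from
     *  the question could be added in front. */
    public long nextLong(float p) {
        return nextLongHelper(p, 0.001f, 16);   // ~16 bits of accuracy
    }

    private long nextLongHelper(float p, float tol, int depthLeft) {
        long x = random.nextLong();             // each bit is 1 with probability 1/2
        if (depthLeft <= 0) return x;           // accuracy budget exhausted
        if (p < 0.5f - tol) {
            // p = X/2, so AND with a word whose bits have probability X = 2p
            return x & nextLongHelper(2 * p, 2 * tol, depthLeft - 1);
        }
        if (p > 0.5f + tol) {
            // p = (X+1)/2, so OR with a word whose bits have probability X = 2p-1
            return x | nextLongHelper(2 * p - 1, 2 * tol, depthLeft - 1);
        }
        return x;                               // p is close enough to 0.5
    }
}
```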

7 Comments

You can probably adjust tol at each step and remove max_depth. Again, it just requires a bit more spare brains than I currently have.
+1 I had worked about the same thing and was about to test whether it converges
Yes I think cur_depth is redundant. Multiplying tol by a constant factor like 2.001 will have a similar effect.
@finnw Indeed. In fact the error in the probability from truncating the series after N terms is at most pow(2, -N). So scaling tol by 2 every iteration is sensible/correct.
I figured out a way to do it iteratively using the binary representation of P. See my second answer.

Distribute a proportional number of bits throughout the number. Pseudocode:

long generateNumber( double probability ){
  int bitCount = 64 * probability;
  byte[] data = new byte[64]; // 0-filled, one flag per output bit

  long indexes = getRandomLong();

  for 0 to bitCount-1 {
    int index;
    do {
      // take 6 random bits as a candidate bit position (0..63)
      index = indexes & 63;
      indexes >>>= 6;
      if( indexes == 0 ) indexes = getRandomLong();
    } while ( data[index] == 1 ); // position already taken - retry until we find a free one
    data[index] = 1;
  }

  return bytesToLong( data );
}

I hope you get what I mean. Perhaps the byte[] could be replaced with a long and bit operations to make it faster.
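For instance, the byte[] could be replaced by a single long used as a bitmask (a sketch with illustrative names; note that, as pointed out in the comments, this always sets exactly round(64*p) bits, so the bits are not independent):

```java
import java.util.Random;

public class FixedPopCount {
    private final Random rnd = new Random();

    /** Returns a long with exactly round(64 * probability) one-bits
     *  at uniformly random positions. */
    public long generateNumber(double probability) {
        int bitCount = (int) Math.round(64 * probability);
        long result = 0;
        while (Long.bitCount(result) < bitCount) {
            // pick a random position; OR-ing an already-set bit is a no-op,
            // so we simply loop until enough distinct bits are set
            result |= 1L << rnd.nextInt(64);
        }
        return result;
    }
}
```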

4 Comments

I hope I understood the question - do you need to guarantee the number of bits being 1, i.e. some bits permutation?
Interesting, but it doesn't meet the requirement that bits must be independent of each other. This method will always return exactly (64*P) ones in the result. So for example with P=1/16, exactly 4 bits are set; if bits 0-3 are set then all other bits must be clear.
However if this was combined with a binomial-distribution generator to choose bitCount then it could give the right result.
I have now implemented this as a combination of (a) binomial-distribution generator for bitCount and (b) a lookup table of permutations of bits.

Here's how I solved it in the end.

  1. Generate an integer N between 0..16, following the binomial distribution. This gives the number of '1' bits in the 16-bit partial result.
  2. Randomly generate an index into a lookup table that contains 16-bit integers containing the desired number of '1' bits.
  3. Repeat 4 times to get four 16-bit integers.
  4. Splice these four 16-bit integers together to get a 64-bit integer.

This was partly inspired by Ondra Žižka's answer.

The benefit is that it reduces the number of calls to Random.nextLong() to 8 per 64 bits of output. For comparison, rolling each bit individually would require 64 calls. Bitwise AND/OR uses between 2 and 32 calls, depending on the value of P.

Of course calculating binomial probabilities is just as expensive, so those go in another lookup table.

It's a lot of code, but it's paying off in terms of performance.
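The steps above can be sketched roughly as follows (an illustrative outline, not the actual implementation: the names are made up, the lookup tables are built naively, and the binomial CDF is precomputed per value of P):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class BinomialLutGenerator {
    private final Random rnd = new Random();
    private final double[] cdf = new double[17];    // P(N <= k) for N ~ Binomial(16, p)
    private final List<List<Integer>> byPopCount = new ArrayList<>();

    public BinomialLutGenerator(double p) {
        // Lookup table: all 2^16 words, grouped by number of set bits.
        for (int k = 0; k <= 16; k++) byPopCount.add(new ArrayList<>());
        for (int w = 0; w < (1 << 16); w++) byPopCount.get(Integer.bitCount(w)).add(w);
        // Cumulative binomial distribution of the number of ones per 16-bit word.
        double acc = 0;
        for (int k = 0; k <= 16; k++) {
            acc += binomialPmf(16, k, p);
            cdf[k] = acc;
        }
        cdf[16] = 1.0;                              // guard against rounding error
    }

    private static double binomialPmf(int n, int k, double p) {
        double c = 1;
        for (int i = 0; i < k; i++) c = c * (n - i) / (i + 1);   // C(n, k)
        return c * Math.pow(p, k) * Math.pow(1 - p, n - k);
    }

    /** Steps 1-2: sample a bit count N from the binomial CDF, then pick a
     *  random 16-bit word with exactly N bits set. */
    private int next16() {
        double u = rnd.nextDouble();
        int k = 0;
        while (u >= cdf[k]) k++;
        List<Integer> group = byPopCount.get(k);
        return group.get(rnd.nextInt(group.size()));
    }

    /** Steps 3-4: splice four 16-bit words into one 64-bit result. */
    public long nextLong() {
        long r = 0;
        for (int i = 0; i < 4; i++) r = (r << 16) | next16();
        return r;
    }
}
```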


Update - merged this with the bitwise AND/OR solution. It now uses that method if it guesses it will be more efficient (in terms of calls to Random.next()).

2 Comments

I was considering this problem too, and independently came up with something like your bitwise AND/OR method. Can I check my understanding of your final solution? You have two LUTs: 1) containing the binomial distribution with n=16 and P=0.5, and 2) containing all 2^16 16-bit integers grouped by number of bits set. You generate a random index to select a number of bits from LUT 1, then another random index to select a 16-bit value with that many bits from LUT 2. Is that correct?
@bsa sorry for the late reply. I have many different versions of the binomial table with varying values of P (in steps of 1/1024). The output is slightly off so I then XOR it with another long with a Poisson distribution (which is close enough to the exact binomial for the tiny deviations of P involved). The Poisson-distributed words are also fast because the majority of them resolve to zero with just one call to the RNG.

Use a random generator that generates a uniform float number r between 0 and 1. If r > p then set the bit to 0, otherwise set it to 1.

4 Comments

That's what I'm trying to avoid. It's too slow for my application.
ahh, I see what you are getting at. I'll think more about it.
You aren't going to get orders of magnitude performance improvement by calculating batches of bits together.
@Tom, actually I do get an order-of-magnitude improvement, at least in the case of p=0.25, because I am generating 4 random words for each output word (assuming the RNG output is 32 bits wide.) If I generate one bit at a time I must generate 64 random words, increasing the cost by a factor of 16.

If you're looking to apply some distribution where with probability P you get a 1 and with probability 1-P you get a 0 at any particular bit your best bet is simply to generate each bit independently with probability P of being a 1 (that sounds like a recursive definition, I know).

Here's a solution, I'll walk through it below:

import java.util.Random;

public class MyRandomBitGenerator
{

    Random pgen = new Random();

    // assumes p is well conditioned (0 < p < 1)
    public boolean nextBitIsOne(double p){
        return pgen.nextDouble() < p;
    }

    // assumes p is well conditioned (0 < p < 1)
    public long nextLong(double p){
        long nxt = 0;
        for(int i = 0; i < 64; i++){
           if(nextBitIsOne(p)){
               nxt |= 1L << i; // 1L, not 1: an int shift would wrap past bit 31
           }
        }
        return nxt;
    }

}

Basically, we first determine how to generate a single bit that is 1 with probability P: pgen.nextDouble() generates a number between 0 and 1 with uniform probability, and by asking whether it is less than p we sample this distribution such that a fraction p of the calls return true in the long run.

1 Comment

My original implementation looks very similar to this, and is very slow. That's why I decided to try bit manipulation to parallelize the bits within a word. If P is a round number like 0.25, 0.5 or 0.75 it gives a huge performance boost. I'm not sure if this is true for other values of P though.

Here's another variant of Michael Anderson's answer.

To avoid recursion, we process the bits of P iteratively from right-to-left instead of recursively from left-to-right. This would be tricky in floating-point representation so we extract the exponent/mantissa fields from the binary representation instead.

class BitsWithProbabilityHelper {
    public BitsWithProbabilityHelper(float prob, Random rnd) {
        if (Float.isNaN(prob)) throw new IllegalArgumentException();

        this.rnd = rnd;

        if (prob <= 0f) {
            zero = true;
            return;
        }

        // Decode IEEE float
        int probBits = Float.floatToIntBits(prob);
        mantissa = probBits & 0x7FFFFF;
        exponent = probBits >>> 23;

        // Restore the implicit leading 1 (except for denormals)
        if (exponent > 0) mantissa |= 0x800000;
        exponent -= 150;

        // Force mantissa to be odd
        int ntz = Integer.numberOfTrailingZeros(mantissa);
        mantissa >>= ntz;
        exponent += ntz;
    }

    /** Determine how many random words we need from the system RNG to
     *  generate one output word with probability P.
     **/
    public int iterationCount() {
        return -exponent;
    }

    /** Generate a random number with the desired probability */
    public long nextLong() {
        if (zero) return 0L;

        long acc = -1L;
        int shiftReg = mantissa - 1;
        for (int bit = exponent; bit < 0; ++ bit) {
            if ((shiftReg & 1) == 0) {
                acc &= rnd.nextLong();
            } else {
                acc |= rnd.nextLong();
            }
            shiftReg >>= 1;
        }
        return acc;
    }

    /** Value of <code>prob</code>, represented as m * 2**e where m is always odd. */
    private int exponent;  
    private int mantissa;

    /** Random data source */
    private final Random rnd;

    /** Zero flag (special case) */
    private boolean zero;
}

4 Comments

This looks good. The only disadvantage compared to my method is that you can't specify a tolerance in the probability to limit the number of random numbers generated. However it would be easy to implement a clipping of the bits in the mantissa, which would give a better result than using tol for guaranteed reduced running time.
Actually there's something wrong with that inner loop for very small p. I think -exponent can be bigger than the length of the shiftReg, and in those cases you do more work than is necessary. I'm also concerned that it might be doing the wrong thing for denormalised numbers. Maybe you could do a long binrep = p * 0xFFFFFFFF and loop through the bits in binrep rather than playing with the bits of a float? Then you'd have a fixed-length loop and a known max error in the approximation.
You can apply a tolerance by calling iterationCount() first and if the answer is too high use a different method. The smallest non-zero value for a single-precision float is 2**-149, so the worst-case iteration count would be 149. Denormalised numbers are handled in line 9 (mantissa |= 0x800000 restores the implicit 1 bit of normal numbers, this is skipped for denormalised numbers.) If -exponent is huge then unfortunately you do need to loop that many times in order to get the probability down, even though the shiftReg will be zero for most iterations.
It wasn't handling zero correctly though. I've added a special case for P=0.

Suppose the size of the bit array is L. If L=1, the chance that the 1st bit is 1 is P, and that it is 0 is 1-P. For L=2, the probability of getting 00 is (1-P)^2, 01 or 10 is P(1-P) each, and 11 is P^2. Extending this logic, we can first determine the first bit by comparing a random number with P, then rescale the random number so that we again get anything between 0 and 1. A sample JavaScript code:

function getRandomBitArray(maxBits, probabilityOf1) {
    var randomSeed = Math.random();
    var bitArray = new Array();
    for (var currentBit = 0; currentBit < maxBits; currentBit++) {
        if (randomSeed < probabilityOf1) {
            //fill 1 at the current bit
            bitArray.push(1);
            //rescale the sample space of the random no
            //from [0, probabilityOf1) back to [0, 1)
            randomSeed = randomSeed / probabilityOf1;
        }
        else {
            //fill 0 at the current bit
            bitArray.push(0);
            //rescale the sample space from [probabilityOf1, 1) back to [0, 1)
            randomSeed = (randomSeed - probabilityOf1) / (1 - probabilityOf1);
        }
    }
    return bitArray;
}

EDIT: This code does generate completely random bits. I will try to explain the algorithm better.

Each bit string has a certain probability of occurring. Suppose a string has probability of occurrence p; we want to choose that string if our random number falls in some interval of length p. The starting point of the interval must be fixed, but its value will not make much difference. Suppose we have chosen the first k bits correctly. Then, for the next bit, we divide the interval corresponding to this k-length bit string into two parts whose sizes are in the ratio P:1-P (here P is the probability of getting a 1). We say that the next bit is 1 if the random number is in the first part, 0 if it is in the second part. This ensures that the probabilities of strings of length k+1 also remain correct.

Java code:

public ArrayList<Boolean> getRandomBitArray(int maxBits, double probabilityOf1) {
    double randomSeed = Math.random();
    ArrayList<Boolean> bitArray = new ArrayList<Boolean>();
    for (int currentBit = 0; currentBit < maxBits; currentBit++) {
        if (randomSeed < probabilityOf1) {
            //fill 1 at the current bit
            bitArray.add(true);
            //rescale the sample space of the random no
            //from [0, probabilityOf1) back to [0, 1)
            randomSeed = randomSeed / probabilityOf1;
        }
        else {
            //fill 0 at the current bit
            bitArray.add(false);
            //rescale the sample space from [probabilityOf1, 1) back to [0, 1)
            randomSeed = (randomSeed - probabilityOf1) / (1 - probabilityOf1);
        }
    }
    return bitArray;
}

3 Comments

@finnw I see that you didn't understand the algorithm properly. I have improved the explanation; please go through it carefully. And about the wrong language: I think the code is pretty simple to understand even for people unfamiliar with JavaScript if treated as pseudo-code. Still, I am adding Java code.
If only double had infinite precision, this would work :-) Sorry but I did try running your code, and as I expected there is a lot of correlation among the low-order output bits.
Yes, this would work only if you want only as many random bits as the size of a double (more precisely, the entropy of the random number generator, say E). So you can use one random number per E bits (like you did in your solution, generating blocks of bits). But that is a problem with any solution using a fixed number of random numbers; you cannot somehow squeeze more than one random bit from a coin toss, after all. Apart from that, I do not understand why you 'expect' this solution to generate correlation among the bits.
