1

I ran into an issue with generating random strings.

The example below generates repeated blocks of random strings. The number of random strings in a block depends on 'WORD_LENGTH'. With a 'COUNT' of 1M and a 'WORD_LENGTH' of 20 chars, each block contains 262144 (2^18) random strings, and then the block repeats.

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define WORD_LENGTH 20

//const char charset[62] = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";
const char charset[16] = "0123456789abcdef";

int main(int argc, char** argv) {
    srand(time(NULL));
    if (argc != 2) {
        printf("Usage: program COUNT\n\n");
        return 0;
    }
    unsigned int count = atoi(argv[1]);
    char buf[WORD_LENGTH + 1]; // one extra byte for the terminating '\0'
    for (unsigned int c = 0; c < count; c++) {
        for (int i = 0; i < WORD_LENGTH; ++i) {
            buf[i] = charset[ rand() % sizeof charset];
        }
        buf[WORD_LENGTH] = '\0';
        printf("%s\n", buf);
    }
    return 0;
}

One important detail: I could not reproduce the issue when "..charset[62].." is used, with 'COUNT' up to 100M. Question: could someone please explain why it works that way?

3
  • Is RAND_MAX (a constant defined in <stdlib.h>) set to 32767 (2^15 - 1) or 2147483647 (2^31 - 1) or some other number? Commented Jan 14, 2022 at 4:15
  • Your inner loop 'wastes' a random value; the loop writes a random value to buf[WORD_LENGTH - 1] and the code after the loop then overwrites that with a null byte. Not a major problem, but marginally careless. Commented Jan 14, 2022 at 4:21
  • @Jonathan Leffler. Yes. I see. Commented Jan 14, 2022 at 16:00

3 Answers

1

C's rand() function is a pseudorandom number generator, so the values it returns form a repeating sequence.


3 Comments

The problem is somehow connected with 'charset'. I cannot reproduce it with "0123456789", but I can reproduce it with "0123456789abcdef" or "abcdef0123456789abcdef0123456789". That seems strange to me.
@Mark_85 try it with any charset that has length being a multiple of 16. Interested to see if you find similar disfavor elsewhere. Of course, you can sidestep this quagmire by using a cryptographically secure rng and a uniform distribution to eliminate the modulo-bias, but I doubt you need to pull out nuclear weapons for killing crop ferrets. However, if you're doing this to generate random passwords for people doing things like setting up new accounts, it may not be a bad idea.
It is connected with the sequence period. AFAIK C implementations of rand() use a power of two as the modulus. With charset[16], 16 is also a power of two. But with charset[62], the repeating sequence will have length 2^n * 31.
1

What about trying this one:

char buf[WORD_LENGTH];
for (int c = 0; c < count; c++ ) {
    for (int i = 0; i < WORD_LENGTH; ++i) {
        buf[i] = charset[ rand() / (RAND_MAX + 1u) * sizeof charset];
    }
    buf[WORD_LENGTH - 1] = '\0';
    printf("%s\n", buf);
}

According to the C reference, code like rand() % sizeof charset is biased. This answer may give some ideas.

4 Comments

Tried. In my case it generates zeros.
@Mark_85: of course it does. RAND_MAX+1 is always greater than the value returned by rand(), and you're using integer divide, which truncates the fraction. So you end up with 0. You need to force floating point: (int)(rand()/(RAND_MAX+1.0) * (sizeof charset)).
@rici Tested, and it works!
@Mark_85: Cool. But the important thing is that you understand why it was necessary, because it's a very common mistake. (E.g., the frequently asked "Why doesn't pow(x, 1/3) compute the cube root?").
1

To sum up the comments from @WhozCraig and @LightVillet.

The problem is in:

buf[i] = charset[ rand() % sizeof charset]

To be precise, the problem is in 'rand() % X' and depends on 'X', not on the charset array.

The issue reproduces when X = 4, 8, 16, 32, 64, but not with the values in between. I ran short tests with COUNT up to 1M.

Be careful with rand() and use

(int)(rand()/(RAND_MAX+1.0) * (sizeof charset))

as mentioned by @Mr.Chip and @rici.
