Creating a binary variable with probability in R

Question

I'm trying to make a variable, Var, that takes the value 0 60% of the time and 1 otherwise, with 50,000 observations. For a normal distribution, I remember doing the following to define n:

Var <- rnorm(50000, 0, 1)

Is there a way I could combine an ifelse command with the above to specify n as well as the probability of Var being 0?

honzajolic · Accepted Answer · 2018-02-22 20:03:17Z

2

I would use rbinom like this:

n_ <- 50000
p_ <- 0.4 # probability of a 1

Var <- rbinom(n=n_, size=1, prob=p_)

Storing these values in variables lets you change the sample size and/or the probability in one place. I hope that's what you are looking for.
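A quick way to check the result is to fix a seed and look at the observed proportions. The sketch below is only an illustration: the seed values and the names prop and Var2 are arbitrary and not part of the answer. It also shows one possible ifelse version, since the question asked about it, assuming uniform draws from runif as the source of randomness.

```r
set.seed(42)  # arbitrary seed, only so the draw is reproducible

n_ <- 50000
p_ <- 0.4     # probability of a 1

Var <- rbinom(n = n_, size = 1, prob = p_)

# Share of 0s and 1s in the draw; expect values near 0.6 and 0.4
prop <- prop.table(table(Var))
print(prop)

# An ifelse() version: draw uniforms on (0, 1), map those below 0.6 to 0
# and the rest to 1
Var2 <- ifelse(runif(n_) < 0.6, 0, 1)
print(mean(Var2))  # share of 1s, expected near 0.4
```

Both versions draw each observation independently, so the observed share of 0s will be close to 60% but usually not exactly 60%.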

Julius Vainora · Accepted Answer · 2018-02-22 19:58:55Z

1

If by 60% you mean a probability equal to 0.6 (rather than an empirical frequency), then

Var <- sample(0:1, 50000, prob = c(6, 4), replace = TRUE)

gives the desired sequence of independent Bernoulli realizations with P(Var = 0) = 0.6. The weights in prob need not sum to one; sample rescales them.
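If the empirical frequency reading is what is wanted instead, that is, exactly 60% of the 50000 values equal to 0, one option is to build a vector with the exact counts and shuffle it. This is a sketch under that assumption; the names n, n_zero and Var_exact are made up for the illustration.

```r
set.seed(1)  # arbitrary seed for reproducibility

n <- 50000
n_zero <- round(0.6 * n)  # number of 0s wanted

# Build a vector with the exact counts of 0s and 1s, then shuffle it
Var_exact <- sample(rep(0:1, times = c(n_zero, n - n_zero)))

# Counts of each value: 30000 zeros and 20000 ones by construction
table(Var_exact)
```

The values produced this way are no longer independent, because the counts are fixed in advance.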

ngm · Accepted Answer · 2018-02-22 21:18:41Z

I'm picking nits here, but it actually isn't completely clear exactly what you want.

Do you want to simulate a sample of 50000 from the distribution you describe?

Or, do you want 50000 replications of simulating an observation from the distribution you describe?

These are different things that, in my opinion, should be approached differently.

To simulate a sample of size 50000 from that distribution you would use:

sample(c(0,1), size = 50000, prob = c(0.6, 0.4), replace = TRUE)

To replicate 50000 simulations of sampling from the distribution you describe I would recommend:

replicate(50000, sample(c(0,1), size = 1, prob = c(0.6, 0.4)))

This might seem silly, since in this case both lines of code produce the same kind of result: 50000 independent draws from the same distribution.

But suppose your goal was to investigate properties of samples of size 50000? Then you would run a bunch (say, 1000) of replications of that first line of code, wrapped inside replicate:

replicate(1000, sample(c(0,1), size = 50000, prob = c(0.6, 0.4), replace = TRUE))
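When the inner call returns a vector, replicate collects the results into a matrix with one column per replication, so per-sample statistics can be computed column by column. A minimal sketch follows, with smaller sizes than in the answer so that it runs quickly; the names n_obs, n_reps, sims and prop_ones are illustrative only.

```r
set.seed(123)  # arbitrary seed for reproducibility

n_obs  <- 500  # size of each sample (50000 in the answer; reduced here)
n_reps <- 200  # number of replications (1000 in the answer; reduced here)

# Each column of sims is one simulated sample of size n_obs
sims <- replicate(
  n_reps,
  sample(c(0, 1), size = n_obs, prob = c(0.6, 0.4), replace = TRUE)
)
print(dim(sims))  # n_obs rows, n_reps columns

# Proportion of 1s in each sample, then its centre and spread
# across the replications
prop_ones <- colMeans(sims)
print(mean(prop_ones))
print(sd(prop_ones))
```

Changing n_obs changes the thing being studied, while changing n_reps only changes how precisely its properties are estimated.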

I hope I haven't been too pedantic about this. Having seen simulations go awry, I have come to believe that one should keep the thing being simulated separate from the number of simulations you decide to run. The former is fundamental to your problem, while the latter only affects the accuracy of the simulation study and how long it takes.
