I have been trying to rewrite the following code using a for loop in R:

x <- sample(1:100, 10)
x[x %% 2 == 0]
#[1]   6  26  72  62  32  86 100

This extracts only the even elements of the vector x. I eventually came up with the following, but I believe there are simpler ways to code this.

x <- sample(1:100, 10)
output <- integer(0)
for(i in seq_along(x)) {
  if(x[i] %% 2 == 0) {
    output[i] <- x[i]
    output <- output[!is.na(output)]
  }
}
output
#[1]   6  26  72  62  32  86 100

I would be grateful if you could help me with this.

  • Why would you do that? You are using exactly the same functions and only making it more complicated and less performant. Commented Sep 21, 2020 at 11:57
  • You are absolutely right! Since I've only been learning R for a couple of months, I'm trying to come up with different ways of coding a given problem. Commented Sep 21, 2020 at 12:38

1 Answer

You can skip the NA removal by appending each new hit to the end of output, using length to find the next free position.

output <- integer(0)
for(i in seq_along(x)) {
  if(x[i] %% 2 == 0) {
    output[length(output) + 1] <- x[i]
  }
}
output

Here length(output) gives the current length of output. Adding 1 gives the index just past the end, so the assignment appends the new element there.
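To make the difference from the question's code concrete, here is a minimal sketch with hand-picked values (6L and 26L are illustrative, and gappy is a hypothetical name). Assigning one past the end grows the vector by exactly one element, while assigning at an arbitrary index i pads the gap with NA, which is why the question needed the NA removal.

```r
# Appending: always write to position length(output) + 1
output <- integer(0)
output[length(output) + 1] <- 6L    # length 0 -> writes position 1
output[length(output) + 1] <- 26L   # length 1 -> writes position 2
output
#[1]  6 26

# Writing at index 3 of an empty vector fills positions 1 and 2 with NA
gappy <- integer(0)
gappy[3] <- 6L
gappy
#[1] NA NA  6
```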

Or, as @Roland (thanks!) commented, use c:

output <- integer(0)
for(i in seq_along(x)) {
  if(x[i] %% 2 == 0) {
    output <- c(output, x[i])
  }
}
output

c combines output and x[i] into a new vector, which is then assigned back to output.
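A small sketch of what c does in isolation, using illustrative values rather than the sampled x (first is a hypothetical name): it concatenates its arguments in order, and combining with an empty vector simply returns the other elements.

```r
output <- c(6L, 26L)
output <- c(output, 72L)     # new vector: old elements followed by 72
output
#[1]  6 26 72

first <- c(integer(0), 6L)   # an empty vector contributes nothing
first
#[1] 6
```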

Or initialize output as a copy of x, mark the non-hits with NA, and drop the NAs once after the loop:

output <- x
for(i in seq_along(x)) {
  if(x[i] %% 2 != 0) {
    output[i] <- NA
  }
}
output <- output[!is.na(output)]
output
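A further variant, not part of the answer above and offered only as a sketch: allocate output once at its maximum possible length, fill it using a counter, and trim the unused tail at the end. The counter name j and the fixed input x are assumptions chosen for illustration.

```r
x <- c(49L, 6L, 26L, 3L, 72L)   # small fixed input for illustration

output <- integer(length(x))    # allocate the maximum possible length once
j <- 0L                         # number of hits stored so far
for (i in seq_along(x)) {
  if (x[i] %% 2 == 0) {
    j <- j + 1L
    output[j] <- x[i]           # write the hit into the next free slot
  }
}
output <- output[seq_len(j)]    # keep only the filled positions
output
#[1]  6 26 72
```

Using seq_len(j) rather than 1:j keeps the result an empty vector when there are no hits.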

Benchmarks:

fun <- alist(Vectorized = x[x %% 2 == 0]
  , Question = {output <- integer(0)
    for(i in seq_along(x)) {
      if(x[i] %% 2 == 0) {output[i] <- x[i]; output <- output[!is.na(output)]}
    }
    output}
  , NaOutside = {output <- integer(0)
    for(i in seq_along(x)) {
      if(x[i] %% 2 == 0) {output[i] <- x[i]}
    }
    output <- output[!is.na(output)]
    output}
  , Append = {output <- integer(0)
    for(i in seq_along(x)) {
      if(x[i] %% 2 == 0) {output[length(output) + 1] <- x[i]}
    }
    output}
  , Append2 = {output <- integer(0); j <- 1
    for(i in seq_along(x)) {
      if(x[i] %% 2 == 0) {output[j] <- x[i]; j <- j + 1}
    }
    output}
  , C = {output <- integer(0)
    for(i in seq_along(x)) {
      if(x[i] %% 2 == 0) {
        output <- c(output, x[i])
      }
    }
    output}
  , Preallocate = {output <- x
    for(i in seq_along(x)) {
      if(x[i] %% 2 != 0) {
        output[i] <- NA
      }
    }
    output <- output[!is.na(output)]
    output}
    )
library(microbenchmark)

set.seed(42)
x <- sample(1:100, 10)
microbenchmark(list = fun, control=list(order="block"))
#Unit: nanoseconds
#        expr     min      lq       mean    median      uq     max neval
#  Vectorized     644     655     781.54     666.5     721    8559   100
#    Question 4377648 4419010 4682029.95 4492628.5 4579952 7512162   100
#   NaOutside 3443488 3558401 3751542.62 3597273.0 3723662 5146015   100
#      Append 3932586 4070234 4287628.11 4129849.0 4209361 6036142   100
#     Append2 3966245 4094766 4360989.39 4147847.5 4312868 5899000   100
#           C 3464081 3566902 3806531.77 3618758.5 3743058 6528224   100
# Preallocate 3162424 3263220 3435591.92 3290938.0 3374547 4823017   100

set.seed(42)
x <- sample(1:1e5, 1e4)
microbenchmark(list = fun, control=list(order="block"))
#Unit: microseconds
#        expr        min         lq        mean     median          uq        max neval
#  Vectorized    226.224    276.271    277.0027    278.993    284.4515    345.527   100
#    Question 125550.120 126392.848 129287.7202 126812.309 128426.3655 157958.571   100
#   NaOutside   6911.053   7020.831   7497.4403   7109.891   8158.7580   8779.448   100
#      Append   7843.988   7982.987   8582.6769   8129.988   9287.5760  10775.894   100
#     Append2   7647.340   7783.334   8347.7824   7954.683   9007.4500  10325.973   100
#           C  27976.747  29776.632  29997.5407  30024.121  30250.9590  51630.868   100
# Preallocate   6119.198   6232.228   6679.9407   6367.618   7290.1015   8331.277   100
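
Timings only mean something if the variants return the same result, so it is worth checking that first. A minimal sketch, assuming a hypothetical helper evens_loop that wraps the Append variant in a function:

```r
# Hypothetical helper wrapping the "Append" loop variant
evens_loop <- function(x) {
  output <- integer(0)
  for (i in seq_along(x)) {
    if (x[i] %% 2 == 0) {
      output[length(output) + 1] <- x[i]
    }
  }
  output
}

set.seed(42)
x <- sample(1:100, 10)
# Same values, same order, same type as the vectorized expression
identical(evens_loop(x), x[x %% 2 == 0])
#[1] TRUE
```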

5 Comments

Better use output <- c(output, x[i]). Performance shouldn't be worse and it is much more obvious what this does.
However, OP's approach would be better if they put output <- output[!is.na(output)] outside the loop. I'm still baffled that anyone would need an alternative to the vectorized approach.
To be clear: I don't recommend that. I strongly advise against using a loop at all.
Thank you very much indeed. Your solution sounds original! Would you please explain a little why you use length in the last code?
Thank you for your answer, dear Roland. Would you please explain how output <- c(output, x[i]) works?