Using nested apply functions instead of nested for loops

Question

My objective is to iterate over each column of a data frame and, within each column, over each row, applying a function to every cell. In this case the function replaces each NA with the value from the same row of the final column (column 39 in the code below), but the details of the function are not relevant to the question. I got the results I needed using two nested for loops like this:

for (j in 1:ncol(df.i)) {
  for (i in 1:nrow(df.i)) {
    df.i[i,j] <- ifelse(is.na(df.i[i,j]), df.i[i,39], df.i[i,j])
  }
}

However, I believe this should be possible using an apply(df.i, 1, function) nested within an apply(df.i, 2, function), but I'm not sure whether that works or how to do it. Does anyone know how to achieve the same thing with nested calls to apply?

ifelse is a vectorized function, so your inner loop can be replaced with: df.i[,j] <- ifelse(is.na(df.i[,j]), df.i[,39], df.i[,j]). This can now be used in your apply function.
– Dave2e, Commented Aug 1, 2018 at 15:35
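As a rough sketch of what this comment suggests, the column-wise ifelse can be handed to a single apply() over columns (MARGIN = 2), so no nested apply() is needed. The small matrix m and the use of its third column as the fallback are assumptions made for illustration; the question uses column 39 of df.i.

```r
# Hypothetical data: column 3 plays the role of column 39 in the question
m <- matrix(c(1L, NA, 3L,
              NA, 5L, 6L,
              7L, 8L, 9L), nrow = 3)   # filled column by column
fallback <- m[, 3]

# One apply() over columns; ifelse() handles all rows of a column at once
filled <- apply(m, 2, function(col) ifelse(is.na(col), fallback, col))
filled
```

Because the function returns a vector as long as the column it receives, apply() reassembles the results into a matrix with the same dimensions as m.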
Beware when using apply() with data.frames. apply() coerces the data.frame to a matrix, where all columns are of the same data type. This seems not to be an issue in your particular case, but in general it is safer to use lapply().
– Uwe, Commented Aug 1, 2018 at 15:40
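A minimal sketch of the lapply() route for an actual data.frame, assuming a hypothetical data frame df with a column named fallback (neither name comes from the question). Assigning into df[] keeps the result a data.frame instead of a plain list.

```r
# Hypothetical data frame; "fallback" stands in for column 39
df <- data.frame(x        = c(1L, NA, 3L),
                 y        = c(NA, 2.5, NA),
                 fallback = c(10, 20, 30))

fb <- df$fallback   # take the fallback values before modifying df

# lapply() visits each column as a vector, so no matrix coercion happens
df[] <- lapply(df, function(col) ifelse(is.na(col), fb, col))
df
```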

Rui Barradas · Accepted Answer · 2018-08-01 17:28:42Z


Here are four ways to do what the nested loops do, the last being the original code from the question.

First, a dataset example.

set.seed(5345)    # Make the results reproducible
df.i <- matrix(1:400, ncol = 40)
is.na(df.i) <- sample(400, 50)

Now, following the comment by @Dave2e: a single for loop, with the innermost loop vectorized.

df.i2 <- df.i3 <- df.i1 <- df.i    # Work with copies

for (j in 1:ncol(df.i1)) {
  df.i1[,j] <- ifelse(is.na(df.i1[, j]), df.i1[, 39], df.i1[, j])
}

Then, vectorized, no loops at all.

df.i2 <- ifelse(is.na(df.i), df.i[, 39], df.i)
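Why this works may not be obvious: df.i[, 39] has only nrow(df.i) elements, and ifelse recycles it to the length of the whole matrix. Since R stores matrices column by column, the recycled vector lines up with the row of every element. A small sketch on a hypothetical 3 x 3 matrix, with column 3 standing in for column 39:

```r
m <- matrix(c(1L, NA, 3L,
              NA, 5L, 6L,
              7L, 8L, 9L), nrow = 3)

# Recycling m[, 3] to length(m) gives, for each element of m,
# the value of column 3 in that element's row
recycled <- rep(m[, 3], length.out = length(m))
aligned  <- m[cbind(as.vector(row(m)), 3)]   # explicit (row, 3) lookup
identical(recycled, aligned)

# Hence the whole-matrix ifelse fills each NA from its own row
filled <- ifelse(is.na(m), m[, 3], m)
filled
```

The alignment depends on the fallback vector having exactly nrow(m) elements; a vector of any other length would be recycled out of step with the rows.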

Another vectorized solution, from @Gregor's comment, which is much better since ifelse is known to be relatively slow.

df.i3[is.na(df.i3)] <- df.i3[row(df.i3)[is.na(df.i3)], 39]
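To see what the indexing does, it may help to take the expression apart. row(x) returns a matrix of the same shape as x holding each element's row number, so row(x)[is.na(x)] is the row of every missing cell, and those rows are then looked up in the fallback column. A sketch on a hypothetical 3 x 3 matrix, with column 3 in place of column 39:

```r
m <- matrix(c(1L, NA, 3L,
              NA, 5L, 6L,
              7L, 8L, 9L), nrow = 3)

na_pos  <- is.na(m)          # logical matrix marking the missing cells
na_rows <- row(m)[na_pos]    # row number of each missing cell, in column-major order
na_rows

# Replace the missing cells, in the same order, with the fallback value of their row
m[na_pos] <- m[na_rows, 3]
m
```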

And your solution, as posted in the question.

for (j in 1:ncol(df.i)) {
  for (i in 1:nrow(df.i)) {
    df.i[i,j] <- ifelse(is.na(df.i[i,j]), df.i[i,39], df.i[i,j])
  }
}

Compare the results.

identical(df.i, df.i1)
#[1] TRUE

identical(df.i, df.i2)
#[1] TRUE

identical(df.i, df.i3)
#[1] TRUE

Benchmarks.

After the comment by @Gregor I decided to benchmark the four solutions. As expected, each optimization gives a significant speed-up, and his fully vectorized solution is the fastest.

f <- function(df.i){
  for (j in 1:ncol(df.i)) {
    for (i in 1:nrow(df.i)) {
      df.i[i,j] <- ifelse(is.na(df.i[i,j]), df.i[i,39], df.i[i,j])
    }
  }
  df.i
}

f1 <- function(df.i1){
  for (j in 1:ncol(df.i1)) {
    df.i1[,j] <- ifelse(is.na(df.i1[, j]), df.i1[, 39], df.i1[, j])
  }
  df.i1
}

f2 <- function(df.i2){
  df.i2 <- ifelse(is.na(df.i2), df.i2[, 39], df.i2)
  df.i2
}

f3 <- function(df.i3){
  df.i3[is.na(df.i3)] <- df.i3[row(df.i3)[is.na(df.i3)], 39]
  df.i3
}

microbenchmark::microbenchmark(
  two_loops = f(df.i),
  one_loop = f1(df.i1),
  ifelse = f2(df.i2),
  vectorized = f3(df.i3)
)
#Unit: microseconds
#      expr      min        lq       mean    median       uq      max neval
# two_loops 1125.017 1143.4995 1226.93089 1152.5665 1190.599 5209.431   100
#  one_loop  492.945  500.7045  518.73060  504.9435  516.638  678.951   100
#    ifelse   42.269   45.7770   50.55519   48.4140   50.470  198.533   100
#vectorized   12.626   14.5520   16.21975   15.6380   17.663   27.525   100
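If the script above is run top to bottom, df.i, df.i1, df.i2 and df.i3 have already been filled by the time the benchmark is called. As a hedged sketch, the comparison could be repeated on fresh input that still contains NAs. The matrix size, the seed, and the helper names loop_fill and index_fill are assumptions for illustration, and base R's system.time is swapped in for microbenchmark to avoid the package dependency. No timings are shown because they depend on the machine.

```r
set.seed(1)                                   # hypothetical seed
big <- matrix(1:8000, ncol = 40)              # 200 x 40 test matrix

# Put NAs anywhere except the fallback column 39
candidates <- which(col(big) != 39)
big[sample(candidates, 500)] <- NA

# Nested loops, as in the question
loop_fill <- function(x) {
  for (j in seq_len(ncol(x))) {
    for (i in seq_len(nrow(x))) {
      x[i, j] <- ifelse(is.na(x[i, j]), x[i, 39], x[i, j])
    }
  }
  x
}

# Fully vectorized indexing, as in f3
index_fill <- function(x) {
  x[is.na(x)] <- x[row(x)[is.na(x)], 39]
  x
}

# Check that both give the same result before timing them
stopifnot(identical(loop_fill(big), index_fill(big)))

system.time(for (k in 1:10) loop_fill(big))
system.time(for (k in 1:10) index_fill(big))
```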

edited Aug 1, 2018 at 17:28

answered Aug 1, 2018 at 15:45


3 Comments

Gregor Thomas Over a year ago

Fully vectorized, no ifelse: df.i3[is.na(df.i3)] = df.i3[row(df.i3)[is.na(df.i3)], 39]

Gregor Thomas Over a year ago

(And for loop, no ifelse, replace the inner line with df.i1[is.na(df.i1[, j]), j] <- df.i1[is.na(df.i1[, j]), 39])
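Spelled out, that loop variant might look like the sketch below. The 3 x 3 matrix m and the use of column 3 as the fallback are illustrative assumptions; in the question the fallback is column 39 of df.i.

```r
m <- matrix(c(1L, NA, 3L,
              NA, 5L, 6L,
              7L, 8L, 9L), nrow = 3)

for (j in seq_len(ncol(m))) {
  miss <- is.na(m[, j])       # rows that are missing in column j
  m[miss, j] <- m[miss, 3]    # fill only those rows from the fallback column
}
m
```

Computing the logical index once per column avoids evaluating is.na() twice, which the one-line form in the comment does.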

Rui Barradas Over a year ago

@Gregor Thanks, I have used the first comment, see the edit.
