5

I noticed that I sometimes get errors in my R scripts when I forget to check whether the dataframe I'm working on is actually empty (has zero rows).

For example, when I used apply like this

apply(X=DF,MARGIN=1,FUN=function(row) !any(vec[ row[["start"]]:row[["end"]] ]))

and DF happened to be empty, I got an error about the subscripts.

Why is that? Aren't empty dataframes valid? Why does apply() with MARGIN=1 even try to do anything when there are no rows in the dataframe? Do I really need to add a condition before each such apply to make sure the dataframe isn't empty?

Thank you!

4
  • Aren't you confusing standard *pplys with plyr ones? Commented Sep 7, 2010 at 11:05
  • What do you want as a result in the case of an empty data.frame? A list? NULL? NA? FALSE? Maybe you could eliminate empty data.frames earlier in your code? Commented Sep 7, 2010 at 12:03
  • @mbq I'm not sure. When I use apply() with MARGIN=1, how exactly does it work? I thought it sends each row to FUN and aggregates the results. Commented Sep 7, 2010 at 14:26
  • Sorry, this comment was stupid. Ignore. Commented Sep 7, 2010 at 14:36

4 Answers

3

On a side note: apply always calls the function you supply at least once. If the input is a dataframe without any rows but with defined variables, it passes FALSE as the argument to the function. If the dataframe is completely empty, it passes logical(0) to the function.

> x <- data.frame(a=numeric(0))
> str(x)
'data.frame':   0 obs. of  1 variable:
 $ a: num 

> y <- apply(x,MARGIN=1,FUN=function(x){print(x)})
[1] FALSE

> x <- data.frame()

> str(x)
'data.frame':   0 obs. of  0 variables

> y <- apply(x,MARGIN=1,FUN=function(x){print(x)})
logical(0)

So, as Joshua already told you, either check before calling apply whether the dataframe has rows, or add a condition inside the function you pass to apply.
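The first option (checking before the apply) could be sketched as below. The helper name none_true_in_range and the sample vec and DF are made up for illustration; they are not the asker's real objects.

```r
# Hypothetical stand-ins for the asker's data.
vec <- c(TRUE, FALSE, FALSE, TRUE, FALSE)
DF  <- data.frame(start = c(2L, 4L), end = c(3L, 5L))

# Hypothetical helper: skip apply() entirely when there are no rows,
# so FUN is never called on a placeholder argument.
none_true_in_range <- function(df, vec) {
  if (nrow(df) == 0) return(logical(0))
  apply(X = df, MARGIN = 1,
        FUN = function(row) !any(vec[row[["start"]]:row[["end"]]]))
}

res_full  <- none_true_in_range(DF, vec)       # one logical per row
res_empty <- none_true_in_range(DF[0, ], vec)  # zero rows: logical(0)
```

Returning logical(0) for the empty case is a design choice here, so that callers always get a logical vector back; return whatever suits your code.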

EDIT : This means that length(x)==0 on its own is not a sufficient check; if both cases could arise, you need to check whether either length(x)==0 or !x is TRUE : (Code taken from Joshua)

apply(X=data.frame(),MARGIN=1,  # empty data.frame
  FUN=function(row) {
    if(length(row)==0 || !row) {return()}
    !any(vec[ row[["start"]]:row[["end"]] ])
  })
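One caveat for current R versions: since R 4.3.0, || signals an error when an operand has length greater than one, so !row on a real row with several columns would fail. A possible variant, sketched under the assumption that the function only needs the "start" and "end" fields, is to test for those names instead of testing the value. The name safe_fun and the sample data are hypothetical.

```r
# Hypothetical stand-ins for the asker's data.
vec <- c(TRUE, FALSE, FALSE, TRUE, FALSE)
DF  <- data.frame(start = c(2L, 4L), end = c(3L, 5L))

safe_fun <- function(row) {
  # When there are no rows, apply() still calls FUN once with a placeholder
  # that lacks the column names; bail out if the needed fields are missing.
  if (!all(c("start", "end") %in% names(row))) return(NULL)
  !any(vec[row[["start"]]:row[["end"]]])
}

res_none    <- apply(X = data.frame(), MARGIN = 1, FUN = safe_fun)  # no rows, no columns
res_no_rows <- apply(X = DF[0, ], MARGIN = 1, FUN = safe_fun)       # columns but no rows
res_rows    <- apply(X = DF, MARGIN = 1, FUN = safe_fun)            # ordinary case
```

Checking names rather than values also sidesteps the question of what exactly the placeholder looks like for a given input.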

5 Comments

I think it might be better to use if(length(row)==0 || !row) (|| instead of |), otherwise we might get warnings saying the condition has length > 1 and only the first element will be used
p.s. where is this behavior of apply that you have mentioned documented?
@David : in the code above. Sometimes testing it out yourself already gives you a lot of insight. I remembered that I tried it out a while ago.
"Apply always accesses the function you use at least once" - thanks for pointing that out. Is that documented somewhere? The behavior really threw me for a loop despite having read the docs for the function. How would I have known it does that if not for this answer? Thanks!
@Yetanotherjosh I didn't find this exact statement in the help files, but derived it from how the function is supposed to work and from testing it out. I learnt a lot of things about R by trying them out.
3

This has absolutely nothing to do with apply. The function you are applying does not work when the data.frame is empty.

> myFUN <- function(row) !any(vec[ row[["start"]]:row[["end"]] ])
> myFUN(DF[1,])  # non-empty data.frame
[1] FALSE
> myFUN(data.frame()[1,])  # empty data.frame
Error in row[["start"]]:row[["end"]] : argument of length 0

Add a condition to your function.

> apply(X=data.frame(),MARGIN=1,  # empty data.frame
+  FUN=function(row) {
+    if(length(row)==0) return()
+    !any(vec[ row[["start"]]:row[["end"]] ])
+  })
NULL

3 Comments

I'm not sure I understand how apply(MARGIN=1) works. I assumed it sends each row to FUN and aggregates the results. If that were the case, an empty data frame shouldn't have failed, since FUN would never have been called. So I guess this isn't the case. I looked at the documentation but still couldn't figure out how exactly it works.
apply does not aggregate. It puts the results of the calls to FUN on portions ("margins") of X into an object. The resulting object is defined in the first paragraph in the "Value" section of ?apply. I'm not sure why you assumed FUN wouldn't be called if X is empty; the documentation doesn't even hint at that behavior.
I know this is an old answer, but what happens when the data frame is not empty but contains values? I believe that if you put a data frame with multiple rows into this function, it will spit out an error.
1

I don't think it's related to the data.frame having 0 rows:

X <- data.frame(a=numeric(0))
str(X)
# 'data.frame':   0 obs. of  1 variable:
# $ a: num 
apply(X,1,sum)
# integer(0)

Try using traceback() after the error to see what exactly causes it.

1

I would use mapply instead:

kk <- data.frame( start = integer(0), end = integer(0) )
kkk <- data.frame( start = 1, end = 3 )

vect <- rnorm( 100 ) > 0

# index vect by the range x:y, as in the question's vec[start:end]
with(kk,  mapply( function(x, y) !any( vect[x:y] ), start, end ) )
with(kkk, mapply( function(x, y) !any( vect[x:y] ), start, end ) )
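Note that mapply returns an empty list for the zero-row case. If a logical vector is wanted in every case, one alternative (a different technique from the mapply above, shown only as a sketch) is vapply over the row indices. The helper none_true is hypothetical, and vect is fixed here instead of random so the results are reproducible.

```r
# Fixed hypothetical data in place of rnorm(100) > 0.
vect <- c(TRUE, FALSE, FALSE, TRUE, FALSE)
kk   <- data.frame(start = integer(0), end = integer(0))
kkk  <- data.frame(start = 2L, end = 3L)

# Hypothetical helper: iterate over row indices; with zero rows,
# seq_len(0) is empty, so the inner function is never called.
none_true <- function(df) {
  vapply(seq_len(nrow(df)),
         function(i) !any(vect[df$start[i]:df$end[i]]),
         logical(1))
}

res_kk  <- none_true(kk)   # logical(0)
res_kkk <- none_true(kkk)  # TRUE, since vect[2:3] holds no TRUE
```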
