How to handle empty data frames in R?

Question

I noticed that I sometimes get errors in my R scripts when I forget to check whether the data frame I'm working on is actually empty (has zero rows).

For example, when I used apply like this

apply(X=DF,MARGIN=1,FUN=function(row) !any(vec[ row[["start"]]:row[["end"]] ]))

and DF happened to be empty, I got an error about the subscripts.

Why is that? Aren't empty dataframes valid? Why does apply() with MARGIN=1 even try to do anything when there are no rows in the dataframe? Do I really need to add a condition before each such apply to make sure the dataframe isn't empty?

Thank you!

What do you want as a result in the case of an empty data.frame? A list? NULL? NA? FALSE? Maybe you could eliminate empty data.frames earlier in your code?
– Marek, Commented Sep 7, 2010 at 12:03
@mbq I'm not sure. When I use apply() with MARGIN=1, how exactly does it work? I thought it sends each row to FUN and aggregates the results.
– David B, Commented Sep 7, 2010 at 14:26

Joris Meys · Accepted Answer · 2010-09-08 07:15:27Z

3

On a side note: apply always calls the function you pass at least once. If the input is a data frame without any rows but with defined variables, it passes FALSE as the argument to the function. If the data frame is completely empty, it passes logical(0) to the function.

> x <- data.frame(a=numeric(0))
> str(x)
'data.frame':   0 obs. of  1 variable:
 $ a: num 

> y <- apply(x,MARGIN=1,FUN=function(x){print(x)})
[1] FALSE

> x <- data.frame()

> str(x)
'data.frame':   0 obs. of  0 variables

> y <- apply(x,MARGIN=1,FUN=function(x){print(x)})
logical(0)
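A quick way to confirm this for yourself is to count the calls. The sketch below is only an illustration: the names calls and empty_df are made up for this example, and it relies on the behaviour shown in the output above rather than on anything stated in the help page.

```r
# Count how many times apply() invokes FUN on a zero-row data frame.
calls <- 0L
empty_df <- data.frame(a = numeric(0))   # made-up example data

res <- apply(empty_df, MARGIN = 1, FUN = function(row) {
  calls <<- calls + 1L   # record each invocation
  sum(row)
})

calls        # how often FUN ran, even though there are no rows
length(res)  # the returned object itself is still empty
```

If FUN assumes that named elements such as "start" exist, that single extra call is where it fails.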

So as Joshua already told you, either check before the apply whether the data frame has rows, or add a condition to the function inside the apply.

EDIT: This means that length(x)==0 on its own is not a sufficient check. If both possibilities could arise, you need to check whether either length(x)==0 or !x is TRUE (code taken from Joshua):

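apply(X=data.frame(),MARGIN=1,  # empty data.frame
  FUN=function(row) {
    # Caveat: for a non-empty row, !row has one element per column, and
    # newer R versions raise an error when || gets a condition of length > 1.
    if(length(row)==0 || !row) {return()}
    !any(vec[ row[["start"]]:row[["end"]] ])
  })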
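The other option mentioned above, checking before the apply, could look roughly like this. It is a minimal sketch: the wrapper name none_set_in_ranges and the example data are made up here, and returning logical(0) for an empty input is just one possible choice.

```r
# Hypothetical helper: one logical per row, or logical(0) if there are no rows.
none_set_in_ranges <- function(DF, vec) {
  if (nrow(DF) == 0L) {
    return(logical(0))   # skip apply() entirely for an empty data frame
  }
  apply(DF, MARGIN = 1, FUN = function(row) {
    !any(vec[row[["start"]]:row[["end"]]])
  })
}

vec <- c(TRUE, FALSE, FALSE, FALSE, TRUE)        # made-up example data
DF  <- data.frame(start = c(1, 2), end = c(2, 4))

none_set_in_ranges(DF, vec)        # one value per row
none_set_in_ranges(DF[0, ], vec)   # empty input, FUN is never called
```

Putting the check in one wrapper means the function passed to apply does not need to know about the dummy call at all.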

edited Sep 8, 2010 at 7:15

answered Sep 7, 2010 at 14:58

Joris Meys


5 Comments

David B Over a year ago

I think it might be better to use if(length(row)==0 || !row) (|| instead of |), otherwise we might get warnings saying the condition has length > 1 and only the first element will be used

David B Over a year ago

p.s. where is this behavior of apply that you have mentioned documented?

Joris Meys Over a year ago

@David: in the code above. Sometimes testing it out yourself already gives you a lot of insight. I remembered that I had tried it out a while ago.

Yetanotherjosh Over a year ago

"Apply always accesses the function you use at least once" - thanks for pointing that out. Is that documented somewhere? The behavior really threw me for a loop despite having read the docs for the function. How would I have known it does that if not for this answer? Thanks!

Joris Meys Over a year ago

@Yetanotherjosh I didn't find this exact statement in the help files, but derived it from how the function is supposed to work and from testing it out. I learnt a lot of things about R by trying them out.

Joshua Ulrich · Accepted Answer · 2010-09-07 13:53:10Z

3

This has absolutely nothing to do with apply. The function you are applying does not work when the data.frame is empty.

> myFUN <- function(row) !any(vec[ row[["start"]]:row[["end"]] ])
> myFUN(DF[1,])  # non-empty data.frame
[1] FALSE
> myFUN(data.frame()[1,])  # empty data.frame
Error in row[["start"]]:row[["end"]] : argument of length 0

Add a condition to your function.

> apply(X=data.frame(),MARGIN=1,  # empty data.frame
+  FUN=function(row) {
+    if(length(row)==0) return()
+    !any(vec[ row[["start"]]:row[["end"]] ])
+  })
NULL
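If you would rather not special-case the empty input inside FUN, a different technique (not apply, and not part of the answer above) is to iterate over row indices with vapply. This is a sketch under assumptions: check_rows and the example data are made up for illustration.

```r
vec <- c(TRUE, FALSE, FALSE, FALSE, TRUE)        # made-up example data
DF  <- data.frame(start = c(1L, 2L), end = c(2L, 4L))

# Hypothetical helper: loop over row indices instead of using apply().
check_rows <- function(DF, vec) {
  vapply(seq_len(nrow(DF)), function(i) {
    !any(vec[DF$start[i]:DF$end[i]])
  }, FUN.VALUE = logical(1))
}

check_rows(DF, vec)        # one logical per row
check_rows(DF[0, ], vec)   # zero rows: the function body never runs
```

Because seq_len(0) is empty, vapply has nothing to iterate over and returns an empty logical vector, so no length check is needed.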

answered Sep 7, 2010 at 13:53

Joshua Ulrich


3 Comments

David B Over a year ago

I'm not sure I understand how apply(MARGIN=1) works. I assumed it sends each row to FUN and aggregates the results. If that were the case, an empty data frame shouldn't have failed, since FUN would never have been called. So I guess this isn't the case. I looked at the documentation but still couldn't figure out how exactly it works.

Joshua Ulrich Over a year ago

apply does not aggregate. It puts the results of the calls to FUN on portions ("margins") of X into an object. The resulting object is defined in the first paragraph in the "Value" section of ?apply. I'm not sure why you assumed FUN wouldn't be called if X is empty; the documentation doesn't even hint at that behavior.

Kevin Over a year ago

I know this is an old answer, but what happens when the data frame is not empty but contains values? I believe that if you put a data frame with multiple rows into this function, it will throw an error.

Marek · Accepted Answer · 2010-09-07 08:30:49Z

1

I don't think it's related to the data.frame having 0 rows:

X <- data.frame(a=numeric(0))
str(X)
# 'data.frame':   0 obs. of  1 variable:
# $ a: num 
apply(X,1,sum)
# integer(0)

Try using traceback() after the error to see what exactly causes it.

answered Sep 7, 2010 at 8:30

Marek


datanalytics.com · Accepted Answer · 2010-09-07 08:40:05Z

1

I would use mapply instead:

kk <- data.frame( start = integer(0), end = integer(0) )
kkk <- data.frame( start = 1, end = 3 )

vect <- rnorm( 100 ) > 0

# Index vect by the range x:y, as in the question's vec[start:end]
with(kk,  mapply( function(x, y) !any( vect[x:y] ), start, end ) )
with(kkk, mapply( function(x, y) !any( vect[x:y] ), start, end ) )
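One thing to watch with this approach is the type of the result for empty input: mapply has nothing to simplify, so it returns an empty list rather than an empty logical vector. A small sketch, with made-up data and indexing vect by x:y as in the question's vec[start:end]:

```r
vect <- c(TRUE, FALSE, FALSE, FALSE, TRUE)       # made-up example data
kk   <- data.frame(start = integer(0), end = integer(0))

res <- with(kk, mapply(function(x, y) !any(vect[x:y]), start, end))

class(res)                # what mapply hands back for zero-length inputs
as.logical(unlist(res))   # coerce to an empty logical vector if needed
```

Whether the coercion is worth it depends on what the calling code expects downstream.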

answered Sep 7, 2010 at 8:40

datanalytics.com
