I am trying to create a function to run a chi-squared test where I have to group by several variables. The method works when it's not wrapped in a function, but I am having trouble turning it into one. As I'll be repeating the procedure multiple times, it seems worth doing, but I just can't get the function to recognise the "z" variable, and I always get the "Unknown or uninitialised column" warning.

Example is below.

library(tidyverse)
library(datasets)

#data
data(iris)
df<-iris%>%
  gather(Type, value, -Species)%>%
  separate(Type, c("type", "attribute"), sep="[.]")

#functions------------
frequency<-function(data, x, y, z){
  x <- enquo(x)
  y <- enquo(y)
  z <- enquo(z)

  data%>%
    filter(!is.na(!!x), !is.na(!!y), !is.na(!!z))%>%
    count(!!x, !!y, !!z)
}

group_chi<-function(data, x, y, z){
  x <- enquo(x)
  y <- enquo(y)

  data %>%
    group_by(!! x) %>%
    nest() %>%
    mutate(M = map(data, function(dat){
      dat2 <- dat %>% spread(!! y, n)
      M <- as.matrix(dat2[, -1])
      row.names(M) <- dat2$'z' # I've done it like this because z <- enquo(z) and dat2$!!z doesn't work; just using z doesn't work either
      return(M)
    }))%>%
    mutate(pvalue = map_dbl(M, ~chisq.test(.x)$p.value)) %>%
    select(-data, -M) %>%
    ungroup()
}

#applying them--------------------

test<-frequency(df, type, Species, attribute)
chi_test<-group_chi(test,  type, Species, attribute)#brings up warning
#> Warning: Unknown or uninitialised column: 'z'.

#> Warning: Unknown or uninitialised column: 'z'.

#test without the function=no warning. 
No_function<-test %>%
  group_by(type) %>%
  nest() %>%
  mutate(M = map(data, function(dat){
    dat2 <- dat %>% spread(Species, n)
    M <- as.matrix(dat2[, -1])
    row.names(M) <- dat2$attribute
    return(M)
  }))%>%
  mutate(pvalue = map_dbl(M, ~chisq.test(.x)$p.value)) %>%
  select(-data, -M) %>%
  ungroup()


# in the example the results are the same, but the warning message is of concern and the function doesn't give the same output on a more complex dataset.

chi_test 
#> # A tibble: 2 x 2
#>   type  pvalue
#>   <chr>  <dbl>
#> 1 Petal      1
#> 2 Sepal      1
No_function 
#> # A tibble: 2 x 2
#>   type  pvalue
#>   <chr>  <dbl>
#> 1 Petal      1
#> 2 Sepal      1
# what am I doing wrong?

Created on 2020-01-27 by the reprex package (v0.3.0)

What am I doing wrong here?

2 Answers

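You can't use $ for an indirect column reference: dat2$'z' looks for a column literally named "z", which is why you get the "Unknown or uninitialised column" warning. Use dat2[[z]] instead, where z holds the column name as a string. When I make that replacement, there are no warnings or errors.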

Try this version of your function instead:

group_chi<-function(data, x, y, z){
  x <- enquo(x)
  y <- enquo(y)

  data %>%
    group_by(!! x) %>%
    nest() %>%
    mutate(M = map(data, function(dat){
      dat2 <- dat %>% spread(!! y, n)
      M <- as.matrix(dat2[, -1])
      row.names(M) <- dat2[[z]]
      return(M)
    }))%>%
    mutate(pvalue = map_dbl(M, ~chisq.test(.x)$p.value)) %>%
    select(-data, -M) %>%
    ungroup()
}

and then call it with the column name as a string:

chi_test <- group_chi(test,  type, Species, "attribute")
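
To see the difference between $ and [[ in isolation, here is a minimal sketch. The toy tibble and the variable names are made up for illustration and are not part of the original data.

```r
library(tibble)

# Toy data standing in for dat2 (hypothetical values)
dat2 <- tibble(attribute = c("Length", "Width"), setosa = c(50L, 50L))
z <- "attribute"  # the column name is held in a variable

# `$` takes the name literally, so it looks for a column called "z".
# On a tibble this returns NULL and raises the
# "Unknown or uninitialised column" warning (suppressed here).
by_dollar <- suppressWarnings(dat2$z)

# `[[` evaluates z first, so it looks up the column called "attribute".
by_bracket <- dat2[[z]]

is.null(by_dollar)
by_bracket
```

Inside group_chi the string arrives through the z argument, which is why this version has to be called with "attribute" in quotes.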

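Alternatively, you can first capture the argument with z <- enquo(z) and then extract the column with pull(dat2, !!z) (as in @akrun's answer), which lets you keep passing the column name unquoted: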

group_chi<-function(data, x, y, z){
  x <- enquo(x)
  y <- enquo(y)
  z <- enquo(z)

  data %>%
    group_by(!! x) %>%
    nest() %>%
    mutate(M = map(data, function(dat){
      dat2 <- dat %>% spread(!! y, n)
      M <- as.matrix(dat2[, -1])
      row.names(M) <- pull(dat2, !!z)
      return(M)
    }))%>%
    mutate(pvalue = map_dbl(M, ~chisq.test(.x)$p.value)) %>%
    select(-data, -M) %>%
    ungroup()
}
group_chi(test,  type, Species, attribute)
# # A tibble: 2 x 2
#   type  pvalue
#   <chr>  <dbl>
# 1 Petal      1
# 2 Sepal      1

We could also use z <- enquo(z), then use select and pull to extract the column as a vector:

group_chi<-function(data, x, y, z){
  x <- enquo(x)
  y <- enquo(y)
  z <- enquo(z)

  data %>%
    group_by(!! x) %>%
    nest() %>%
    mutate(M = map(data, function(dat){
      dat2 <- dat %>% spread(!! y, n)
      M <- as.matrix(dat2[, -1])

      row.names(M) <- dat2 %>% 
                           select(!!z) %>%
                           pull(1)
      return(M)
    }))%>%
    mutate(pvalue = map_dbl(M, ~chisq.test(.x)$p.value)) %>%
    select(-data, -M) %>%
    ungroup()
}

Checking:

chi_test <- group_chi(test,  type, Species, attribute)
chi_test
# A tibble: 2 x 2
#  type  pvalue
#  <chr>  <dbl>
#1 Petal      1
#2 Sepal      1

With newer versions of the tidyverse (rlang 0.4.0 and later), the curly-curly operator ({{ }}) can replace the enquo/!! pair:

group_chi<-function(data, x, y, z){

  data %>%
    group_by({{x}}) %>%
    nest() %>%
    mutate(M = map(data, function(dat){
      dat2 <- dat %>% spread({{y}}, n)
      M <- as.matrix(dat2[, -1])

      row.names(M) <- dat2 %>% 
                           pull({{z}})
      return(M)
    }))%>%
    mutate(pvalue = map_dbl(M, ~chisq.test(.x)$p.value)) %>%
    select(-data, -M) %>%
    ungroup()
}

chi_test <- group_chi(test,  type, Species, attribute)
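
As a stripped-down illustration of the same idea, the sketch below wraps pull() in two small helpers, one using {{ }} and one using enquo/!!. The helper names and the toy tibble are hypothetical and exist only for this example.

```r
library(dplyr)

# Hypothetical helper: {{ col }} forwards the caller's unquoted
# column name straight to pull()
col_as_vector <- function(data, col) {
  pull(data, {{ col }})
}

# The same helper written with the older enquo()/!! pair
col_as_vector_bang <- function(data, col) {
  col <- enquo(col)
  pull(data, !!col)
}

# Toy data (hypothetical values)
toy <- tibble(attribute = c("Length", "Width"), n = c(50L, 50L))

a <- col_as_vector(toy, attribute)
b <- col_as_vector_bang(toy, attribute)
identical(a, b)
```

Both helpers take the column name unquoted, so neither needs the string-based call that the dat2[[z]] approach requires.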

1 Comment

Thanks! You two are brilliant. It works perfectly now. I decided to use the curly-curly method with pull to cut down on code, and so that I would NOT have to pass the column name as a string as in r2evans's answer.
