I need some help with a data manipulation in R, please.

My problem: I have presence/absence observation data for birds in different habitat types. I want to know the observation success ratio in these habitats according to their surface area range:

data_observation <- data.frame(
  habitat_bush = c(
    0, 0, 0, 0, 10,
    10, 30, 30, 30, 45,
    65, 65, 65, 80, 80,
    80, 90, 95, 100
  ),
  obs = c(
    "yes", "no", "no", "no", "yes",
    "no", "no", "yes", "no", "yes",
    "yes", "no", "yes", "no", "yes",
    "yes", "yes", "yes", "yes"
  )
)

Here I show data for 'habitat_bush' only, but I have about ten times as many habitats.

With help from a colleague, I wrote this function to make a ggplot of the observation success ratio across different area sizes of 'habitat_bush':

library(dplyr)
library(ggplot2)
library(scales)


plot_forest_test <- function(data = NULL, habitat_type = NULL, colour = NULL) {
  x <- enquo(habitat_type)
  fill <- enquo(colour)

  ggdata <- data %>%
    select(x = !!x, fill = !!fill) %>%
    mutate(
      group = case_when(
        x == 0 ~ "[0]",
        x > 0.0001 & x < 10.0001 ~ "]0-10]",
        x > 10.0001 & x < 25.0001 ~ "]10-25]",
        x > 25.0001 & x < 50.0001 ~ "]25-50]",
        x > 50.0001 & x < 75.0001 ~ "]50-75]",
        x > 75.0001 ~ "]75- 100]"
      )
    ) %>%
    select(-x) %>%
    group_by(group, fill) %>%
    count() %>%
    group_by(group) %>%
    group_modify(~ mutate(.data = .x, freq = n / sum(n)))

  ggplot(data = ggdata, mapping = aes(x = group, y = freq, fill = fill)) +
    geom_bar(stat = "identity") +
    scale_fill_brewer(palette = "Greens") +
    scale_y_continuous(labels = scales::percent) +
    theme_minimal() +
    labs(x = expr(!!x), fill = expr(!!fill))
}

plot_forest_test(data = data_observation, habitat_type = habitat_bush, colour = obs)
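The binning and ratio steps inside the function can also be sketched in base R with cut() and prop.table(). This is only an illustration of the same idea, not the code used above, and it assumes the surface values lie between 0 and 100 (the -Inf lower break means any negative value would land in the "[0]" class).

```r
# Same sample data as in the question
data_observation <- data.frame(
  habitat_bush = c(0, 0, 0, 0, 10, 10, 30, 30, 30, 45,
                   65, 65, 65, 80, 80, 80, 90, 95, 100),
  obs = c("yes", "no", "no", "no", "yes", "no", "no", "yes", "no", "yes",
          "yes", "no", "yes", "no", "yes", "yes", "yes", "yes", "yes")
)

# Right-closed intervals: (-Inf, 0], (0, 10], (10, 25], (25, 50], (50, 75], (75, 100]
bins <- cut(
  data_observation$habitat_bush,
  breaks = c(-Inf, 0, 10, 25, 50, 75, 100),
  labels = c("[0]", "]0-10]", "]10-25]", "]25-50]", "]50-75]", "]75-100]")
)

# Counts of yes/no per surface class, then row-wise proportions
counts <- table(group = bins, obs = data_observation$obs)
freq <- prop.table(counts, margin = 1)

print(counts)
print(round(freq, 2))
```

A class with no observations (here "]10-25]") still appears in the table, with counts of 0 and proportions of NaN, whereas the case_when() version simply has no row for it.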

It works very well. But the observation can depend on the effort the technician puts into looking for the bird. So I have data like this:

data_observation_2 <- data.frame(
  superficie_essence = c(
    0, 0, 0, 0, 10,
    10, 30, 30, 30, 45,
    65, 65, 65, 80, 80,
    80, 90, 95, 100
  ),
  obs = c(
    "yes", "no", "no", "no", "yes",
    "no", "no", "yes", "no", "yes",
    "yes", "no", "yes", "no", "yes",
    "yes", "yes", "yes", "yes"
  ),
  effort = c(low, low, mid-low, mid-low, low, mid-low, mid-low,
            mid-high, mid-high, high, mid-low, mid-low, mid-high, mid-low, mid-high, high, high, mid-high, high)
)

My R skills stop here. I want the same graph as before, but subdivided by effort type for each habitat type, all in the same figure (like a multi-panel plot). In other words, I want 5 sub-graphs of the previous graph, with one bar plot per effort level. I have a lot of data, so I would like to put this process into a function like:

plot_forest_test_2(data = data_observation, habitat_type = habitat_bush, effort = Q_effort, colour = obs)

Can you help me, please? Thanks for your help!

Regards

1 Answer

Quosures are not my forte, especially when they might be missing, but give this a shot. I created a new column for the faceted item and then added facet_grid(). You could also use facet_wrap(). Hope it helps.

plot_forest_test <- function(data = NULL, habitat_type = NULL, colour = NULL, facet = NULL) {
  x <- enquo(habitat_type)
  fill <- enquo(colour)

  # this is new ####################
  facet <- enquo(facet)
  has_facet <- quo_name(facet) != "NULL"

  df <- 
    data %>% 
    mutate(
      x = !!x, 
      fill = !!fill,
      facet = ""
    )

  if (has_facet) {
    df <- 
      df %>% 
      mutate(facet = !!facet)
  }
  ##################################

  ggdata <- 
    df %>%
    mutate(
      group = case_when(
        x == 0 ~ "[0]",
        x > 0.0001 & x < 10.0001 ~ "]0-10]",
        x > 10.0001 & x < 25.0001 ~ "]10-25]",
        x > 25.0001 & x < 50.0001 ~ "]25-50]",
        x > 50.0001 & x < 75.0001 ~ "]50-75]",
        x > 75.0001 ~ "]75- 100]"
      )
    ) %>%
    select(-x) %>%
    # adding facet here
    group_by(group, fill, facet) %>% 
    count() %>%
    group_by(group, facet) %>%
    arrange(desc(fill)) %>% 
    mutate(
      freq = n/sum(n),
      # these steps set up the label placement
      running_freq = cumsum(freq),
      prev_freq = lag(running_freq, default = 0),
      label_y = (prev_freq + running_freq)/2 ,
      label_n = paste0("n = ", sum(n))
    ) %>% 
    ungroup()

  # create plot w/o facet
  p <-
    ggplot(data = ggdata, mapping = aes(x = group, y = freq, fill = fill)) +
    geom_bar(stat = "identity") +
    geom_hline(yintercept = 0) +
    geom_text(aes(y = -0.05, label = label_n), size = 3.5) +
    #geom_text(aes(y = label_y, label = n)) +
    scale_fill_brewer(palette = "Greens") +
    scale_y_continuous(labels = scales::percent) +
    theme(
      panel.background = element_rect(fill = "white"),
      panel.border = element_rect(color = "grey90", fill = NA)
    ) +
    labs(x = expr(!!x), fill = expr(!!fill))

  # add in if facet was mentioned
  if (has_facet) {
    p <-
      p +
      facet_grid(~facet)
  }

  # return final plot
  p
}
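The check for an optional facet argument compares the quosure's name to the string "NULL". A possible alternative is rlang's quo_is_null(), sketched below. The helper name has_facet_arg() is made up for this illustration and is not part of the answer's function.

```r
library(rlang)

# Hypothetical helper: TRUE when the caller supplied something other than NULL
has_facet_arg <- function(facet = NULL) {
  facet <- enquo(facet)   # capture the argument without evaluating it
  !quo_is_null(facet)     # a quosure wrapping NULL means "no facet requested"
}

has_facet_arg()        # FALSE
has_facet_arg(effort)  # TRUE; `effort` is captured, never evaluated
```

Inside plot_forest_test(), the same test could replace the quo_name(facet) != "NULL" line, which avoids relying on how a NULL is converted to text.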

I am including an edit to data_observation_2, as the strings were not in quotes and some of the values had spaces around the hyphens while others did not. I made them all consistent, without spaces:

data_observation_2 <- data.frame(
  superficie_essence = c(
    0, 0, 0, 0, 10,
    10, 30, 30, 30, 45,
    65, 65, 65, 80, 80,
    80, 90, 95, 100
  ),
  obs = c(
    "yes", "no", "no", "no", "yes",
    "no", "no", "yes", "no", "yes",
    "yes", "no", "yes", "no", "yes",
    "yes", "yes", "yes", "yes"
  ),
  effort = c(
    "low", "low", "mid-low", "mid-low", "low", "mid-low", "mid-low",
    "mid-high", "mid-high", "high", "mid-low", "mid-low", 
    "mid-high", "mid-low", "mid-high", "high", "high", "mid-high", "high"
  )
)

And the final outcome. I used fct_relevel() (from the forcats package) to put the panels in order of effort.

plot_forest_test(
  data = data_observation, 
  habitat_type = habitat_bush, 
  colour = obs
)

library(forcats)  # provides fct_relevel()

data_observation_2 %>% 
  mutate(effort = fct_relevel(effort, "low", "mid-low", "mid-high", "high")) %>% 
  plot_forest_test(
    habitat_type = superficie_essence, 
    colour = obs, 
    facet = effort
  )
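The reordering step can also be sketched with base R's factor(), shown here on a short made-up vector that uses the same four effort labels. This is an alternative to fct_relevel(), not what the answer uses.

```r
# Made-up effort values, deliberately out of order
effort <- c("low", "mid-low", "high", "mid-high", "low")

# Setting the levels explicitly fixes the order used for facets and axes
effort_ordered <- factor(
  effort,
  levels = c("low", "mid-low", "mid-high", "high")
)

levels(effort_ordered)
sort(effort_ordered)
```

Any value that is not listed in levels becomes NA, so the labels have to match the data exactly (including hyphens and spaces).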



5 Comments

It's perfect, thanks! Can I ask another question, please? Is it possible to display on the graph the number "n" from which the frequencies were calculated?
Sure. Are the barcharts supposed to fill 0-100 for each category or should they look like the chart I submitted above (total bar heights all different)?
I made updates to add data labels and put them in order of effort. If the solution works for you, please mark it as answered.
Thank you for your help. I'm not sure I understood what you're telling me. I calculated the frequencies from "n" observations. When I have 60% failure / 40% success for a given surface class and effort level, the graph does not show how many observations the frequency is based on (whether it was calculated from 2 observations, or 100, or 1000). So I wondered if we could show this information on the graph, for example "n = 145" or "n = 175" at the base of each bar. Thanks again!
I updated the code with my go-to method, I imagine there are a few ways people might do this.
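
As a rough illustration of the per-bar "n" discussed in these comments, the sketch below computes the total number of observations for each surface class and effort level in base R. The small data frame is invented for the example, and aggregate() and ave() stand in for the dplyr steps of the answer.

```r
# Invented example: surface class, effort level and outcome per observation
obs_df <- data.frame(
  group  = c("[0]", "[0]", "[0]", "]0-10]", "]0-10]", "]0-10]", "]0-10]"),
  effort = c("low", "low", "high", "low", "low", "high", "high"),
  obs    = c("yes", "no", "no", "yes", "yes", "no", "yes")
)

# Count observations per (group, effort, obs) combination
counts <- aggregate(
  n ~ group + effort + obs,
  data = transform(obs_df, n = 1),
  FUN = sum
)

# Total per bar (one bar = one group within one effort panel)
counts$total <- ave(counts$n, counts$group, counts$effort, FUN = sum)
counts$freq <- counts$n / counts$total
counts$label_n <- paste0("n = ", counts$total)

print(counts)
```

The label_n column is what a geom_text() layer could then print under each bar.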
