I have a frequency table each for females and males.

t1 <- table(PainF$task_duration)

t2 <- table(PainM$task_duration)

Females:

 30  40  45  60  65  70  75  78  80  90  95 100 101 120 144 150 180 185 240
  3   3   2   5   1   2   1   1   1   5   1   1   1   3   1   1   1   1   2

Males:

  2  10  15  20  30  38  40  45  50  55  60  70  72  73  75  80  90  95 100 105 110 120 130
  2   2   1   2   3   1   4   4   3   2  11   1   1   1   1   2  10   1   1   1   1  11   2
150 180 200 240 300 500
  2   5   1   3   3   1

In each table, the top row is the duration of a task in minutes and the bottom row is the frequency (the number of people). How can I put this data in a scatter plot? Is it possible? I want to compare frequency against duration for females and males.

I tried to use ggplot.

ggplot() +
  geom_point(data = t1, aes(x = x, y = "column2"), color = "blue", size = 3) +
  geom_point(data = t2, aes(x = x, y = "column2"), color = "red", size = 3) +
  labs(x = "X", y = "Y", title = "Female and Male task duration") +
  theme_minimal()

But I keep getting the following error message:

Error in fortify():
! data must be a <data.frame>, or an object coercible by fortify(), or a valid <data.frame>-like object coercible by as.data.frame().
Caused by error in .prevalidate_data_frame_like_object():
! dim(data) must return an <integer> of length 2.
Run rlang::last_trace() to see where the error occurred.

So my question is: can I make a scatter plot based on a frequency table, and if so, how?

> dput(t1)
structure(c(`30` = 3L, `40` = 3L, `45` = 2L, `60` = 5L, `65` = 1L, 
`70` = 2L, `75` = 1L, `78` = 1L, `80` = 1L, `90` = 5L, `95` = 1L, 
`100` = 1L, `101` = 1L, `120` = 3L, `144` = 1L, `150` = 1L, `180` = 1L, 
`185` = 1L, `240` = 2L), dim = 19L, dimnames = list(c("30", "40", 
"45", "60", "65", "70", "75", "78", "80", "90", "95", "100", 
"101", "120", "144", "150", "180", "185", "240")), class = "table")

> dput(t2)
structure(c(`2` = 2L, `10` = 2L, `15` = 1L, `20` = 2L, `30` = 3L, 
`38` = 1L, `40` = 4L, `45` = 4L, `50` = 3L, `55` = 2L, `60` = 11L, 
`70` = 1L, `72` = 1L, `73` = 1L, `75` = 1L, `80` = 2L, `90` = 10L, 
`95` = 1L, `100` = 1L, `105` = 1L, `110` = 1L, `120` = 11L, `130` = 2L, 
`150` = 2L, `180` = 5L, `200` = 1L, `240` = 3L, `300` = 3L, `500` = 1L
), dim = 29L, dimnames = structure(list(c("2", "10", "15", "20", 
"30", "38", "40", "45", "50", "55", "60", "70", "72", "73", "75", 
"80", "90", "95", "100", "105", "110", "120", "130", "150", "180", 
"200", "240", "300", "500")), names = ""), class = "table")
  • Hi Large Simpsons. Please make this a reproducible question by including a sample of the data. You could edit the question to include the outputs of dput(t1) and dput(t2) or a smaller example if these are too large or cannot be shared. Commented Apr 27, 2024 at 18:58
  • @seth I edited so that the outputs can be seen. Hope it makes things clearer. :) Commented Apr 27, 2024 at 22:05
  • Perhaps something like ...geom_point(data = t1 |> as.data.frame(), aes(x = Var1, y = Freq), color = "blue", size = 3) + ... ? I wasn't able to load your table for some reason, but that works for me with other data I create using table(). Commented Apr 28, 2024 at 4:45
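A minimal sketch of the idea in the last comment, using a small made-up table rather than the real t1 (t_demo, df_demo and p are hypothetical names). The error in the question arises because a one-dimensional table is not a data frame. as.data.frame() converts it into one with columns Var1 and Freq. Var1 is a factor by default, so this sketch assumes you want a numeric x-axis and converts it via character.

```r
library(ggplot2)

# Made-up durations standing in for PainF$task_duration (assumption).
t_demo <- table(c(30, 30, 45, 60, 60, 60))

# A table is not a data frame, which is why ggplot() rejects it.
is.data.frame(t_demo)

# Convert to a data frame; the columns are named Var1 and Freq.
# stringsAsFactors = FALSE keeps Var1 as character instead of factor.
df_demo <- as.data.frame(t_demo, stringsAsFactors = FALSE)

# Turn the duration labels into numbers so the x-axis is continuous.
df_demo$Var1 <- as.numeric(df_demo$Var1)

p <- ggplot(df_demo, aes(x = Var1, y = Freq)) +
  geom_point(color = "blue", size = 3)
```

Printing p draws the plot. Without the numeric conversion the points would still be drawn, but on a discrete axis with evenly spaced labels.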

1 Answer

It might have been easier to use dplyr to create the summary data.
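As a rough sketch of that dplyr route: the real PainF and PainM data frames are not shown in the question, so the two data frames below are made-up stand-ins with only a task_duration column, and counts_raw is a hypothetical name. The idea is to stack the raw rows with a gender label and let count() produce the frequencies, which keeps duration numeric throughout.

```r
library(dplyr)

# Made-up stand-ins for the raw data (assumption: one row per person,
# with the duration stored in a column called task_duration).
PainF <- data.frame(task_duration = c(30, 30, 45, 60, 60, 60))
PainM <- data.frame(task_duration = c(10, 30, 60, 60, 120))

# Stack both data frames; .id records which list element each row came from.
counts_raw <- bind_rows(list(F = PainF, M = PainM), .id = "gender") %>%
  # One row per gender/duration combination, with the number of people.
  count(gender, duration = task_duration, name = "frequency")

counts_raw
```

Because no table() or factor is involved, no conversion step is needed before plotting.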

However, starting from the tables that you have, I'd suggest combining them into a data frame like this:

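library(dplyr)
library(ggplot2)

# data.frame() turns each table into two columns: Var1 (a factor) and Freq.
counts <- rbind(
  data.frame(t1) %>% mutate(gender = "F"),
  data.frame(t2) %>% mutate(gender = "M")
) %>%
  rename(
    duration = Var1,
    frequency = Freq
  ) %>%
  mutate(
    # Go via character so we get the level labels, not the factor's integer codes.
    duration = as.integer(as.character(duration))
  )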

The top of the resulting data looks like this:

  duration frequency gender
1       30         3      F
2       40         3      F
3       45         2      F
4       60         5      F
5       65         1      F
6       70         2      F

Now plot. Working with a single data frame here makes things a lot simpler.

ggplot(counts) +
  geom_point(aes(x = duration, y = frequency, col = gender)) +
  labs(x = "X", y = "Y", title = "Female and Male task duration") +
  theme_minimal() +
  scale_colour_manual(values = c("blue", "red"))

5 Comments

Thanks for that, but why is the x-axis showing 0-30 in duration (min) when there is frequency data for 30+ min?
Argh. That's because it's being converted from a factor to integer. Not at a computer right now to test, but please try removing the mutate() at the end of the pipeline.
Another option is to use tibble() instead of data.frame() inside the call to rbind().
Replacing as.integer(duration) with readr::parse_number(duration) could work here, though without further adjustments, "30+" would just become 30. Perhaps you have some domain knowledge that would suggest a different value, or you could represent that point differently (e.g. with a segment with an arrow to the right) if that simplification is materially misleading.
I have updated the answer. Just to explain what went wrong before: the first column in the result from rbind() had type factor. Converting this straight to integer resulted in the integer values assigned to the levels in the factor (which are sequential starting from 1). To get the actual factor levels had to first convert to character and then integer. Sorry for the delay in fixing this.
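
The factor pitfall described in the last comment can be reproduced in isolation. This is a small illustration with a made-up factor f, not the data from the question:

```r
# A factor whose level labels look like numbers.
f <- factor(c("30", "40", "45", "60"))

# Direct conversion returns the internal level codes, not the labels.
as.integer(f)
# 1 2 3 4

# Converting to character first recovers the labels as numbers.
as.integer(as.character(f))
# 30 40 45 60
```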
