Frequency scatter plot?

Question

I have frequency tables for females and males.

t1 <- table(PainF$task_duration)

t2 <- table(PainM$task_duration)

Females:

 30  40  45  60  65  70  75  78  80  90  95 100 101 120 144 150 180 185 240
  3   3   2   5   1   2   1   1   1   5   1   1   1   3   1   1   1   1   2

Males:

  2  10  15  20  30  38  40  45  50  55  60  70  72  73  75  80  90  95 100 105 110 120 130
  2   2   1   2   3   1   4   4   3   2  11   1   1   1   1   2  10   1   1   1   1  11   2

150 180 200 240 300 500
  2   5   1   3   3   1

In each table, the top row is the task duration in minutes and the bottom row is the frequency (the number of people). Is it possible to put this data in a scatter plot, and if so, how? I want to compare frequency against duration for females and males.

I tried to use ggplot.

ggplot() +
  geom_point(data = t1, aes(x = x, y = "column2"), color = "blue", size = 3) +
  geom_point(data = t2, aes(x = x, y = "column2"), color = "red", size = 3) +
  labs(x = "X", y = "Y", title = "Female and Male task duration") +
  theme_minimal()

But I keep getting the following error message:

Error in fortify():
! data must be a <data.frame>, or an object coercible by fortify(), or a valid <data.frame>-like object coercible by as.data.frame().
Caused by error in .prevalidate_data_frame_like_object():
! dim(data) must return an <integer> of length 2.
Run rlang::last_trace() to see where the error occurred.

So my question is: can I make a scatter plot based on a frequency table, and if so, how can I do that?

> dput(t1)
structure(c(`30` = 3L, `40` = 3L, `45` = 2L, `60` = 5L, `65` = 1L, 
`70` = 2L, `75` = 1L, `78` = 1L, `80` = 1L, `90` = 5L, `95` = 1L, 
`100` = 1L, `101` = 1L, `120` = 3L, `144` = 1L, `150` = 1L, `180` = 1L, 
`185` = 1L, `240` = 2L), dim = 19L, dimnames = list(c("30", "40", 
"45", "60", "65", "70", "75", "78", "80", "90", "95", "100", 
"101", "120", "144", "150", "180", "185", "240")), class = "table")

> dput(t2)
structure(c(`2` = 2L, `10` = 2L, `15` = 1L, `20` = 2L, `30` = 3L, 
`38` = 1L, `40` = 4L, `45` = 4L, `50` = 3L, `55` = 2L, `60` = 11L, 
`70` = 1L, `72` = 1L, `73` = 1L, `75` = 1L, `80` = 2L, `90` = 10L, 
`95` = 1L, `100` = 1L, `105` = 1L, `110` = 1L, `120` = 11L, `130` = 2L, 
`150` = 2L, `180` = 5L, `200` = 1L, `240` = 3L, `300` = 3L, `500` = 1L
), dim = 29L, dimnames = structure(list(c("2", "10", "15", "20", 
"30", "38", "40", "45", "50", "55", "60", "70", "72", "73", "75", 
"80", "90", "95", "100", "105", "110", "120", "130", "150", "180", 
"200", "240", "300", "500")), names = ""), class = "table")

Hi Large Simpsons. Please make this a reproducible question by including a sample of the data. You could edit the question to include the outputs of dput(t1) and dput(t2) or a smaller example if these are too large or cannot be shared.
– Seth, Commented Apr 27, 2024 at 18:58
@seth I edited so that the outputs can be seen. Hope it makes things clearer. :)
– Large Simpsons, Commented Apr 27, 2024 at 22:05
Perhaps something like ...geom_point(data = t1 |> as.data.frame(), aes(x = Var1, y = Freq), color = "blue", size = 3) + ... ? I wasn't able to load your table for some reason, but that works for me with other data I create using table().
– Jon Spring, Commented Apr 28, 2024 at 4:45
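
The as.data.frame() suggestion in the last comment can be checked on a small table. This is only a sketch: t_small is a made-up stand-in for t1, not data from the question. A one-way table has a dim of length 1, which appears to be what the error message objects to, and as.data.frame() converts it into a two-column data frame.

```r
# Hypothetical stand-in for table(PainF$task_duration); values are illustrative only.
t_small <- table(c(30, 30, 45, 60, 60, 60))

# A one-way table has a dim of length 1, not the length 2 a data frame has.
length(dim(t_small))

# as.data.frame() gives two columns: Var1 (a factor of the values) and Freq (the counts).
df_small <- as.data.frame(t_small)
names(df_small)
class(df_small$Var1)
```

Note that Var1 is a factor, so it is treated as discrete on the x-axis unless it is converted to a number first.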

datawookie · Accepted Answer · 2024-04-29 05:02:34Z

It might have been easier to use dplyr to create the summary data.
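
A minimal sketch of that dplyr route, assuming PainF and PainM are data frames with a numeric task_duration column. The question does not show them, so the two small data frames below are made up for illustration:

```r
library(dplyr)

# Hypothetical stand-ins for PainF and PainM; the real data is not shown in the question.
PainF <- data.frame(task_duration = c(30, 30, 45, 60))
PainM <- data.frame(task_duration = c(10, 60, 60, 120))

# Stack the two data frames with a gender label, then count each duration per gender.
counts <- bind_rows(list(F = PainF, M = PainM), .id = "gender") %>%
  count(gender, duration = task_duration, name = "frequency")

counts
```

Because task_duration never passes through table(), duration stays numeric and no factor conversion is needed.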

However, starting from the tables that you have, I'd suggest combining them into a data frame like this:

library(dplyr)
library(ggplot2)

counts <- rbind(
  data.frame(t1) %>% mutate(gender = "F"),
  data.frame(t2) %>% mutate(gender = "M")
) %>%
  rename(
    duration = Var1,
    frequency = Freq
  ) %>%
  mutate(
    duration = as.integer(as.character(duration))
  )

The top of the resulting data looks like this:

  duration frequency gender
1       30         3      F
2       40         3      F
3       45         2      F
4       60         5      F
5       65         1      F
6       70         2      F

Now plot. Working with a single data frame here makes things a lot simpler.

ggplot(counts) +
  geom_point(aes(x = duration, y = frequency, col = gender)) +
  labs(x = "X", y = "Y", title = "Female and Male task duration") +
  theme_minimal() +
  scale_colour_manual(values = c("blue", "red"))

edited Apr 29, 2024 at 5:02

answered Apr 28, 2024 at 5:30

5 Comments

Large Simpsons Over a year ago

Thanks for that, but why is the x-axis showing 0-30 in duration (min) when there is frequency data for 30+ min?

datawookie Over a year ago

Argh. That's because it's being converted from a factor to integer. Not at a computer right now to test, but please try removing the mutate() at the end of the pipeline.

datawookie Over a year ago

Other option is to use tibble() instead of data.frame() inside the call to rbind().

Jon Spring Over a year ago

replacing as.integer(duration) with readr::parse_number(duration) could work here, though without further adjustments, "30+" would just become 30. Perhaps you have some domain knowledge that would suggest a different value, or you could represent that point differently (e.g. with a segment with an arrow to the right) if that simplification is materially misleading.

datawookie Over a year ago

I have updated the answer. Just to explain what went wrong before: the first column in the result from rbind() had type factor. Converting this straight to integer returned the integer codes assigned to the factor levels (which are sequential, starting from 1) rather than the durations themselves. To get the actual values I had to convert to character first and then to integer. Sorry for the delay in fixing this.
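
The difference described in that last comment can be reproduced with a small factor. The values are illustrative, not the full data:

```r
# A factor whose labels look like numbers, as in the Var1 column produced from a table.
f <- factor(c("30", "40", "45", "60"))

# Direct conversion returns the internal level codes.
as.integer(f)                # 1 2 3 4

# Converting to character first recovers the values shown in the labels.
as.integer(as.character(f))  # 30 40 45 60
```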
