I am trying to create a scatterplot with ggplot2 that uses multiple fields. I have read about coloring the points of a scatterplot by a field, but how would I do this for the ggplot2movies dataset? I want to color by genre, but the genres are split across separate 0/1 columns:

> movies <- ggplot2movies::movies
> head(movies)
            title  year length budget rating votes    r1    r2    r3    r4    r5    r6    r7    r8    r9   r10  mpaa Action Animation Comedy Drama Documentary Romance Short
                     <chr> <int>  <dbl>  <int>  <dbl> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>  <int>     <int>  <int> <int>       <int>   <int> <int>
1                        $  1971    121     NA    6.4   348   4.5   4.5   4.5   4.5  14.5  24.5  24.5  14.5   4.5   4.5            0         0      1     1           0       0     0
2        $1000 a Touchdown  1939     71     NA    6.0    20   0.0  14.5   4.5  24.5  14.5  14.5  14.5   4.5   4.5  14.5            0         0      1     0           0       0     0
3   $21 a Day Once a Month  1941      7     NA    8.2     5   0.0   0.0   0.0   0.0   0.0  24.5   0.0  44.5  24.5  24.5            0         1      0     0           0       0     1
4                  $40,000  1996     70     NA    8.2     6  14.5   0.0   0.0   0.0   0.0   0.0   0.0   0.0  34.5  45.5            0         0      1     0           0       0     0
5 $50,000 Climax Show, The  1975     71     NA    3.4    17  24.5   4.5   0.0  14.5  14.5   4.5   0.0   0.0   0.0  24.5            0         0      0     0           0       0     0
6                    $pent  2000     91     NA    4.3    45   4.5   4.5   4.5  14.5  14.5  14.5   4.5   4.5  14.5  14.5            0         0      0     1           0       0     0

What is the best way to approach this (color based on genre)? All help is really appreciated!

I guess you're going to have to tidy up the data (wide to long format), perhaps with tidyr::gather(). Commented Dec 4, 2016 at 13:20

1 Answer

As @hrbrmstr states, you need to transform the data from wide to long format. You can use tidyr::gather() together with dplyr::filter() to achieve this. The chain below:

  1. Gathers the column names and values from Action to Short into two new columns, genre and flag. This turns the many genre columns (wide) into key-value pairs (long).
  2. Uses filter() to drop the rows where flag == 0, i.e. the genres a movie does not belong to.
  3. Stores the resulting data frame in plot_data.

Note that a movie flagged with several genres ends up with one row (and therefore one point) per genre, while a movie with no genre flag at all is dropped. The remaining code is a simple ggplot2 scatterplot of length vs rating.

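library(dplyr)
library(tidyr)
library(ggplot2)
library(ggplot2movies)

# Wide to long: one row per movie/genre pair, keeping only genres that apply
plot_data <- movies %>% 
  gather(genre, flag, Action:Short) %>% 
  filter(flag != 0)

# Rating on the x axis, length on the y axis, one color per genre
ggplot(plot_data, aes(x = rating, y = length)) +
  geom_point(aes(color = genre), alpha = 0.4)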
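
To see what the reshaping step does to individual rows, here is a minimal sketch on a made-up three-row data frame. The toy data and the names toy and toy_long are purely illustrative and are not part of ggplot2movies. It uses tidyr::pivot_longer(), which recent tidyr versions (1.0.0 and later) recommend in place of gather(); for this purpose the two should give equivalent results.

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-in for the movies data, with only two genre flags
toy <- data.frame(
  title  = c("A", "B", "C"),
  rating = c(6.4, 8.2, 3.4),
  Comedy = c(1L, 0L, 0L),
  Drama  = c(1L, 1L, 0L)
)

# Wide to long, then keep only the genres that actually apply to each movie
toy_long <- toy %>%
  pivot_longer(Comedy:Drama, names_to = "genre", values_to = "flag") %>%
  filter(flag != 0)

print(toy_long)
```

Movie "A" appears twice (once per genre), "B" once, and "C", which has no genre flag set, disappears entirely. The same happens with the full dataset, so the plot shows one point per movie/genre pair rather than one per movie.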

1 Comment

Very helpful, and just what I was looking for! Thanks
