add text values to ggplot on secondary axis

Question

I need to create a graph.

The following is my sample data frame:

data <- data.frame(
  "Tissue" = c("Adrenal gland", "Appendix", "Appendix"),
  "protein.expression" = c("No detect","No detect", "Medium"),
  "cell.type" = c("Glandular cells" ,"Lymphoid tissu","Glandular cells")
)

The left y axis shows the unique tissue types. The right axis should show the cell types of each tissue, comma-separated.

I am not sure how to move the cell types corresponding to each tissue (on the left y axis) onto the right axis as a comma-separated list.
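For the comma-separated part on its own, one option (a sketch in base R, assuming the sample data above; the name cellLabels is only illustrative) is to collapse the cell types per tissue with aggregate(), which passes extra arguments such as collapse on to paste():

```r
# Sample data from the question (character columns, not factors)
data <- data.frame(
  Tissue = c("Adrenal gland", "Appendix", "Appendix"),
  protein.expression = c("No detect", "No detect", "Medium"),
  cell.type = c("Glandular cells", "Lymphoid tissu", "Glandular cells"),
  stringsAsFactors = FALSE
)

# One row per tissue; its cell types are pasted into one comma-separated string
cellLabels <- aggregate(cell.type ~ Tissue, data = data, FUN = paste, collapse = ", ")
print(cellLabels)
```

The result has one row per tissue, so it can be joined back to the plotting data or used directly as a set of axis labels.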

My code is:

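library(ggplot2)

# myPalette is not defined in the question; these colours are placeholders
myPalette <- c("No detect" = "grey60", "Medium" = "darkorange")

# The sample data has no facet column, so it is plotted directly (the real code used dat %>% filter(facet == 1))
p1 <- ggplot(data, aes(
    x = Tissue, 
    y = factor(protein.expression, levels = unique(protein.expression), ordered = TRUE), 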
    fill = protein.expression, 
    label = cell.type
    )) +
  geom_point(stat = 'identity', aes(col = protein.expression), size = 12) +
  geom_text(size = 6, fontface = "bold", colour = "white") +
  geom_label() +
  # facet_grid(cell.type ~ ., scales = "free", space = "free") +
  scale_fill_manual(values = myPalette, drop = FALSE) +
  scale_color_manual(values = myPalette, drop = FALSE) +
  theme_classic() +
  labs(title = "Protein Atlas") + 
  guides(fill = guide_legend(title = "Protein expression")) +
  ylab("Cell types measured per tissue") +
  # ylim(1,4) +
  coord_flip() +
  theme(axis.text.x = element_text(size = 25, vjust = 0.5, hjust = .9),
        axis.text.y = element_text(size = 25),
        legend.position = "none",
        axis.title.x = element_text(size = 30),
        axis.title.y = element_text(size = 30, margin = margin(t = 0, r = 20, b = 0, l = 0)),
        legend.title = element_text(size = 30),
        legend.text = element_text(size = 25),
        legend.key.size = unit(2, 'cm'),
        axis.ticks.length=unit(.01, "cm"),
        strip.text.y = element_text(angle = 0))

Currently the cell types are drawn inside the dots. I want them on the right side, comma-separated and, if possible, colour-coded by the corresponding protein expression level.
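A different route from the accepted answer below, sketched here under the assumption that a true secondary axis is acceptable: map the tissues to integer positions so that scale_x_continuous() can be used, since it takes a sec.axis argument, and label that secondary axis with the comma-separated cell types. With coord_flip() the primary axis should end up on the left and the secondary one on the right. The helper names tissues, cellLabels and tissuePos are hypothetical, and this sketch does not colour-code the individual cell types.

```r
library(ggplot2)

# Sample data from the question
data <- data.frame(
  Tissue = c("Adrenal gland", "Appendix", "Appendix"),
  protein.expression = c("No detect", "No detect", "Medium"),
  cell.type = c("Glandular cells", "Lymphoid tissu", "Glandular cells"),
  stringsAsFactors = FALSE
)

# Hypothetical helpers: tissue order and one comma-separated cell-type string per tissue
tissues <- unique(data$Tissue)
cellLabels <- vapply(tissues, function(t) {
  paste(data$cell.type[data$Tissue == t], collapse = ", ")
}, character(1), USE.NAMES = FALSE)

# Integer position per tissue, so a continuous scale (which accepts sec.axis) can be used
data$tissuePos <- match(data$Tissue, tissues)

p <- ggplot(data, aes(x = tissuePos, y = protein.expression, colour = protein.expression)) +
  geom_point(size = 6) +
  scale_x_continuous(
    breaks = seq_along(tissues),
    labels = tissues,
    # Same breaks on the secondary axis, but labelled with the cell-type strings
    sec.axis = sec_axis(~ ., breaks = seq_along(tissues), labels = cellLabels)
  ) +
  coord_flip() +
  theme_classic() +
  theme(legend.position = "none")

# Build the plot once to make sure all layers and scales resolve
built <- ggplot_build(p)
```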

olorcain · Accepted Answer · 2019-04-30 20:38:46Z


So this is a bit of a hack but it might work for you.

I introduce a third column in the graph to hold the labels as per my original post.
I pre-process your data to spread out the labels in this third column around the Tissue variable so that they don't appear on top of each other.

My pre-processing is pretty ugly but works OK. Note that I only catered for a maximum of 4 cell.types, as per your comment.
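The hard-coded branches for 2, 3 and 4 cell types in the loop below can be generalised. A minimal base-R sketch, assuming one row per cell type within each tissue and a hypothetical spacing step of 0.1: each label is offset by (i - (n + 1) / 2) * step from the tissue's position, where i is its rank within the tissue and n is the number of rows for that tissue. The offsets differ slightly from the 0.1/0.15/0.2 values the loop uses, but they stay centred on the tissue for any n.

```r
# Sample data from the question; Tissue is a factor so as.numeric() gives its level index
data <- data.frame(
  Tissue = c("Adrenal gland", "Appendix", "Appendix"),
  protein.expression = c("No detect", "No detect", "Medium"),
  cell.type = c("Glandular cells", "Lymphoid tissu", "Glandular cells"),
  stringsAsFactors = TRUE
)

step <- 0.1  # hypothetical spacing between neighbouring labels within one tissue

rows <- seq_len(nrow(data))
# Rank of each row inside its tissue (1, 2, ...) and the number of rows in that tissue
idx <- ave(rows, data$Tissue, FUN = seq_along)
n <- ave(rows, data$Tissue, FUN = length)

# Centre the labels symmetrically around the tissue's integer position
data$spreader <- as.numeric(data$Tissue) + (idx - (n + 1) / 2) * step
print(data[, c("Tissue", "cell.type", "spreader")])
```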

It gives me this graph:

My code:

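library(dplyr)
library(ggplot2)

# stringsAsFactors = TRUE keeps Tissue a factor (the default before R 4.0.0); as.numeric(Tissue) below relies on it
data = data.frame("Tissue"=c("Adrenal gland", "Appendix", "Appendix"), "protein.expression" = c("No detect","No detect", "Medium"), "cell.type" = c("Glandular cells" ,"Lymphoid tissu","Glandular cells"), stringsAsFactors = TRUE)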

# Pre-processing section. 
# Step 1: find out the n of cell.types per tissue type
counters <- data %>% group_by(Tissue) %>% summarise(count = n())

# Step 2: Join n back to original data. Transform protein.expression to ordered factor
data <- data %>%
  inner_join(counters, by="Tissue") %>% 
  mutate(protein = factor(protein.expression, levels=unique(protein.expression, decreasing = F), ordered=TRUE),
         positionTissue = as.numeric(Tissue))

results <- data.frame()

# Step 3: Spread the cell.type labels around the position of the Tissue. 4 scenarios catered for.
for(t in unique(data$Tissue)){
  subData <- filter(data, Tissue == t)
  subData$spreader <- as.numeric(subData$Tissue)

  if(length(unique(subData$cell.type)) == 2){
    subData <- subData %>%
      mutate(x=factor(cell.type, levels=unique(cell.type, decreasing = F),ordered=TRUE),
             spreader = ifelse(as.numeric(x)==1,as.numeric(Tissue)-0.1,as.numeric(Tissue)+0.1)) %>%
      select(-x)

    results <- rbind(results, subData)
  } else if(length(unique(subData$cell.type)) == 3){
    subData <- subData %>%
      mutate(x=factor(cell.type, levels=unique(cell.type, decreasing = F),ordered=TRUE),
             spreader = ifelse(as.numeric(x)==1,as.numeric(Tissue)-0.15,
                              ifelse(as.numeric(x)==3,as.numeric(Tissue)+0.15,as.numeric(Tissue)))) %>%
      select(-x)

    results <- rbind(results, subData)
  } else if(length(unique(subData$cell.type)) == 4){
    subData <- subData %>%
      mutate(x=factor(cell.type, levels=unique(cell.type, decreasing = F),ordered=TRUE),
             spreader = ifelse(as.numeric(x)==1,as.numeric(Tissue)-0.2,
                           ifelse(as.numeric(x)==2,as.numeric(Tissue)-0.1,
                                  ifelse(as.numeric(x)==3,as.numeric(Tissue)+0.1,
                                         ifelse(as.numeric(x)==4,as.numeric(Tissue)+0.2,as.numeric(Tissue)))))) %>%
      select(-x)

    results <- rbind(results, subData)
  } else{
    results <- rbind(results, subData)
  }
}

# Plot the data based on the new label position "spreader" variable
ggplot(results, aes(x = positionTissue, y = protein, label=cell.type)) +
  geom_point(stat='identity', aes(col=protein.expression), size=12)  +
  geom_text(aes(y=0.5,label=Tissue), size=8, fontface="bold", angle=90)+
  geom_label(aes(y="zzz", x=spreader, fill=protein), colour="white") +
  theme_classic() +
  scale_x_continuous(limits = c(min(as.numeric(data$Tissue))-0.5,max(as.numeric(data$Tissue))+0.5))+
  scale_y_discrete(breaks=c("Medium","No detect")) +
  labs(title="Protein Atlas") + 
  guides(fill=guide_legend(title="Protein expression"))+
  ylab("Cell types measured per tissue") +
  xlab("") +
  #ylim(1,4) +
  coord_flip()+
  theme(axis.text.x = element_text(size = 25),
        axis.text.y = element_text(colour = NA),
        legend.position = "none",
        axis.title.x = element_text(size=30),
        axis.title.y = element_text(size = 30, margin = margin(t = 0, r = 20, b = 0, l = 0)),
        legend.title = element_text(size = 30),
        legend.text = element_text(size = 25),
        legend.key.size = unit(2, 'cm'),
        axis.ticks.length=unit(.01, "cm"),
        strip.text.y = element_text(angle = 0))

Edit #2:

Update: to retain the label colours, create n positions, where n is the number of cell.types:

data = data %>% 
  mutate(position = paste("z",cell.type))

Then you can use this new position variable instead of the static "zzz" I suggested in my original post. Your labels will have the correct colours, but your chart will look odd if there are a lot of cell.types.
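To see why the "z" prefix is there, here is a small base-R sketch (the name axisValues is only for illustration): it builds the position column as in the snippet above and sorts it together with the expression levels. When the values are sorted alphabetically, the extra positions land after "Medium" and "No detect", so the labels end up at the far end of the axis, each at its own position and therefore with its own fill.

```r
# Sample data from the question
data <- data.frame(
  Tissue = c("Adrenal gland", "Appendix", "Appendix"),
  protein.expression = c("No detect", "No detect", "Medium"),
  cell.type = c("Glandular cells", "Lymphoid tissu", "Glandular cells"),
  stringsAsFactors = FALSE
)

# One extra discrete position per cell type; paste() joins with a space by default
data$position <- paste("z", data$cell.type)

# Alphabetical order of all values that would share the discrete axis
axisValues <- sort(unique(c(data$protein.expression, data$position)))
print(axisValues)
```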

  geom_label(aes(y=position, label = cell.type)) +

EDIT #1: Update to avoid overlapping labels by grouping cell.types into a single label per tissue.

Create a new label field that concatenates the individual labels for each tissue type:

data = data %>% 
  group_by(Tissue) %>%
  mutate(label = paste(cell.type, collapse = "; "))

And amend the ggplot call to use this new field instead of the existing cell.type field:

  geom_text(aes(y="zzz", label = label), size = 6, fontface = "bold", colour = "white")+

or:

  geom_label(aes(y="zzz", label = label)) +
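The question asked for commas rather than "; ". A base-R sketch of the same per-tissue label with ", " as the separator (labelData is a hypothetical name): because every row of a tissue gets the same label, the text is drawn once per row on top of itself, so keeping only one row per tissue for the text layer avoids the duplicates.

```r
# Sample data from the question (character columns, so ave() can return strings)
data <- data.frame(
  Tissue = c("Adrenal gland", "Appendix", "Appendix"),
  protein.expression = c("No detect", "No detect", "Medium"),
  cell.type = c("Glandular cells", "Lymphoid tissu", "Glandular cells"),
  stringsAsFactors = FALSE
)

# Per-row label: every row of a tissue gets the same comma-separated string
data$label <- ave(data$cell.type, data$Tissue,
                  FUN = function(x) paste(x, collapse = ", "))

# Keep one row per tissue for the text layer so each string is drawn only once
labelData <- data[!duplicated(data$Tissue), c("Tissue", "label")]
print(labelData)
```

labelData could then be passed as geom_text(data = labelData, aes(x = Tissue, y = "zzz", label = label), inherit.aes = FALSE); inherit.aes = FALSE is needed because labelData lacks the other columns mapped in the main ggplot() call, such as protein.expression.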

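ORIGINAL POST: You could plot your labels at a third position (e.g. "zzz") and then hide that position from the axis labels by listing only the real levels in scale_y_discrete(breaks=c("Medium","No detect")). Because of coord_flip(), the y scale is the one that holds the expression levels.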

ggplot(data, aes(x = Tissue, y = factor(protein.expression,
                                    levels=unique(protein.expression, 
                                                  decreasing = F),
                                    ordered=TRUE), fill = protein.expression, 
             label = cell.type))+
  geom_point(stat='identity', aes(col=protein.expression), size=12)  +
  geom_text(aes(y="zzz"), size = 6, fontface = "bold", colour = "white")+
  geom_label(aes(y="zzz")) +
  # facet_grid(cell.type ~ ., scales = "free", space = "free") +
  # scale_fill_manual(values = myPalette, drop = FALSE) +
  # scale_color_manual(values = myPalette, drop = FALSE) +
  theme_classic() +
  scale_y_discrete(breaks=c("Medium","No detect"))+
  labs(title="Protein Atlas") + 
  guides(fill=guide_legend(title="Protein expression"))+
  ylab("Cell types measured per tissue") +
  #ylim(1,4) +
  coord_flip()+
  theme(axis.text.x = element_text(size = 25, vjust = 0.5, hjust = .9),
        axis.text.y = element_text(size = 25),
        legend.position = "none",
        axis.title.x = element_text(size=30),
        axis.title.y = element_text(size = 30, margin = margin(t = 0, r = 20, b = 0, l = 0)),
        legend.title = element_text(size = 30),
        legend.text = element_text(size = 25),
        legend.key.size = unit(2, 'cm'),
        axis.ticks.length=unit(.01, "cm"),
        strip.text.y = element_text(angle = 0))
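A stripped-down version of the "third position" idea, assuming the sample data and ggplot2, with sizes, palette and theme settings dropped: every label sits at the extra level "zzz", and the breaks argument of scale_y_discrete() leaves only the two real expression levels labelled. With several cell types per tissue the labels still overlap, which is what EDIT #1 addresses.

```r
library(ggplot2)

# Sample data from the question
data <- data.frame(
  Tissue = c("Adrenal gland", "Appendix", "Appendix"),
  protein.expression = c("No detect", "No detect", "Medium"),
  cell.type = c("Glandular cells", "Lymphoid tissu", "Glandular cells"),
  stringsAsFactors = FALSE
)

p <- ggplot(data, aes(x = Tissue, y = protein.expression)) +
  geom_point(aes(colour = protein.expression), size = 6) +
  # Every label is placed at the extra discrete level "zzz"
  geom_text(aes(y = "zzz", label = cell.type)) +
  # Only the real expression levels get axis labels; "zzz" stays on the scale unlabelled
  scale_y_discrete(breaks = c("Medium", "No detect")) +
  coord_flip() +
  theme_classic()

# Built data of the text layer (the second layer)
textLayer <- layer_data(p, 2)
```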

edited Apr 30, 2019 at 20:38

answered Apr 30, 2019 at 14:35

olorcain



9 Comments

user1631306 Over a year ago

This could be promising, though the texts overlap in the zzz column. Any idea how to show them as a comma-separated string?

olorcain Over a year ago

Sorry - I didn't cop that. There is a way to concatenate the cell.types into a new label field. I'll update my post now.

user1631306 Over a year ago

that doest work. Now I have to figure out how to get rid of the label background color and set the text color of each cell type according to its protein expression. Any idea?

olorcain Over a year ago

It doesn't work? What's the issue? Re the label: You could drop the label altogether since you are already adding the geom_text (just change the colour of the geom_text away from white or it's invisible against the white background). Re the colours: I have to say that I would struggle to think of a way of changing colours within the annotation piece of the chart.

olorcain Over a year ago

Ah - OK. There is another option, but your chart is going to get very stretched if there are a lot of cell.types. I'll update my answer so you can have a look.
