
I need to create a graph.

Here is my sample data frame:

data <- data.frame(
  "Tissue" = c("Adrenal gland", "Appendix", "Appendix"),
  "protein.expression" = c("No detect","No detect", "Medium"),
  "cell.type" = c("Glandular cells" ,"Lymphoid tissu","Glandular cells")
)

The left y-axis shows the unique tissue types. The right axis should show the cell types for each tissue, comma-separated.

I am not sure how to get the cell types corresponding to each tissue (on the left y-axis) onto the right axis (in comma-separated form).

My code is

p1 <- ggplot(dat %>% filter(facet==1), aes(
    x = tissue, 
    y = factor(protein.expression, levels = unique(protein.expression, decreasing = F), ordered = TRUE), 
    fill = protein.expression, 
    label = cell.type
    )) +
  geom_point(stat = 'identity', aes(col = protein.expression), size = 12) +
  geom_text(size = 6, fontface = "bold", colour = "white") +
  geom_label() +
  # facet_grid(cell.type ~ ., scales = "free", space = "free") +
  scale_fill_manual(values = myPalette, drop = FALSE) +
  scale_color_manual(values = myPalette, drop = FALSE) +
  theme_classic() +
  labs(title = "Protein Atlas") + 
  guides(fill = guide_legend(title = "Protein expression")) +
  ylab("Cell types measured per tissue") +
  # ylim(1,4) +
  coord_flip() +
  theme(axis.text.x = element_text(size = 25, vjust = 0.5, hjust = .9),
        axis.text.y = element_text(size = 25),
        legend.position = "none",
        axis.title.x = element_text(size = 30),
        axis.title.y = element_text(size = 30, margin = margin(t = 0, r = 20, b = 0, l = 0)),
        legend.title = element_text(size = 30),
        legend.text = element_text(size = 25),
        legend.key.size = unit(2, 'cm'),
        axis.ticks.length=unit(.01, "cm"),
        strip.text.y = element_text(angle = 0))

The cell types currently appear inside the dots. I want them on the right side, comma-separated and, if possible, colour-coded by the corresponding protein expression level.

1 Answer


So this is a bit of a hack but it might work for you.

  1. I introduce a third column in the graph to hold the labels as per my original post.

  2. I pre-process your data to spread out the labels in this third column around the position of the Tissue variable, so that they don't all appear on top of each other.

My pre-processing is pretty ugly but works OK. Note that I only catered for a maximum of 4 cell.types per tissue, as per your comment.

It gives me this graph (image not preserved).

My code:

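library(dplyr)
library(ggplot2)

# stringsAsFactors = TRUE keeps Tissue a factor, so as.numeric(Tissue) below
# returns its level codes. Since R 4.0.0 the default is FALSE, which would
# turn the character column into NA here.
data <- data.frame(
  "Tissue" = c("Adrenal gland", "Appendix", "Appendix"),
  "protein.expression" = c("No detect", "No detect", "Medium"),
  "cell.type" = c("Glandular cells", "Lymphoid tissu", "Glandular cells"),
  stringsAsFactors = TRUE
)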

# Pre-processing section. 
# Step 1: find out the n of cell.types per tissue type
counters <- data %>% group_by(Tissue) %>% summarise(count = n())

# Step 2: Join n back to original data. Transform protein.expression to ordered factor
data <- data %>%
  inner_join(counters, by="Tissue") %>% 
  mutate(protein = factor(protein.expression, levels=unique(protein.expression, decreasing = F), ordered=TRUE),
         positionTissue = as.numeric(Tissue))

results <- data.frame()

# Step 3: Spread the cell.type labels around the position of the Tissue. 4 scenarios catered for.
for(t in unique(data$Tissue)){
  subData <- filter(data, Tissue == t)
  subData$spreader <- as.numeric(subData$Tissue)

  if(length(unique(subData$cell.type)) == 2){
    subData <- subData %>%
      mutate(x=factor(cell.type, levels=unique(cell.type, decreasing = F),ordered=TRUE),
             spreader = ifelse(as.numeric(x)==1,as.numeric(Tissue)-0.1,as.numeric(Tissue)+0.1)) %>%
      select(-x)

    results <- rbind(results, subData)
  } else if(length(unique(subData$cell.type)) == 3){
    subData <- subData %>%
      mutate(x=factor(cell.type, levels=unique(cell.type, decreasing = F),ordered=TRUE),
             spreader = ifelse(as.numeric(x)==1,as.numeric(Tissue)-0.15,
                              ifelse(as.numeric(x)==3,as.numeric(Tissue)+0.15,as.numeric(Tissue)))) %>%
      select(-x)

    results <- rbind(results, subData)
  } else if(length(unique(subData$cell.type)) == 4){
    subData <- subData %>%
      mutate(x=factor(cell.type, levels=unique(cell.type, decreasing = F),ordered=TRUE),
             spreader = ifelse(as.numeric(x)==1,as.numeric(Tissue)-0.2,
                           ifelse(as.numeric(x)==2,as.numeric(Tissue)-0.1,
                                  ifelse(as.numeric(x)==3,as.numeric(Tissue)+0.1,
                                         ifelse(as.numeric(x)==4,as.numeric(Tissue)+0.2,as.numeric(Tissue)))))) %>%
      select(-x)

    results <- rbind(results, subData)
  } else{
    results <- rbind(results, subData)
  }
}

# Plot the data based on the new label position "spreader" variable
ggplot(results, aes(x = positionTissue, y = protein, label=cell.type)) +
  geom_point(stat='identity', aes(col=protein.expression), size=12)  +
  geom_text(aes(y=0.5,label=Tissue), size=8, fontface="bold", angle=90)+
  geom_label(aes(y="zzz", x=spreader, fill=protein), colour="white") +
  theme_classic() +
  scale_x_continuous(limits = c(min(as.numeric(data$Tissue))-0.5,max(as.numeric(data$Tissue))+0.5))+
  scale_y_discrete(breaks=c("Medium","No detect")) +
  labs(title="Protein Atlas") + 
  guides(fill=guide_legend(title="Protein expression"))+
  ylab("Cell types measured per tissue") +
  xlab("") +
  #ylim(1,4) +
  coord_flip()+
  theme(axis.text.x = element_text(size = 25),
        axis.text.y = element_text(colour = NA),
        legend.position = "none",
        axis.title.x = element_text(size=30),
        axis.title.y = element_text(size = 30, margin = margin(t = 0, r = 20, b = 0, l = 0)),
        legend.title = element_text(size = 30),
        legend.text = element_text(size = 25),
        legend.key.size = unit(2, 'cm'),
        axis.ticks.length=unit(.01, "cm"),
        strip.text.y = element_text(angle = 0))
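
The if/else chain in step 3 could probably be generalized to any number of cell.types with a grouped mutate. The following is only a sketch under assumptions: the name results_sketch and the 0.2 step are made up for illustration, it only computes the spreader column (not the protein factor used in the plot), and it counts rows per tissue, so it assumes each cell.type appears once per tissue. With this step, the offsets match the loop above for two cell.types and are slightly wider for three or four.

```r
library(dplyr)

data <- data.frame(
  Tissue = c("Adrenal gland", "Appendix", "Appendix"),
  protein.expression = c("No detect", "No detect", "Medium"),
  cell.type = c("Glandular cells", "Lymphoid tissu", "Glandular cells"),
  stringsAsFactors = TRUE  # Tissue must be a factor for as.numeric() below
)

step <- 0.2  # assumed spacing between neighbouring labels (hypothetical value)

results_sketch <- data %>%
  mutate(positionTissue = as.numeric(Tissue)) %>%
  group_by(Tissue) %>%
  # Centre the labels of each tissue around its numeric position:
  # the k-th of n labels is shifted by (k - (n + 1) / 2) * step
  mutate(spreader = positionTissue + (row_number() - (n() + 1) / 2) * step) %>%
  ungroup()
```

A tissue with a single cell.type keeps its original position, because the offset is zero when n is 1.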

Edit #2:

Update to retain label colours by creating n positions where n is the number of cell.types:

data = data %>% 
  mutate(position = paste("z",cell.type))

Then you can use this new position variable instead of the static "zzz" I suggested in my original post. Your labels will have the correct colours, but your chart will look odd if there are a lot of cell.types.

  geom_label(aes(y=position, label = cell.type)) +
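
A minimal sketch of how this might look with plain text instead of labels (so there is no label background) and with the text coloured by protein expression. It assumes the sample data from the question; mapping colour inside geom_text() and the stripped-down theme are illustrative choices, not part of the code above.

```r
library(dplyr)
library(ggplot2)

data <- data.frame(
  Tissue = c("Adrenal gland", "Appendix", "Appendix"),
  protein.expression = c("No detect", "No detect", "Medium"),
  cell.type = c("Glandular cells", "Lymphoid tissu", "Glandular cells")
)

# One extra axis position per cell.type, prefixed with "z" as in the snippet above
data <- data %>%
  mutate(position = paste("z", cell.type))

p <- ggplot(data, aes(x = Tissue, y = protein.expression)) +
  geom_point(aes(colour = protein.expression), size = 12) +
  # Text only, coloured by the expression level of the same row
  geom_text(aes(y = position, label = cell.type, colour = protein.expression),
            fontface = "bold") +
  # Show axis labels only for the real expression levels, not the "z ..." positions
  scale_y_discrete(breaks = unique(data$protein.expression)) +
  coord_flip() +
  theme_classic() +
  theme(legend.position = "none")
```

Printing p draws the chart. Because the point and text layers share one colour scale, each cell.type is drawn in the colour of its own dot.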

EDIT #1: Update to avoid overlapping labels by grouping cell.types to a single label per tissue.

Creating a new label field that concatenates the individual labels for each tissue type:

data = data %>% 
  group_by(Tissue) %>%
  mutate(label = paste(cell.type, collapse = "; "))

And amend the ggplot call to use this new field instead of the existing cell.type field:

  geom_text(aes(y="zzz", label = label), size = 6, fontface = "bold", colour = "white")+

or:

  geom_label(aes(y="zzz", label = label)) +
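
Because mutate() keeps one row per cell.type, the concatenated label is drawn once per row, on top of itself. A possible variation, sketched here under assumptions, is to summarise to one row per tissue and pass that as a separate data set to the text layer. The name tissue_labels, the ", " separator and the simplified theme are illustrative choices.

```r
library(dplyr)
library(ggplot2)

data <- data.frame(
  Tissue = c("Adrenal gland", "Appendix", "Appendix"),
  protein.expression = c("No detect", "No detect", "Medium"),
  cell.type = c("Glandular cells", "Lymphoid tissu", "Glandular cells")
)

# One row per tissue, so each concatenated label is drawn only once
tissue_labels <- data %>%
  group_by(Tissue) %>%
  summarise(label = paste(unique(cell.type), collapse = ", "), .groups = "drop")

p <- ggplot(data, aes(x = Tissue, y = protein.expression)) +
  geom_point(aes(colour = protein.expression), size = 12) +
  # The labels come from the summarised data and sit at the extra "zzz" position
  geom_text(data = tissue_labels,
            aes(x = Tissue, y = "zzz", label = label),
            inherit.aes = FALSE, fontface = "bold") +
  # Show axis labels only for the real expression levels, hiding "zzz"
  scale_y_discrete(breaks = unique(data$protein.expression)) +
  coord_flip() +
  theme_classic() +
  theme(legend.position = "none")
```

The trade-off is that a single string per tissue cannot be coloured per cell.type, which is what the per-cell.type positions in Edit #2 are for.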

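ORIGINAL POST: You could plot your labels at a third position (e.g. "zzz") and then hide that position from the axis labels by listing only the real levels in scale_y_discrete(breaks = ...). The scale is y rather than x because the labels are mapped to y before coord_flip() is applied.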

ggplot(data, aes(x = Tissue, y = factor(protein.expression,
                                    levels=unique(protein.expression, 
                                                  decreasing = F),
                                    ordered=TRUE), fill = protein.expression, 
             label = cell.type))+
  geom_point(stat='identity', aes(col=protein.expression), size=12)  +
  geom_text(aes(y="zzz"), size = 6, fontface = "bold", colour = "white")+
  geom_label(aes(y="zzz")) +
  # facet_grid(cell.type ~ ., scales = "free", space = "free") +
  # scale_fill_manual(values = myPalette, drop = FALSE) +
  # scale_color_manual(values = myPalette, drop = FALSE) +
  theme_classic() +
  scale_y_discrete(breaks=c("Medium","No detect"))+
  labs(title="Protein Atlas") + 
  guides(fill=guide_legend(title="Protein expression"))+
  ylab("Cell types measured per tissue") +
  #ylim(1,4) +
  coord_flip()+
  theme(axis.text.x = element_text(size = 25, vjust = 0.5, hjust = .9),
        axis.text.y = element_text(size = 25),
        legend.position = "none",
        axis.title.x = element_text(size=30),
        axis.title.y = element_text(size = 30, margin = margin(t = 0, r = 20, b = 0, l = 0)),
        legend.title = element_text(size = 30),
        legend.text = element_text(size = 25),
        legend.key.size = unit(2, 'cm'),
        axis.ticks.length=unit(.01, "cm"),
        strip.text.y = element_text(angle = 0))

9 Comments

This could be promising, though the texts overlap in the zzz column. Any idea how to show them as a comma-separated string?
Sorry - I didn't cop that. There is a way to concatenate the cell.types into a new label field. I'll update my post now.
that doest work. Now I have to figure out how to get rid of the label background colour and set the text colour for each cell type to match its protein expression. Any idea?
It doesn't work? What's the issue? Re the label: You could drop the label altogether since you are already adding the geom_text (just change the colour of the geom_text away from white or it's invisible against the white background). Re the colours: I have to say that I would struggle to think of a way of changing colours within the annotation piece of the chart.
Ah - ok. There is another option, but your chart is going to get very stretched if there are a lot of cell.types. I'll update my answer so you can have a look.