I would like to create a dot plot for my data set. I know how to create a normal dot plot for treatment comparisons or similar data sets using ggplot. I have the following data and would like to create a dot plot with three different colors. Please suggest how to prepare the data for this plot. With a single data point in NP and P it would be easy, as I have already worked with similar data, but I have no idea how to handle this kind of data. I can use the ggplot2 package in R for the plotting.

The variable W always has a single data point, while NP and P have a varying number of data points, i.e. sometimes one value in NP and sometimes three, and the same for P, as shown in the table.

Here is a screenshot of my data. The sample data are separated by tabs.

I am aiming for something similar to this image. Sorry for my language.

I agree my data is a mess. I googled and did some coding to get the plot. I used the tidyverse and dplyr packages to produce it, but there is a problem with the y-axis: it is very cluttered. I used the following code:

library(tidyverse)

d <- read.table("Data1.txt", header = TRUE, sep = "\t", stringsAsFactors = FALSE)
df <- data.frame(d)

df <- df %>%
  mutate(across(everything(), as.character)) %>%
  pivot_longer(!ID, names_to = "colid", values_to = "val") %>%
  separate_rows(val, sep = "\t", convert = TRUE) %>%
  mutate(ID = as_factor(ID))

Then I plotted the graph with ggplot:

ggplot(df, aes(x = ID, y = val, color = colid)) +
  geom_point(size = 1.5) +
  theme(axis.text.x = element_text(angle = 90))

This is the output. I tried to adjust the y-axis with ylim and scale_y_discrete(), but nothing worked. Please suggest a way to fix it.

  • It sounds like you need to restructure your data. In a 2D scatter plot, each point needs to have exactly 1 X and 1 Y value. If you can share your data using dput(data) we might be able to help more. Commented Mar 15, 2021 at 14:37

1 Answer

This contains many necessary steps for data cleaning, as suggested by user Dan Adams in the comment. This was kind of fun, and it helped me procrastinate my own thesis.

I am using a function from a very famous thread, which offers a way to split columns when the number of resulting columns is unknown.

P.S. The way you shared the data was less than ideal.

#your data is unreadable without this awesome package
# devtools::install_github("alistaire47/read.so") 
library(tidyverse)
df <- read.so::read_md("|ID| |W| |NP| |P|

|:-:| |:-:| |:-:| |:-:|

|1| |4.161| |1.3,1.5| |1.5,2.8|

|2| |0.891| |1.33,1.8,1.79| |1.6|

|3| |7.91| |4.3| |0.899,1.43,0.128|

|40| |2.1| |1.4,0.99,7.9,0.32| |0.6,0.5,1.57|") %>% select(-starts_with("x"))
#> Warning: Missing column names filled in: 'X2' [2], 'X4' [4], 'X6' [6]

# from this thread https://stackoverflow.com/a/47060452/7941188
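split_into_multiple <- function(column, pattern = ", ", into_prefix){
  # split each string into as many pieces as the longest entry needs
  cols <- str_split_fixed(column, pattern, n = Inf)
  # pad the shorter entries with NA instead of empty strings
  cols[which(cols == "")] <- NA
  # as.tibble() is deprecated; the matrix has no column names yet, so skip name repair
  cols <- as_tibble(cols, .name_repair = "minimal")
  m <- dim(cols)[2]
  names(cols) <- paste(into_prefix, seq_len(m), sep = "_")
  cols
}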
# apply this over the columns of interest
# (index with df[[x]] so that NP and P are each split from their own column)
ls_cols <- lapply(c("NP", "P"), function(x) split_into_multiple(df[[x]], pattern = ",", x))

# bind it to the single columns of the old data frame
# convert character columns to numeric
# apply pivot longer twice (there might be more direct options, but I won't be 
# bothered to do too much here)
df_new <- 
  bind_cols(df[c("ID", "W")], ls_cols) %>%
  pivot_longer(cols = c(-ID, -W), names_sep = "_", names_to = c(".value", "value")) %>%
  mutate(across(c(P, NP), as.numeric)) %>%
  select(-value) %>%
  pivot_longer(W:P, names_to = "var", values_to = "value")

# The new tidy data can easily be plotted 
# (NP and P are padded with NA up to the longest list, hence the warning)
ggplot(df_new, aes(ID, value, color = var)) + 
  geom_point()
#> Warning: Removed 13 rows containing missing values (geom_point).
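
Regarding the "more direct options" mentioned in the comments of the code: one possibility might be tidyr's separate_rows() with a comma as separator. This is only a sketch, assuming the four columns ID, W, NP and P from the markdown table above. The tibble below is retyped by hand as a stand-in for the real file, and df_long is a name made up for illustration. Converting the split values to numeric should also give a continuous y-axis instead of one tick per distinct string.

```r
library(tidyverse)

# Hand-typed stand-in for the question's data (assumption: same values as the table above)
df <- tibble(
  ID = c(1, 2, 3, 40),
  W  = c(4.161, 0.891, 7.91, 2.1),
  NP = c("1.3,1.5", "1.33,1.8,1.79", "4.3", "1.4,0.99,7.9,0.32"),
  P  = c("1.5,2.8", "1.6", "0.899,1.43,0.128", "0.6,0.5,1.57")
)

df_long <- df %>%
  # all value columns must share one type before they are stacked
  mutate(W = as.character(W)) %>%
  pivot_longer(-ID, names_to = "var", values_to = "value") %>%
  # one row per comma-separated value, so no NA padding is needed
  separate_rows(value, sep = ",") %>%
  # numeric y values give a continuous axis; a factor ID gives evenly spaced x positions
  mutate(value = as.numeric(value), ID = factor(ID))

ggplot(df_long, aes(ID, value, color = var)) +
  geom_point()
```

Because each value gets its own row, there are no padded NA rows here. Whether ID is better treated as a factor or kept numeric depends on what the IDs mean in the real data.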

2 Comments

I am sorry for the confusion with my data; I will attach a screenshot of it here. I was unable to input the tabular data properly. Sorry for the mess.
@ThulasiR Actually, the markdown table was better for sharing - a screenshot is really the worst way to share data. Check stackoverflow.com/help/minimal-reproducible-example