
I have a dataset with microRNAs in 8 different groups. I need to transform this data frame into a binary matrix using R. The number of microRNAs differs between groups, and I would like the groups as rows and the microRNAs as columns. Here is part of the data:

Group1    Group2    Group3   Group4
miR-133a  miR-133b  miR-456  miR777
miR-777   miR138    miR-564  miR-878
miR-878             miR-777  miR978
                    miR-878
                    miR-978

Expected output:

Groups  miR-133a  miR-133b  miR-456  miR-777  ...
Group1  1         0         0        1
Group2  0         1         0        0
...

I tried to use this code:

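im <- which(Dat != '', arr.ind = TRUE)                    # row/col positions of non-blank cells
u <- unique(Dat[im[order(im[, 'row'], im[, 'col']), ]])   # distinct miRNA names
res <- matrix(0L, nrow(Dat), length(u), dimnames = list(NULL, u))  # one row per row of Dat
res[cbind(im[, 'row'], match(Dat[im], u))] <- 1L
res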

But it gives me too many rows. Can anyone help me with that?
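For comparison, the attempt above appears to produce too many rows because it indexes the result by `im[,'row']`, i.e. by the position of a miRNA inside its column, so `res` gets `nrow(Dat)` rows. A minimal sketch of the same idea indexed by column instead, so there is one row per group. It assumes blanks are stored as `""` (cells that are `NA` are also skipped, because `which()` ignores `NA`); the example `Dat` is rebuilt from the sample data in the question.

```r
# Sample data from the question (blanks stored as "")
Dat <- data.frame(
  Group1 = c("miR-133a", "miR-777", "miR-878", "", ""),
  Group2 = c("miR-133b", "miR138", "", "", ""),
  Group3 = c("miR-456", "miR-564", "miR-777", "miR-878", "miR-978"),
  Group4 = c("miR777", "miR-878", "miR978", "", ""),
  stringsAsFactors = FALSE
)

m  <- as.matrix(Dat)                       # character matrix, columns = groups
im <- which(m != "", arr.ind = TRUE)       # row/col positions of non-blank cells
u  <- unique(m[im[order(im[, "col"], im[, "row"]), ]])  # miRNAs in group order

# One row per group (column of Dat) instead of one per row of Dat
res <- matrix(0L, ncol(m), length(u), dimnames = list(colnames(m), u))
res[cbind(im[, "col"], match(m[im], u))] <- 1L
res
```

A miRNA listed twice within one group simply sets the same cell to 1 again, so the result stays binary.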

2 Answers

Here is one option with the tidyverse: reshape to 'long' format with pivot_longer, then convert it back to 'wide' format with pivot_wider.

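library(dplyr)
library(tidyr)
df1 %>%
    # long format: one row per (Groups, value) pair, blank (NA) cells dropped
    pivot_longer(cols = everything(), names_to = 'Groups',
                 values_drop_na = TRUE) %>%
    # drop duplicate pairs so every cell ends up 0 or 1
    distinct() %>%
    mutate(new = 1) %>%
    # back to wide format: one column per miRNA, missing combinations filled with 0
    pivot_wider(names_from = value, values_from = new,
                values_fill = list(new = 0))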
#Groups `miR-133a` `miR-133b` `miR-456` miR777 `miR-777` miR138 `miR-564` `miR-878` miR978 `miR-978`
#  <chr>       <dbl>      <dbl>     <dbl>  <dbl>     <dbl>  <dbl>     <dbl>     <dbl>  <dbl>     <dbl>
#1 Group1          1          0         0      0         1      0         0         1      0         0
#2 Group2          0          1         0      0         0      1         0         0      0         0
#3 Group3          0          0         1      0         1      0         1         1      0         1
#4 Group4          0          0         0      1         0      0         0         1      1         0

Or in base R, with table():

table(names(df1)[col(df1)], unlist(df1))
#           miR-133a miR-133b miR-456 miR-564 miR-777 miR-878 miR-978 miR138 miR777 miR978
#  Group1        1        0       0       0       1       1       0      0      0      0
#  Group2        0        1       0       0       0       0       0      1      0      0
#  Group3        0        0       1       1       1       1       1      0      0      0
#  Group4        0        0       0       0       0       1       0      0      1      1

NOTE: Here we assume the blanks are NA. If they are "", first change them to NA and then use the same code:

df1[df1 == ""] <- NA
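Note that table() counts occurrences, so if the same miRNA were listed twice in one group that cell would be 2 rather than 1, and the result still carries the "table" class. A minimal sketch that clamps the counts to 0/1 and returns a plain integer matrix; the name `bin` is illustrative, and `df1` is rebuilt here with the same layout as the data in this answer (blanks as NA).

```r
# Same layout as df1 in this answer (blanks stored as NA)
df1 <- data.frame(
  Group1 = c("miR-133a", "miR-777", "miR-878", NA, NA),
  Group2 = c("miR-133b", "miR138", NA, NA, NA),
  Group3 = c("miR-456", "miR-564", "miR-777", "miR-878", "miR-978"),
  Group4 = c("miR777", "miR-878", "miR978", NA, NA),
  stringsAsFactors = FALSE
)

# Cross-tabulate group name against miRNA; NA cells are dropped by table()
tab <- table(names(df1)[col(df1)], unlist(df1))

# Remove the "table" class, then turn any count > 0 into 1
bin <- unclass(tab)
bin[] <- as.integer(bin > 0)
bin
```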

data (blanks stored as NA)

df1 <- structure(list(Group1 = c("miR-133a", "miR-777", "miR-878", NA, 
NA), Group2 = c("miR-133b", "miR138", NA, NA, NA), Group3 = c("miR-456", 
"miR-564", "miR-777", "miR-878", "miR-978"), Group4 = c("miR777", 
"miR-878", "miR978", NA, NA)), class = "data.frame", row.names = c(NA, 
-5L))

Assuming the blanks in your data frame are "":

df = structure(list(Group1 = c("miR-133a", "miR-777", "miR-878", "", 
""), Group2 = c("miR-133b", "miR138", "", "", ""), Group3 = c("miR-456", 
"miR-564", "miR-777", "miR-878", "miR-978"), Group4 = c("miR777", 
"miR-878", "miR978", "", "")), row.names = c(NA, -5L), class = "data.frame")

Then make a master set of all items and, for each group, mark which of them it contains:

alla = setdiff(sort(unique(unlist(df))),"")
res = t(sapply(colnames(df),function(i)as.numeric(alla %in% df[,i])))
colnames(res) = alla

       miR-133a miR-133b miR-456 miR-564 miR-777 miR-878 miR-978 miR138 miR777 miR978
Group1        1        0       0       0       1       1       0      0      0      0
Group2        0        1       0       0       0       0       0      1      0      0
Group3        0        0       1       1       1       1       1      0      0      0
Group4        0        0       0       0       0       1       0      0      1      1
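The same %in% idea can be wrapped in a reusable function. A minimal sketch, assuming blanks may be either "" or NA; `to_binary` is a hypothetical helper name, and it swaps sapply() for vapply() so every row is guaranteed to be an integer vector of the same length.

```r
# Hypothetical helper: one row per group, one 0/1 column per distinct miRNA.
# Both "" and NA are treated as blanks.
to_binary <- function(df) {
  items <- unlist(df, use.names = FALSE)
  alla  <- sort(unique(items[!is.na(items) & items != ""]))
  # vapply iterates over the columns (groups); each call returns 0/1 per item
  res <- t(vapply(df, function(x) as.integer(alla %in% x),
                  integer(length(alla))))
  colnames(res) <- alla
  res
}

df <- data.frame(
  Group1 = c("miR-133a", "miR-777", "miR-878", "", ""),
  Group2 = c("miR-133b", "miR138", "", "", ""),
  Group3 = c("miR-456", "miR-564", "miR-777", "miR-878", "miR-978"),
  Group4 = c("miR777", "miR-878", "miR978", "", ""),
  stringsAsFactors = FALSE
)
to_binary(df)
```

Because blanks are filtered out before building the master set, the same call should give the same matrix whether the blanks are "" or NA.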
