I'm a newbie here. I have a data frame, genes, which contains two columns, Index and Name, such as:

Index Name
1     A
2     B
3     C
4     D

Another data frame, similarity, contains 6-7 columns. One of them, Members, holds several Index values separated by spaces, such as:

Members
1 3 5 7
3 7
6 9 2

What I am trying to do is replace the indices with Names by matching them against the Index column of genes. If an index is not found in genes, I simply want to put NA in its position.

So, based on the example, my desired output is:

Members
A C NA NA
C NA
NA NA B

  • I apologize. I will try to do that next time. Thanks! Commented Feb 1, 2017 at 21:29

1 Answer

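We may do this with chartr and gsub. Note that chartr translates one character at a time, so this only works when every Index and every Name is a single character, as in the example: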

df2$Members <- gsub("\\d+", "NA", chartr(paste(df1$Index, collapse=""), 
                   paste(df1$Name, collapse=""), df2$Members))

df2
#    Members
#1 A C NA NA
#2      C NA
#3   NA NA B
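
The single-character restriction of chartr can be seen with a small sketch. The lookup table below is hypothetical (it is not the asker's data) and simply adds a two-digit index, which is one plausible way to end up with the 'old' is longer than 'new' error:

```r
# Hypothetical lookup table with a two-digit index (an assumption for illustration)
lookup <- data.frame(Index = c(1, 2, 10), Name = c("A", "B", "C"),
                     stringsAsFactors = FALSE)

old <- paste(lookup$Index, collapse = "")  # "1210": four characters
new <- paste(lookup$Name, collapse = "")   # "ABC":  three characters

# chartr() maps the i-th character of 'old' to the i-th character of 'new',
# so it cannot express "10 -> C"; here it fails because 'old' is longer.
res <- tryCatch(chartr(old, new, "1 2 10"),
                error = function(e) conditionMessage(e))
print(res)
```

With multi-character indices or names, the split-and-lookup approach is the safer choice.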

Another approach is to split the 'Members' column and then look up each value in a key/value vector built from the first dataset ('df1'). Values that have no match come back as NA:

# as.character() guards against a factor 'Members' column, which strsplit() rejects
df2$Members <- sapply(strsplit(as.character(df2$Members), "\\s+"), function(x) 
                  paste(setNames(df1$Name, df1$Index)[x], collapse=" "))
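
As a variation on the same idea, here is a minimal sketch that uses match() instead of a named vector and converts factors up front. The helper name map_members is hypothetical (it is not part of the original answer), and 'Members' is deliberately created as a factor to mimic data read with stringsAsFactors = TRUE:

```r
df1 <- data.frame(Index = 1:4, Name = c("A", "B", "C", "D"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(Members = c("1 3 5 7", "3 7", "6 9 2"),
                  stringsAsFactors = TRUE)  # factor column, on purpose

# Hypothetical helper: map space-separated indices to names, NA when not found
map_members <- function(members, index, name) {
  parts <- strsplit(as.character(members), "\\s+")  # factor-safe split
  vapply(parts, function(x) {
    # match() returns NA for indices that are absent from 'index'
    paste(as.character(name)[match(x, as.character(index))], collapse = " ")
  }, character(1))
}

df2$Members <- map_members(df2$Members, df1$Index, df1$Name)
print(df2$Members)
# [1] "A C NA NA" "C NA"      "NA NA B"
```

Because the values are compared as whole strings, this also works when an Index or Name has more than one character.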

data

df1 <- structure(list(Index = 1:4, Name = c("A", "B", "C", "D")), .Names = c("Index", 
"Name"), class = "data.frame", row.names = c(NA, -4L))

df2 <-structure(list(Members = c("1 3 5 7", "3 7", "6 9 2")),
  .Names = "Members", class = "data.frame", row.names = c(NA, -3L))

4 Comments

Hi! Thanks for replying. I tried both of your solutions. Using the first one on my full data, I get the following error: Error in chartr(paste(df1$Index, collapse = ""), paste(df1$Name, collapse = ""), : 'old' is longer than 'new'. With the second approach I ran into this error: Error in strsplit(df2$Members, "\\s+") : non-character argument. So I separated the Members column from the data frame and read it with as.character. Now it's working. Can you shed some light on why I am getting these errors?
@RasifAjwad Regarding the first error, I was only using your example; with the 'data' shown in my post, it works. Regarding the second error, it also works on my data because my 'Members' column is of class character, not factor. Please change the code to strsplit(as.character(df2$Members), "\\s+"). The reason is that strsplit does not accept a factor column.
Yeah I fixed it already. But still confused about why the first solution isn't working. Will let you know if I am able to fix this. Thanks a lot!
@RasifAjwad Can you try using as.character(df1$Index) in that solution? Do you have only single-character strings, as you described in the example?
