1

I have a data frame like this:

KEY C1  C2  C3  C4
A   0   0   1   0
B   0   0   1   0
C   0   1   1   0
D   0   0   1   0
E   1   0   1   0
F   1   0   0   0
G   0   1   0   0
H   0   0   1   0
I   0   1   1   0
J   1   0   0   1

I would like to build a frequency matrix from it, counting only rows in which exactly two variables have the value 1.

I do not want to count rows that contain more than two 1s, such as:

KEY C1  C2  C3  C4
L   1   0   1   1

or fewer than two:

M   1   0   0   0

The output should be a frequency table:

   C1 C2 C3 C4
C1 3  0  1  1
C2 0  3  2  0
C3 1  2  7  0
C4 1  0  0  1

There may be more variables, up to C20, and of course more rows. Thanks for helping me out!

4
  • In the example you provided, would you count G 0 1 0 0? When I try your condition with df2 <- df1[rowSums(df1[-1])==2,], I do not get the expected result you showed. Maybe you need to look at your conditions. Commented Jun 12, 2015 at 16:15
  • Hi, thanks. I made a mistake; of course you provided the right results, as we count only rows where rowSums == 2. Commented Jun 13, 2015 at 9:57
  • Hi akrun, how could I upvote your comment? Commented Jun 15, 2015 at 7:36
  • I do not have the privilege still. Commented Jun 15, 2015 at 8:56

2 Answers

3

Try

 m1 <- t(df1[-1])
 colnames(m1) <- df1[,1]
 tcrossprod(m1)
 #   C1 C2 C3 C4
 #C1  3  0  1  1
 #C2  0  3  2  0
 #C3  1  2  7  0
 #C4  1  0  0  1

Regarding the subset part, I am not getting the expected result you showed. Keeping only the rows with exactly two 1s gives:

 df1 <- df1[rowSums(df1[-1])==2,]
 m1 <- t(df1[-1])
 colnames(m1) <- df1[,1]
 tcrossprod(m1)
 #   C1 C2 C3 C4
 #C1  2  0  1  1
 #C2  0  2  2  0
 #C3  1  2  3  0
 #C4  1  0  0  1

data

df1 <- structure(list(KEY = c("A", "B", "C", "D", "E", "F", "G", "H", 
"I", "J"), C1 = c(0L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 0L, 1L), C2 = c(0L, 
0L, 1L, 0L, 0L, 0L, 1L, 0L, 1L, 0L), C3 = c(1L, 1L, 1L, 1L, 1L, 
0L, 0L, 1L, 1L, 0L), C4 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
1L)), .Names = c("KEY", "C1", "C2", "C3", "C4"), class = "data.frame", 
row.names = c(NA, -10L))
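
As a minimal sketch of why the cross-product yields the frequency table: entry [i, j] of `t(x) %*% x` sums `x[, i] * x[, j]` over the rows, so for 0/1 data it counts the rows where both columns are 1. The example below rebuilds the question's data with `data.frame()` (an assumed equivalent of the `structure()` call above) and uses `crossprod()` on the untransposed matrix; the names `x`, `freq`, and `same` are illustrative only.

```r
# Rebuild the example data from the question (KEY plus four 0/1 columns).
df1 <- data.frame(
  KEY = LETTERS[1:10],
  C1 = c(0, 0, 0, 0, 1, 1, 0, 0, 0, 1),
  C2 = c(0, 0, 1, 0, 0, 0, 1, 0, 1, 0),
  C3 = c(1, 1, 1, 1, 1, 0, 0, 1, 1, 0),
  C4 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 1)
)

# Keep only rows with exactly two 1s, then drop the KEY column.
x <- as.matrix(df1[rowSums(df1[-1]) == 2, -1])

# crossprod(x) is t(x) %*% x: entry [i, j] counts the rows where
# both column i and column j are 1 (the diagonal counts column i alone).
freq <- crossprod(x)
freq

# Check that this matches the transposed tcrossprod() form used above.
same <- all(freq == tcrossprod(t(x)))
same
```

Because `crossprod()` works on the columns directly, the transpose step and the `colnames()` assignment are not needed in this variant.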

2

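Looks like you want to subset first. Try this:

df <- read.csv("file1.csv")

# keep only rows with exactly two 1s (first column is KEY)
df2 <- subset(df, rowSums(df[,-1]) == 2)

m1 <- t(df2[-1])

# label the columns with the keys of the subsetted rows
colnames(m1) <- df2[,1]
tcrossprod(m1)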

This gives

#     C1 C2 C3 C4
# C1  2  0  1  1
# C2  0  2  2  0
# C3  1  2  3  0
# C4  1  0  0  1
