Combining variables in dataframe

Question

I have such a dataframe:

KEY C1  C2  C3  C4
A   0   0   1   0
B   0   0   1   0
C   0   1   1   0
D   0   0   1   0
E   1   0   1   0
F   1   0   0   0
G   0   1   0   0
H   0   0   1   0
I   0   1   1   0
J   1   0   0   1

and would like to build this kind of matrix with only two values "1" being in the two variables.

I would not like to count rows where there are more then two values like:

KEY C1  C2  C3  C4
L   1   0   1   1

or less then two:

M   1   0   0  0

Output should be frequency table.

   C1 C2 C3 C4
C1 3  0  1  1
C2 0  3  2  0
C3 1  2  7  0
C4 1  0  0  1

There may be more variables up to C20 and of course more rows. Thanks for helping me out!

In the example you provided, would you count G 0 1 0 0? By trying with your condition df2 <- df1[rowSums(df1[-1])==2,], I am not getting the expected result you showed. May be you need to look at your conditions. — akrun
– akrun, Commented Jun 12, 2015 at 16:15
Hi, Thanks. I made mistake, of course you provided right results as we count only rows where rowSums == 2 — Przemo
– Przemo, Commented Jun 13, 2015 at 9:57

akrun · Accepted Answer · 2015-06-12 16:30:24Z

3

Try

 m1 <- t(df1[-1])
 colnames(m1) <- df1[,1]
 tcrossprod(m1)
 #   C1 C2 C3 C4
 #C1  3  0  1  1
 #C2  0  3  2  0
 #C3  1  2  7  0
 #C4  1  0  0  1

Regarding the subset part, I am not getting the expected result,

 df1 <- df1[rowSums(df1[-1])==2,]
 m1 <- t(df1[-1])
 colnames(m1) <- df1[,1]
 tcrossprod(m1)
 #   C1 C2 C3 C4
 #C1  2  0  1  1
 #C2  0  2  2  0
 #C3  1  2  3  0
 #C4  1  0  0  1

data

df1 <- structure(list(KEY = c("A", "B", "C", "D", "E", "F", "G", "H", 
"I", "J"), C1 = c(0L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 0L, 1L), C2 = c(0L, 
0L, 1L, 0L, 0L, 0L, 1L, 0L, 1L, 0L), C3 = c(1L, 1L, 1L, 1L, 1L, 
0L, 0L, 1L, 1L, 0L), C4 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
1L)), .Names = c("KEY", "C1", "C2", "C3", "C4"), class = "data.frame", 
row.names = c(NA, -10L))

edited Jun 12, 2015 at 16:30

answered Jun 12, 2015 at 16:09

akrun

891k38 gold badges590 silver badges700 bronze badges

Sign up to request clarification or add additional context in comments.

Comments

Aaron Katch · Accepted Answer · 2015-06-12 16:22:14Z

2

looks like you want to subset first. Try this:

df  <-  read.csv("file1.csv")

df2 <-  subset(df, rowSums(df[,-1]) == 2)

m1 <- t(df2[-1])

colnames(m1) <- df1[,1]
tcrossprod(m1)

This gives

#     C1 C2 C3 C4
# C1  2  0  1  1
# C2  0  2  2  0
# C3  1  2  3  0
# C4  1  0  0  1

answered Jun 12, 2015 at 16:22

Aaron Katch

4513 silver badges13 bronze badges

Collectives™ on Stack Overflow

Combining variables in dataframe

2 Answers 2

data

Comments

Comments

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

data

Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Linked

Related