I am trying to turn a factor into a set of binary (0/1) indicator columns, one per level, and have been using cast.

DF2 <- cast(data.frame(DM), id ~ region)
names(DF2)[-1] <- paste("region", names(DF2)[-1], sep = "")

The problem is that the resulting values are the number of times each region occurs for an id, while I only want to know whether it occurs at all.

For example I have:

id region
 1   2
 1   3
 2   2
 3   1
 3   1

What I'd like is:

id region1 region2 region3
 1       0       1       1
 2       0       1       0
 3       1       0       0

4 Answers

8

I kind of prefer dcast from reshape2:

library(reshape2)
dat <- read.table(text = "id region
 1   2
 1   3
 2   2
 3   1
 3   1", header = TRUE, sep = "")

# 1 if the id/region pair occurs at least once, 0 otherwise
dcast(dat, id ~ region, value.var = "region",
      fun.aggregate = function(x) as.integer(length(x) > 0))

  id 1 2 3
1  1 0 1 1
2  2 0 1 0
3  3 1 0 0

There may be a smoother way to do that but, to be honest, I don't cast data all that often.

1 Comment

I love the reshape2 package, but sometimes things seem so slow. In comparison to your code, (table(df) > 0)+0 is almost 24 times faster on my system! Specifying value.var usually speeds things up a little bit, but I wonder if there are any other tips on how to speed up dcast.
5

Original data:

x <- data.frame(id=c(1,1,2,3,3), region=factor(c(2,3,2,1,1)))

> x
  id region
1  1      2
2  1      3
3  2      2
4  3      1
5  3      1

Group up the data:

aggregate(model.matrix(~ region - 1, data=x), x["id"], max)

Result:

  id region1 region2 region3
1  1       0       1       1
2  2       0       1       0
3  3       1       0       0
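
To see why max does the job here, it may help to look at the intermediate step. This is only a sketch of the same call split in two; the name mm is introduced for illustration and is not part of the answer above. model.matrix produces one 0/1 row per observation, and taking the per-id maximum of each column then records whether the region appeared at least once for that id.

```r
x <- data.frame(id = c(1, 1, 2, 3, 3), region = factor(c(2, 3, 2, 1, 1)))

# One row per observation, one 0/1 column per factor level;
# "- 1" drops the intercept so every level gets its own column
mm <- model.matrix(~ region - 1, data = x)
mm

# Collapse to one row per id: max is 1 if any row for that id had a 1
res <- aggregate(mm, x["id"], max)
res
```

Using sum instead of max would give the per-id counts again, which is the behaviour the question is trying to avoid.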

2 Comments

I knew there had to be an aggregate solution, but it wasn't coming to me. +1.
@mrdwab Don't worry, it took me 12 hours for it to click!
4

Here's sort of a "tricky" way to do it in one line using table (the outer parentheses are important, because + binds more tightly than >, so without them 0+0 would be evaluated first). Assuming your data.frame is named df:

(table(df) > 0)+0
#    region
# id  1 2 3
#   1 0 1 1
#   2 0 1 0
#   3 1 0 0

table(df) > 0 gives us TRUE and FALSE; adding 0 coerces those logical values to the numbers 1 and 0.
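
If a data.frame with region1, region2, ... columns is wanted, as in the question, the result can be converted afterwards. The following is a minimal sketch under the assumption that df holds the question's example data; the names tab, ind and res are made up for illustration.

```r
df <- data.frame(id = c(1, 1, 2, 3, 3), region = c(2, 3, 2, 1, 1))

tab <- table(df)                           # counts per id/region pair
ind <- as.data.frame(unclass((tab > 0) + 0))  # 0/1 matrix turned into columns

# Prefix the level names and bring the ids back as a regular column
names(ind) <- paste0("region", colnames(tab))
res <- cbind(id = as.numeric(rownames(tab)), ind)
rownames(res) <- NULL
res
```

unclass is used so that as.data.frame treats the object as a plain matrix rather than reshaping it to long format, which is what it does for tables.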

1

No specialized functions are needed:

x <- data.frame(id=1:4, region=factor(c(3,2,1,2)))
x
   id region
1  1      3
2  2      2
3  3      1
4  4      2

x.bin <- data.frame(x$id, sapply(levels(x$region), `==`, x$region))
names(x.bin) <- c("id", paste("region", levels(x$region),sep=''))
x.bin

  id region1 region2 region3
1  1   FALSE   FALSE    TRUE
2  2   FALSE    TRUE   FALSE
3  3    TRUE   FALSE   FALSE
4  4   FALSE    TRUE   FALSE

Or for integer results:

x.bin2 <- data.frame(x$id,  
    apply(sapply(levels(x$region), `==`, x$region),2,as.integer)
) 
names(x.bin2) <- c("id", paste("region", levels(x$region),sep=''))
x.bin2


  id region1 region2 region3
1  1       0       0       1
2  2       0       1       0
3  3       1       0       0
4  4       0       1       0
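
Note that this produces one row per input row, and the example data above has no repeated ids. With repeated ids, as in the question, the rows would still need to be collapsed. One possible sketch, assuming the question's data and combining this approach with aggregate and max (the names ind and x.bin3 are hypothetical):

```r
x <- data.frame(id = c(1, 1, 2, 3, 3), region = factor(c(2, 3, 2, 1, 1)))

# Logical matrix: one column per level, TRUE where the row has that level;
# adding 0L coerces it to integer 0/1
ind <- sapply(levels(x$region), `==`, x$region) + 0L
colnames(ind) <- paste0("region", levels(x$region))

# One row per id: 1 if the region occurred at least once for that id
x.bin3 <- aggregate(ind, x["id"], max)
x.bin3
```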
