I am trying to reshape a column of data into many binary columns, which I will eventually use for association rule mining. I have had some success using a for loop and a simple triplet matrix, but I am unsure how to then aggregate by the levels in the first column, similar to a GROUP BY statement in SQL. I have provided an example below with a much smaller data set. My actual data set will be 4,200 rows by 3,902 columns, so any solution needs to be scalable. Any suggestions or alternative approaches would be greatly appreciated!

> data <- data.frame(a=c('sally','george','andy','sue','sue','sally','george'), b=c('green','yellow','green','yellow','purple','brown','purple'))
> data
       a      b
1  sally  green
2 george yellow
3   andy  green
4    sue yellow
5    sue purple
6  sally  brown
7 george purple

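library(slam)  # provides simple_triplet_matrix()

# Assumes the columns of `data` are factors (on R >= 4.0.0, build the
# data frame with stringsAsFactors = TRUE)
x <- data[,1]
for(i in 2:ncol(data))
 x <- cbind(x, simple_triplet_matrix(i=1:nrow(data), j=as.numeric(data[,i]),
              v = rep(1,nrow(data)), dimnames = list(NULL, levels(data[,i]))) )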

##Looks like this:

> as.matrix(x)

     name    brown green purple yellow
[1,] "sally"  "0"    "1"   "0"     "0"    
[2,] "george" "0"    "0"   "0"     "1"   
[3,] "andy"   "0"    "1"   "0"     "0"    
[4,] "sue"    "0"    "0"   "0"     "1"   
[5,] "sue"    "0"    "0"   "1"     "0"    
[6,] "sally"  "1"    "0"   "0"     "0" ##Need to aggregate by Name
[7,] "george" "0"    "0"   "1"     "0"    

##Would like it to look like this:
     name    brown green purple yellow
[1,] "sally"  "1"   "1"   "0"    "0"    
[2,] "george" "0"   "0"   "1"    "1"   
[3,] "andy"   "0"   "1"   "0"    "0"    
[4,] "sue"    "0"   "0"   "1"    "1"   
  • Why do you want everything as character? Does as.data.frame.matrix(data) achieve what you want? Commented Dec 5, 2012 at 18:02

1 Answer

This should do the trick:

## Get a contingency table of counts
X <- with(data, table(a,b))

## Massage it into the format you're wanting 
cbind(name = rownames(X), apply(X, 2, as.character))
#      name     brown green purple yellow
# [1,] "andy"   "0"   "1"   "0"    "0"   
# [2,] "george" "0"   "0"   "1"    "1"   
# [3,] "sally"  "1"   "1"   "0"    "0"   
# [4,] "sue"    "0"   "0"   "1"    "1"   
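
Since `table()` returns counts, a name/colour pair that occurs more than once would give a value above 1, which is usually not what association rule mining expects. As a minimal sketch (not part of the original answer), the counts can be collapsed to 0/1 indicators and kept numeric rather than character. The names `B` and `binary_df` are introduced here purely for illustration.

```r
# Toy data from the question
data <- data.frame(
  a = c("sally", "george", "andy", "sue", "sue", "sally", "george"),
  b = c("green", "yellow", "green", "yellow", "purple", "brown", "purple")
)

# Contingency table of counts, as in the answer above
X <- with(data, table(a, b))

# Collapse counts to 0/1 so repeated (a, b) pairs never exceed 1;
# unclass() drops the "table" class and leaves a plain integer matrix
B <- (unclass(X) > 0) + 0L

# Optional: a data frame with one numeric indicator column per level of b
# (binary_df is a hypothetical name used only in this sketch)
binary_df <- as.data.frame.matrix(B)
```

At 4,200 by 3,902 a dense matrix like this should still fit comfortably in memory. If it ever does not, `xtabs(~ a + b, data, sparse = TRUE)` returns a sparse matrix instead (this relies on the Matrix package).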