R: Converting multiple binary columns into one factor variable whose factors are binary column names

Question

I am a new R user. I am working on a dataset in which I have to transform multiple binary columns into a single factor column.

Here is the example:

The current dataset looks like this:

$ Property.RealEstate                   : num  1 1 1 0 0 0 0 0 1 0 ...
$ Property.Insurance                    : num  0 0 0 1 0 0 1 0 0 0 ...
$ Property.CarOther                     : num  0 0 0 0 0 0 0 1 0 1 ...
$ Property.Unknown                      : num  0 0 0 0 1 1 0 0 0 0 ...

Property.RealEstate  Property.Insurance  Property.CarOther  Property.Unknown
                  1                   0                  0                 0
                  0                   1                  0                 0
                  1                   0                  0                 0
                  0                   1                  0                 0
                  0                   0                  1                 0
                  0                   0                  0                 1

The recoded column should be:

   Property
1  Real estate
2  Insurance
3  Real estate
4  Insurance
5  CarOther
6  Unknown

It is basically the reverse of the melt.matrix function.

Thank you all for your input; it works. One issue remains, though: some rows are 0 in every column:

Property.RealEstate  Property.Insurance  Property.CarOther  Property.Unknown
                  0                   0                  0                 0

I want these rows to be marked as NA or NULL.

A suggestion on this would be a help as well.

Thank you

Prasanna Nandakumar · Accepted Answer · 2014-01-29 04:37:24Z

2

> mat <- matrix(c(0,1,0,0,0,
+                 1,0,0,0,0,
+                 0,0,0,1,0,
+                 0,0,1,0,0,
+                 0,0,0,0,1), ncol = 5, byrow = TRUE)
> colnames(mat) <- c("Level1","Level2","Level3","Level4","Level5")
> mat
     Level1 Level2 Level3 Level4 Level5
[1,]      0      1      0      0      0
[2,]      1      0      0      0      0
[3,]      0      0      0      1      0
[4,]      0      0      1      0      0
[5,]      0      0      0      0      1

Create a new factor based on the index of the 1 in each row. Use the matrix column names as the labels for each level:

NewFactor <- factor(apply(mat, 1, function(x) which(x == 1)), 
                    labels = colnames(mat)) 

> NewFactor 
[1] Level2 Level1 Level4 Level3 Level5 
Levels: Level1 Level2 Level3 Level4 Level5

You can also try:

factor(mat%*%(1:ncol(mat)), labels = colnames(mat))

Or use Tomas's solution, which I found elsewhere on SO:

as.factor(colnames(mat)[mat %*% 1:ncol(mat)])
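The matrix-product trick works because each row's product with 1:ncol(mat) is the position of its single 1. For a row with no 1 the product is 0, and indexing with 0 silently drops that element, so the result ends up shorter than the data. A minimal sketch of one way to get NA instead, assuming every row has at most one 1 (the small matrix here is made up for illustration):

```r
# Illustrative matrix; row 3 has no 1 at all
mat <- matrix(c(0, 1, 0,
                1, 0, 0,
                0, 0, 0,
                0, 0, 1), ncol = 3, byrow = TRUE)
colnames(mat) <- c("Level1", "Level2", "Level3")

# Position of the 1 in each row, or 0 when the row is all zeros
idx <- as.vector(mat %*% seq_len(ncol(mat)))

# Replace 0 by NA so that indexing keeps one element per row
idx[idx == 0] <- NA

# Passing levels explicitly keeps every column as a level, even if unused
NewFactor <- factor(colnames(mat)[idx], levels = colnames(mat))
NewFactor
```

If a row could contain more than one 1, the product would point at the wrong column, so that case would need a separate check first.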

edited Jan 29, 2014 at 4:37

answered Jan 29, 2014 at 4:23


1 Comment

Swapnil Konduskar Over a year ago

Thank you, Prasanna, for your input. It works, but there is one issue: some rows have 0 in all four columns (Property.RealEstate, Property.Insurance, Property.CarOther, Property.Unknown). I want these rows to be marked as NA or NULL. A suggestion on this would be a help as well. Thank you.

Jeremy Coyle · Accepted Answer · 2014-01-30 14:50:22Z

2

Melt is certainly a solution. I'd suggest using the reshape2 melt as follows:

library(reshape2)

df=data.frame(Property.RealEstate=c(0,0,1,0,0,0),
              Property.Insurance=c(0,1,0,1,0,0),
              Property.CarOther=c(0,0,0,0,1,0),
              Property.Unknown=c(0,0,0,0,0,1))

#add id column (presumably you have ids more meaningful than row numbers)
df$row=1:nrow(df)

#melt to "long" format
long=melt(df,id="row")

#only keep 1's
long=long[which(long$value==1),]

#merge in ids for NA entries (rows with no 1 get NA)
long=merge(df[,"row",drop=FALSE],long,all.x=TRUE)

#clean up to match example output
long=long[order(long$row),"variable",drop=FALSE]
names(long)="Property"
long$Property=gsub("Property.","",long$Property,fixed=TRUE)

#results
long

edited Jan 30, 2014 at 14:50

answered Jan 29, 2014 at 4:33


2 Comments

Swapnil Konduskar Over a year ago

Thank you, Jeremy, for your input. It works, but there is one issue: some rows have 0 in all four columns (Property.RealEstate, Property.Insurance, Property.CarOther, Property.Unknown). I want these rows to be marked as NA or NULL. A suggestion on this would be a help as well. Thank you.

Jeremy Coyle Over a year ago

I also added a more straightforward approach (in a separate answer)

Jeremy Coyle · Accepted Answer · 2014-01-30 15:06:13Z

1

Alternatively, you can just do it the naïve way. I think it's more transparent than any of the other suggestions (including my other one).

df=data.frame(Property.RealEstate=c(0,0,1,0,0,0),
              Property.Insurance=c(0,1,0,1,0,0),
              Property.CarOther=c(0,0,0,0,1,0),
              Property.Unknown=c(0,0,0,0,0,1))

propcols=c("Property.RealEstate", "Property.Insurance", "Property.CarOther", "Property.Unknown")

#start with NA so rows with no 1 in any column stay NA
df$Property=NA

for(colname in propcols){
  coldata=df[,colname]
  df$Property[which(coldata==1)]=colname
}

df$Property=gsub("Property.","",df$Property,fixed=TRUE)
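For comparison, the same result can be sketched without a loop using base R's max.col(). This is an alternative, not part of the answer above; the variable names m and idx are my own, and it assumes each row has at most one 1. Rows with no 1 are set to NA explicitly, because max.col() would otherwise still return a column for them.

```r
df <- data.frame(Property.RealEstate = c(0, 0, 1, 0, 0, 0),
                 Property.Insurance  = c(0, 1, 0, 1, 0, 0),
                 Property.CarOther   = c(0, 0, 0, 0, 1, 0),
                 Property.Unknown    = c(0, 0, 0, 0, 0, 1))

propcols <- c("Property.RealEstate", "Property.Insurance",
              "Property.CarOther", "Property.Unknown")
m <- as.matrix(df[, propcols])

# Column position of the largest value in each row (the 1, if present)
idx <- max.col(m, ties.method = "first")

# All-zero rows have no real maximum, so mark them as missing
idx[rowSums(m) == 0] <- NA

# Strip the common prefix and keep all four categories as levels
labels <- sub("Property.", "", propcols, fixed = TRUE)
df$Property <- factor(labels[idx], levels = labels)
df$Property
```

Here the first row of df is all zeros, so its Property value comes out as NA.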

edited Jan 30, 2014 at 15:06

answered Jan 30, 2014 at 14:59


thelatemail · Accepted Answer · 2014-01-29 04:54:12Z

0

Something different:

Get the data:

dat <- data.frame(Property.RealEstate=c(1,0,1,0,0,0),Property.Insurance=c(0,1,0,1,0,0),Property.CarOther=c(0,0,0,0,1,0),Property.Unknown=c(0,0,0,0,0,1))

Reshape it:

names(dat)[row(t(dat))[t(dat)==1]]
#[1] "Property.RealEstate" "Property.Insurance"  "Property.RealEstate"
#[4] "Property.Insurance"  "Property.CarOther"   "Property.Unknown"

If you want it cleaned up, do:

gsub("Property\\.","",names(dat)[row(t(dat))[t(dat)==1]])
#[1] "RealEstate" "Insurance"  "RealEstate" "Insurance"  "CarOther"   "Unknown"

If you prefer a factor output:

factor(row(t(dat))[t(dat)==1],labels=names(dat))

...and cleaned up:

factor(row(t(dat))[t(dat)==1],labels=gsub("Property\\.","",names(dat)) )
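Note that row(t(dat))[t(dat)==1] returns one element per 1 in the data, so a row of all zeros simply disappears from the output rather than becoming NA. A minimal sketch of a variant that keeps one entry per row, assuming at most one 1 per row (the data and the names hits and property are made up for illustration):

```r
# Illustrative data; row 3 is all zeros
dat <- data.frame(Property.RealEstate = c(1, 0, 0, 0),
                  Property.Insurance  = c(0, 1, 0, 0),
                  Property.CarOther   = c(0, 0, 0, 0),
                  Property.Unknown    = c(0, 0, 0, 1))

# Row and column positions of every 1 (a matrix with columns "row" and "col")
hits <- which(dat == 1, arr.ind = TRUE)

# Start with NA for every row, then fill in the rows that contain a 1
property <- rep(NA_character_, nrow(dat))
property[hits[, "row"]] <- names(dat)[hits[, "col"]]

# Clean up the prefix and keep all columns as factor levels
clean <- function(x) sub("^Property\\.", "", x)
property <- factor(clean(property), levels = clean(names(dat)))
property
```

The result has the same length as nrow(dat), so it can be assigned straight back as a new column.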

edited Jan 29, 2014 at 4:54

answered Jan 29, 2014 at 4:35


1 Comment

Swapnil Konduskar Over a year ago

Thank you for your input. It works, but there is one issue: some rows have 0 in all four columns (Property.RealEstate, Property.Insurance, Property.CarOther, Property.Unknown). I want these rows to be marked as NA or NULL. A suggestion on this would be a help as well. Thank you.
