I am a new R user. I am working on a dataset in which I need to transform multiple binary columns into a single factor column.

Here is an example. The current dataset looks like this:

$ Property.RealEstate                   : num  1 1 1 0 0 0 0 0 1 0 ...
$ Property.Insurance                    : num  0 0 0 1 0 0 1 0 0 0 ...
$ Property.CarOther                     : num  0 0 0 0 0 0 0 1 0 1 ...
$ Property.Unknown                      : num  0 0 0 0 1 1 0 0 0 0 ...

Property.RealEstate  Property.Insurance  Property.CarOther  Property.Unknown
                  1                   0                  0                 0
                  0                   1                  0                 0
                  1                   0                  0                 0
                  0                   1                  0                 0
                  0                   0                  1                 0
                  0                   0                  0                 1

The recoded column should be:

   Property
1  Real estate
2  Insurance
3  Real estate
4  Insurance
5  CarOther
6  Unknown

It is basically the reverse of the melt.matrix function.

Thank you all for your valuable input. It does work, but there is one issue: some of my rows take these values:

Property.RealEstate  Property.Insurance  Property.CarOther  Property.Unknown
                  0                   0                  0                 0

I want these rows to be marked as NA or NULL.

It would help if you could suggest how to handle this as well.

Thank you.

4 Answers

2
> mat <- matrix(c(0,1,0,0,0,
+                 1,0,0,0,0,
+                 0,0,0,1,0,
+                 0,0,1,0,0,
+                 0,0,0,0,1), ncol = 5, byrow = TRUE)
> colnames(mat) <- c("Level1","Level2","Level3","Level4","Level5")
> mat
     Level1 Level2 Level3 Level4 Level5
[1,]      0      1      0      0      0
[2,]      1      0      0      0      0
[3,]      0      0      0      1      0
[4,]      0      0      1      0      0
[5,]      0      0      0      0      1

Create a new factor based on the index of the 1 in each row, using the matrix column names as the labels for each level:

NewFactor <- factor(apply(mat, 1, function(x) which(x == 1)), 
                    labels = colnames(mat)) 

> NewFactor 
[1] Level2 Level1 Level4 Level3 Level5 
Levels: Level1 Level2 Level3 Level4 Level5 

You can also try:

factor(mat%*%(1:ncol(mat)), labels = colnames(mat)) 

Alternatively, use Tomas's solution, which I found elsewhere on SO:

as.factor(colnames(mat)[mat %*% 1:ncol(mat)])
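For the follow-up case where a row is all zeros, one possible adjustment to the matrix-product approach is sketched below. With an all-zero row, the product yields 0, and R silently drops a 0 index when subsetting, so the result would come out shorter than the data. This sketch assumes each row contains at most one 1; the small matrix and the name idx are made up for illustration.

```r
# Illustrative data (not from the question): row 2 is all zeros
mat <- matrix(c(0, 1, 0, 0,
                0, 0, 0, 0,
                1, 0, 0, 0), ncol = 4, byrow = TRUE)
colnames(mat) <- c("RealEstate", "Insurance", "CarOther", "Unknown")

# Column index of the 1 in each row; an all-zero row gives 0
idx <- as.vector(mat %*% seq_len(ncol(mat)))

# Turn 0 into NA so the row is kept as a missing value rather than dropped
idx[idx == 0] <- NA

# Explicit levels keep every category, even ones that never occur
NewFactor <- factor(colnames(mat)[idx], levels = colnames(mat))
NewFactor
```

Here the second element of NewFactor is NA, and all four column names are kept as levels. If a row could contain more than one 1, the summed index would point at the wrong column, so that case would need a separate check.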

1 Comment

Thank you, Prasanna, for your valuable input. It does work, but there is one issue: some of my rows have 0 in all four columns (Property.RealEstate, Property.Insurance, Property.CarOther, Property.Unknown). I want these rows to be marked as NA or NULL. It would help if you could suggest how to handle this as well. Thank you.
2

Melt is certainly a solution. I'd suggest using the reshape2 melt as follows:

library(reshape2)

df=data.frame(Property.RealEstate=c(0,0,1,0,0,0),
              Property.Insurance=c(0,1,0,1,0,0),
              Property.CarOther=c(0,0,0,0,1,0),
              Property.Unknown=c(0,0,0,0,0,1))

#add id column (presumably you have ids more meaningful than row numbers)
df$row=1:nrow(df)

#melt to "long" format
long=melt(df,id.vars="row")

#only keep 1's
long=long[which(long$value==1),]

#merge in ids for NA entries
long=merge(df[,"row",drop=FALSE],long,all.x=TRUE)

#clean up to match example output
long=long[order(long$row),"variable",drop=FALSE]
names(long)="Property"
long$Property=gsub("Property.","",long$Property,fixed=TRUE)

#results
long

2 Comments

Thank you, Jeremy, for your valuable input. It does work, but there is one issue: some of my rows have 0 in all four columns (Property.RealEstate, Property.Insurance, Property.CarOther, Property.Unknown). I want these rows to be marked as NA or NULL. It would help if you could suggest how to handle this as well. Thank you.
I also added a more straightforward approach (in a separate answer).
1

Alternately, you can just do it in the naïve way. I think it's more transparent than any of the other suggestions (including my other suggestion).

df=data.frame(Property.RealEstate=c(0,0,1,0,0,0),
              Property.Insurance=c(0,1,0,1,0,0),
              Property.CarOther=c(0,0,0,0,1,0),
              Property.Unknown=c(0,0,0,0,0,1))

propcols=c("Property.RealEstate", "Property.Insurance", "Property.CarOther", "Property.Unknown")

df$Property=NA

for(colname in propcols){
  coldata=df[,colname]
  df$Property[which(coldata==1)]=colname
}

df$Property=gsub("Property.","",df$Property,fixed=TRUE)
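The loop above leaves Property as a character column, with NA for rows that had no 1. Since the question asks for a factor, a minimal sketch of the conversion follows. The vector below is a hand-written stand-in for the loop's result, and the name lv is made up for illustration.

```r
# Hypothetical stand-in for df$Property after the loop and gsub above
Property <- c(NA, "Insurance", "RealEstate", "Insurance", "CarOther", "Unknown")

# Levels listed in the original column order (an assumption about the
# preferred ordering; without this, factor() sorts them alphabetically)
lv <- c("RealEstate", "Insurance", "CarOther", "Unknown")

# NA entries stay NA; they are not turned into a level
Property <- factor(Property, levels = lv)

# Count each category, including the missing ones
table(Property, useNA = "ifany")
```

Passing levels explicitly also keeps a category in the factor even when no row happens to belong to it.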

0

Something different:

Get the data:

dat <- data.frame(Property.RealEstate=c(1,0,1,0,0,0),
                  Property.Insurance=c(0,1,0,1,0,0),
                  Property.CarOther=c(0,0,0,0,1,0),
                  Property.Unknown=c(0,0,0,0,0,1))

Reshape it:

names(dat)[row(t(dat))[t(dat)==1]]
#[1] "Property.RealEstate" "Property.Insurance"  "Property.RealEstate"
#[4] "Property.Insurance"  "Property.CarOther"   "Property.Unknown" 

If you want it cleaned up, do:

gsub("Property\\.","",names(dat)[row(t(dat))[t(dat)==1]])
#[1] "RealEstate" "Insurance"  "RealEstate" "Insurance"  "CarOther"   "Unknown" 

If you prefer a factor output:

factor(row(t(dat))[t(dat)==1],labels=names(dat))

...and cleaned up:

factor(row(t(dat))[t(dat)==1],labels=gsub("Property\\.","",names(dat)) )
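Note that the indexing above only collects positions where a 1 occurs, so a row of all zeros contributes nothing and the result ends up shorter than the data. A possible variant that keeps such rows as NA is sketched below, using max.col and rowSums from base R. The three-row data frame and the names m, labs and idx are made up for illustration, and the sketch assumes at most one 1 per row.

```r
# Illustrative data (not from the question): row 2 is all zeros
dat <- data.frame(Property.RealEstate = c(1, 0, 0),
                  Property.Insurance  = c(0, 0, 1),
                  Property.CarOther   = c(0, 0, 0),
                  Property.Unknown    = c(0, 0, 0))

m    <- as.matrix(dat)
labs <- gsub("Property\\.", "", names(dat))

# Position of the largest value in each row (the 1, when there is one)
idx <- max.col(m, ties.method = "first")

# A row that sums to 0 has no 1 at all, so mark it as missing
idx[rowSums(m) == 0] <- NA

Property <- factor(labs[idx], levels = labs)
Property
```

The result has one element per row of dat, with NA in the second position.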

1 Comment

Thank you for your valuable input. It does work, but there is one issue: some of my rows have 0 in all four columns (Property.RealEstate, Property.Insurance, Property.CarOther, Property.Unknown). I want these rows to be marked as NA or NULL. It would help if you could suggest how to handle this as well. Thank you.
