2

I have a dataframe with three factors of which two are binary and the third one is integer:

       DATA   YEAR1   YEAR2   REGION1   REGION2
OBS1   X      1        0      1         0  
OBS2   Y      1        0      0         1
OBS3   Z      0        1      1         0

etc.

Now I want to transform it into something like this:

       YEAR1_REGION1   YEAR1_REGION2   YEAR2_REGION1   YEAR2_REGION2
OBS1   X               0               0               0
OBS2   0               Y               0               0
OBS3   0               0               Z               0

Basic matrix multiplication is not what I'm after. I would like to find a neat way to do this that also renames the columns automatically. My actual data has three factor dimensions with 20*8*6 combinations, so there will be 960 columns altogether.

2 Answers

4

Here's another approach based on outer(), similar to @Roland's answer.

# Names of the indicator columns for each factor
year <- grep("YEAR", names(DF), value = TRUE)
region <- grep("REGION", names(DF), value = TRUE)
data <- as.character(DF$DATA)

# One 0/1 column per YEAR x REGION pair: 1 where both indicators are set
df <- outer(year, region, function(x, y) DF[,x] * DF[,y])
colnames(df) <- outer(year, region, paste, sep = "_")
df <- as.data.frame(df)

# Replace each 1 with the DATA value of its row
for (i in seq_len(ncol(df)))
    df[as.logical(df[,i]), i] <- data[as.logical(df[,i])]

df
##      YEAR1_REGION1 YEAR2_REGION1 YEAR1_REGION2 YEAR2_REGION2
## OBS1             X             0             0             0
## OBS2             0             0             Y             0
## OBS3             0             Z             0             0
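The same result can also be sketched without the loop by filling a character matrix through matrix indexing. This is only an illustration, not part of the answer above: it assumes every row has exactly one 1 in each indicator group, and the names combos, y_idx, r_idx and wide are made up for the example.

```r
DF <- read.table(text = "       DATA   YEAR1   YEAR2   REGION1   REGION2
OBS1   X      1        0      1         0
OBS2   Y      1        0      0         1
OBS3   Z      0        1      1         0", header = TRUE, stringsAsFactors = FALSE)

year   <- grep("YEAR",   names(DF), value = TRUE)
region <- grep("REGION", names(DF), value = TRUE)

# Combined column names, in the column-major order that outer() produces
combos <- as.vector(outer(year, region, paste, sep = "_"))

# Position of the 1 within each indicator group
# (assumption: exactly one 1 per row and group)
y_idx <- max.col(as.matrix(DF[year]),   ties.method = "first")
r_idx <- max.col(as.matrix(DF[region]), ties.method = "first")

# Start from an all-"0" character matrix and fill one cell per row
wide <- matrix("0", nrow = nrow(DF), ncol = length(combos),
               dimnames = list(rownames(DF), combos))
target_col <- (r_idx - 1) * length(year) + y_idx
wide[cbind(seq_len(nrow(DF)), target_col)] <- DF$DATA

wide <- as.data.frame(wide, stringsAsFactors = FALSE)
wide
```

The columns come out in the same order as in the outer() version, since the names are generated the same way.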

4

Maybe others will come up with a more succinct solution, but this creates the expected result:

DF <- read.table(text="       DATA   YEAR1   YEAR2   REGION1   REGION2
OBS1   X      1        0      1         0  
OBS2   Y      1        0      0         1
OBS3   Z      0        1      1         0", header=TRUE)

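# Indicator columns become logical; DATA becomes character
DF[,-1] <- lapply(DF[,-1], as.logical)
DF[,1] <- as.character(DF[,1])

# For each (YEAR column, REGION column) index pair, build one output column
res <- apply(expand.grid(2:3, 4:5), 1, function(i) {
  tmp <- rep("0", length(DF[,1]))
  ind <- do.call(`&`,DF[,i])   # rows where both indicators are TRUE
  tmp[ind] <- DF[ind,1]
  tmp <- list(tmp)
  names(tmp) <- paste0(names(DF)[i], collapse="_")
  tmp
})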

res <- as.data.frame(res)
rownames(res) <- rownames(DF)

res
#      YEAR1_REGION1 YEAR2_REGION1 YEAR1_REGION2 YEAR2_REGION2
# OBS1             X             0             0             0
# OBS2             0             0             Y             0
# OBS3             0             Z             0             0

However, I suspect there is a much better way to achieve what you actually want to do, without creating a huge wide-format data.frame.
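As one possible illustration of staying in long format (a sketch only, assuming each row has exactly one 1 per indicator group; the names long, YEAR, REGION and CELL are hypothetical), the indicator columns can be collapsed into one factor per dimension:

```r
DF <- read.table(text = "       DATA   YEAR1   YEAR2   REGION1   REGION2
OBS1   X      1        0      1         0
OBS2   Y      1        0      0         1
OBS3   Z      0        1      1         0", header = TRUE, stringsAsFactors = FALSE)

year_cols   <- grep("^YEAR",   names(DF), value = TRUE)
region_cols <- grep("^REGION", names(DF), value = TRUE)

# Pick, per row, the name of the indicator column that is set
# (assumption: exactly one indicator per group is 1 in every row)
pick <- function(cols) {
  idx <- max.col(as.matrix(DF[cols]), ties.method = "first")
  factor(cols[idx], levels = cols)
}

long <- data.frame(
  DATA   = DF$DATA,
  YEAR   = pick(year_cols),
  REGION = pick(region_cols),
  row.names = rownames(DF),
  stringsAsFactors = FALSE
)

# The label of the wide column each row would have landed in
long$CELL <- paste(long$YEAR, long$REGION, sep = "_")
long
```

This keeps one row per observation and three or four columns regardless of how many factor levels there are, and the CELL label is still available if a wide layout is needed later.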
