
Problem - Data Wrangling:

I want to fine-tune the grading of a multiple-choice exam in which each question has 5 items: A, B, C, D, E. I want to apply a coefficient to each possible item, and for that I first need to do some data wrangling:

Input:

library(tibble)

(
  df <- tribble(
  ~id,   ~Q1,   ~Q2,   ~Q3, 
#|----|------|------|------|
    1,  "CDE",   "A",  "AD",
    2,  "CDE",  "AB",  "AD",
    3,   "DE",  "BC",  "AD")
)

Expected output:

id Q1_A Q1_B Q1_C Q1_D Q1_E Q2_A Q2_B Q2_C Q2_D Q2_E Q3_A Q3_B Q3_C Q3_D Q3_E
1 0 0 1 1 1 1 0 0 0 0 1 0 0 1 0
2 0 0 1 1 1 1 1 0 0 0 1 0 0 1 0
3 0 0 0 1 1 0 1 1 0 0 1 0 0 1 0

3 Answers


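We could split each string into single letters and count them with mtabulate from qdapTools. Note that, as far as I can tell, mtabulate only creates columns for letters that actually occur in a column, so Q1 would get only C, D and E here.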

library(qdapTools)
cbind(df[1], do.call(cbind, lapply(df[-1],
       function(x) mtabulate(strsplit(x, "")))))

Or in base R: split each column's values with strsplit, get the frequency count per row with table (using a factor with levels A to E, so that absent letters are counted as 0), and then cbind the list elements

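cbind(df[1], do.call(cbind, lapply(df[-1], function(x) {
  # split each answer string into single letters
  x1 <- strsplit(x, "")
  # cross-tabulate row index against letter, with fixed levels A..E
  as.data.frame.matrix(table(data.frame(
    ind = rep(seq_along(x1), lengths(x1)),
    val = factor(unlist(x1), levels = LETTERS[1:5])
  )))
})))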

Output:

#  id Q1.A Q1.B Q1.C Q1.D Q1.E Q2.A Q2.B Q2.C Q2.D Q2.E Q3.A Q3.B Q3.C Q3.D Q3.E
#1  1    0    0    1    1    1    1    0    0    0    0    1    0    0    1    0
#2  2    0    0    1    1    1    1    1    0    0    0    1    0    0    1    0
#3  3    0    0    0    1    1    0    1    1    0    0    1    0    0    1    0
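
To make the split-then-tabulate idea easier to follow, here is a minimal sketch of the same steps for a single question column. It is not the code above: it uses tabulate instead of table, and the names q1, letters_per_row and q1_matrix are only illustrative.

```r
# One question column, taken from the example data
q1 <- c("CDE", "CDE", "DE")

# 1. Split each answer string into single letters
letters_per_row <- strsplit(q1, "")

# 2. Count each row against the fixed levels A..E, so absent letters give 0.
#    vapply returns a 5 x n matrix, hence the transpose.
q1_matrix <- t(vapply(
  letters_per_row,
  function(v) tabulate(factor(v, levels = LETTERS[1:5]), nbins = 5),
  integer(5)
))
colnames(q1_matrix) <- LETTERS[1:5]

q1_matrix
```

Fixing the factor levels is what guarantees five columns per question, even when no respondent picked a given letter.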

Another base R option

cbind(
  df[1],
  `colnames<-`(
    do.call(
      cbind,
      lapply(
        df[-1],
        function(x) {
          t(sapply(
            strsplit(x, ""),
            function(v) table(factor(v, levels = LETTERS[1:5]))
          ))
        }
      )
    ),
    paste0(rep(names(df)[-1], each = 5), "_", LETTERS[1:5])
  )
)

which gives

  id Q1_A Q1_B Q1_C Q1_D Q1_E Q2_A Q2_B Q2_C Q2_D Q2_E Q3_A Q3_B Q3_C Q3_D Q3_E
1  1    0    0    1    1    1    1    0    0    0    0    1    0    0    1    0
2  2    0    0    1    1    1    1    1    0    0    0    1    0    0    1    0
3  3    0    0    0    1    1    0    1    1    0    0    1    0    0    1    0
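
Once the answers are in this 0/1 layout, applying the per-item coefficients mentioned in the question comes down to a matrix product. A rough sketch for Q1 only, assuming made-up coefficient values (coef_q1 is hypothetical, not something from the question):

```r
# Indicator columns for Q1, copied from the output above
q1 <- data.frame(
  Q1_A = c(0, 0, 0), Q1_B = c(0, 0, 0), Q1_C = c(1, 1, 0),
  Q1_D = c(1, 1, 1), Q1_E = c(1, 1, 1)
)

# Hypothetical coefficients for items A..E (illustrative values only)
coef_q1 <- c(Q1_A = -0.5, Q1_B = -0.5, Q1_C = 1, Q1_D = 1, Q1_E = 0.5)

# Weighted score per respondent: indicators %*% coefficients.
# Indexing by names(coef_q1) keeps columns and coefficients aligned.
score_q1 <- as.vector(as.matrix(q1[names(coef_q1)]) %*% coef_q1)
score_q1
# 2.5 2.5 1.5
```

Naming the coefficient vector after the columns means the alignment does not depend on column order.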


Very clever one-liners from the other posters, but hard to decipher.

This is a more readable solution imho:

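ABCDE <- LETTERS[1:5]
# for one question column, test for the presence of each letter A..E
one_col_to_five <- function(col) sapply(ABCDE, grepl, col)
(proper_df <- do.call(cbind, lapply(df[, -1], one_col_to_five)))
# prepend id; the logical flags are coerced to 0/1
(proper_df <- as.data.frame(cbind(df$id, proper_df)))

# each = 5 repeats every question name five times in a row (Q1 x5, Q2 x5, ...)
names(proper_df) <- c("id", paste(rep(names(df[-1]), each = 5), ABCDE, sep = "_"))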
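
To see what the helper does, it can help to look at the intermediate result for one column. A small sketch, using the Q1 values from the question (col and flags are just illustrative names):

```r
ABCDE <- LETTERS[1:5]
col <- c("CDE", "CDE", "DE")

# For each letter, grepl tests whether it occurs in each answer string.
# sapply binds the five logical vectors into a matrix: one row per
# respondent, one column per letter (named A..E).
flags <- sapply(ABCDE, grepl, col)
flags

# Adding 0L turns TRUE/FALSE into 1/0 while keeping the matrix shape
flags + 0L
```

This relies on each letter appearing at most once per answer, since grepl reports presence rather than a count.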
