Split dataframe columns according to string pattern matching

Question

Problem - Data Wrangling:

I want to fine-tune the grading of a multiple-choice exam in which each question has 5 items: A, B, C, D, E. I want to apply a coefficient to each possible item. For this I need to do some data wrangling:

Input:

library(tibble)

(
  df <- tribble(
  ~id,   ~Q1,   ~Q2,   ~Q3, 
#|----|------|------|------|
    1,  "CDE",   "A",  "AD",
    2,  "CDE",  "AB",  "AD",
    3,   "DE",  "BC",  "AD")
)

Expected output:

id  Q1_C  Q1_D  Q1_E  Q2_A  Q2_B  Q2_C  Q3_A  Q3_D
 1     1     1     1     1     0     0     1     1
 2     1     1     1     1     1     0     1     1
 3     0     1     1     0     1     1     1     1

akrun · Accepted Answer · 2021-01-12 21:43:09Z

We could use mtabulate (from qdapTools) after splitting each answer string into single letters with strsplit:

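library(qdapTools)

# Split every answer string into single letters, tabulate the letters per
# row with mtabulate, then bind the per-question tables next to the id column
cbind(df[1], do.call(cbind, lapply(df[-1],
      function(x) mtabulate(strsplit(x, "")))))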
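If qdapTools is not installed, a rough base-R stand-in for mtabulate can be sketched as below. The helper name mtab_base is hypothetical; it only assumes what the call above relies on, namely one row per list element and one column per distinct value, holding occurrence counts. The data is rebuilt with data.frame() so the snippet runs without tibble.

```r
# Question data rebuilt as a plain data.frame (same values as the tribble)
df <- data.frame(
  id = 1:3,
  Q1 = c("CDE", "CDE", "DE"),
  Q2 = c("A", "AB", "BC"),
  Q3 = c("AD", "AD", "AD"),
  stringsAsFactors = FALSE
)

# Hypothetical stand-in for qdapTools::mtabulate: one row per list element,
# one column per distinct value seen anywhere in the list, cells = counts
mtab_base <- function(l) {
  lev <- sort(unique(unlist(l)))
  counts <- vapply(l, function(v) as.integer(table(factor(v, levels = lev))),
                   integer(length(lev)))
  # vapply stores one list element per column; byrow = TRUE turns it into a row
  m <- matrix(counts, nrow = length(l), ncol = length(lev), byrow = TRUE,
              dimnames = list(NULL, lev))
  as.data.frame(m)
}

res <- cbind(df[1], do.call(cbind, lapply(df[-1],
             function(x) mtab_base(strsplit(x, "")))))
res
```

Since the levels come from the letters actually present in each column, this version only produces columns such as Q1.C, Q1.D, Q1.E, Q2.A, ... rather than all five letters per question.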

Or, in base R: split each column's values with strsplit, get the frequency counts with table (fixing the factor levels to A to E so every question gets all five columns), and then cbind the list elements:

cbind(df[1], do.call(cbind, lapply(df[-1], function(x) {
  x1 <- strsplit(x, "")
  # cross-tabulate row index against letter; fixed levels give columns A to E
  as.data.frame.matrix(table(data.frame(ind = rep(seq_along(x1), lengths(x1)),
    val = factor(unlist(x1), levels = LETTERS[1:5]))))
})))

Output:

#  id Q1.A Q1.B Q1.C Q1.D Q1.E Q2.A Q2.B Q2.C Q2.D Q2.E Q3.A Q3.B Q3.C Q3.D Q3.E
#1  1    0    0    1    1    1    1    0    0    0    0    1    0    0    1    0
#2  2    0    0    1    1    1    1    1    0    0    0    1    0    0    1    0
#3  3    0    0    0    1    1    0    1    1    0    0    1    0    0    1    0
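To see what the table() call cross-tabulates, here it is on a single column. As an assumption beyond the question's data, the second student below left the question blank (""). strsplit then returns character(0) for that row, so an integer ind column would simply skip it and the table rows would no longer line up with id. Giving ind fixed factor levels, as in this sketch, keeps an all-zero row instead.

```r
# One question column; the empty string is a hypothetical blank answer
x <- c("CDE", "", "DE")
x1 <- strsplit(x, "")

# Long format: one row per (student, chosen letter) pair
long <- data.frame(
  ind = factor(rep(seq_along(x1), lengths(x1)), levels = seq_along(x1)),
  val = factor(unlist(x1), levels = LETTERS[1:5])
)

# Cross-tabulate student index against letter: a 3 x 5 table of counts
tab <- table(long)
as.data.frame.matrix(tab)
```

Without levels = seq_along(x1), tab would only have the rows "1" and "3".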

edited Jan 12, 2021 at 21:43

answered Jan 12, 2021 at 21:36

akrun


ThomasIsCoding · Accepted Answer · 2021-01-13 00:05:26Z

Another base R option

cbind(
  df[1],
  `colnames<-`(
    do.call(
      cbind,
      lapply(
        df[-1],
        function(x) {
          t(sapply(
            strsplit(x, ""),
            function(v) table(factor(v, levels = LETTERS[1:5]))
          ))
        }
      )
    ),
    paste0(rep(names(df)[-1], each = 5), "_", LETTERS[1:5])
  )
)

which gives

  id Q1_A Q1_B Q1_C Q1_D Q1_E Q2_A Q2_B Q2_C Q2_D Q2_E Q3_A Q3_B Q3_C Q3_D Q3_E
1  1    0    0    1    1    1    1    0    0    0    0    1    0    0    1    0
2  2    0    0    1    1    1    1    1    0    0    0    1    0    0    1    0
3  3    0    0    0    1    1    0    1    1    0    0    1    0    0    1    0
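Both base R answers keep all five letter columns per question, while the expected output in the question only lists the letters that were actually chosen. If that exact shape is wanted, one option (a sketch, not part of the answers) is to drop the all-zero columns afterwards. The snippet repeats the sapply/table step on a plain data.frame so it runs on its own.

```r
df <- data.frame(
  id = 1:3,
  Q1 = c("CDE", "CDE", "DE"),
  Q2 = c("A", "AB", "BC"),
  Q3 = c("AD", "AD", "AD"),
  stringsAsFactors = FALSE
)

# One 3 x 5 count matrix per question (columns A to E), bound side by side
counts <- do.call(cbind, lapply(df[-1], function(x) {
  t(sapply(strsplit(x, ""), function(v) table(factor(v, levels = LETTERS[1:5]))))
}))
colnames(counts) <- paste0(rep(names(df)[-1], each = 5), "_", LETTERS[1:5])
res <- cbind(df[1], counts)

# Keep the id column plus every letter column chosen by at least one student
res_observed <- res[c(TRUE, colSums(counts) > 0)]
res_observed
```

Note that a letter nobody picked disappears entirely, so if coefficients are later attached by column name, the full 5-column version may be the safer starting point.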

answered Jan 13, 2021 at 0:05

ThomasIsCoding


pietrodito · Accepted Answer · 2021-01-14 10:14:37Z

Very clever one-liners from the other posters, but hard to decipher.

This is a more readable solution, IMHO:

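ABCDE <- LETTERS[1:5]
# one logical column per letter: does the answer string contain it?
one_col_to_five <- function(col) sapply(ABCDE, grepl, col)
(proper_df <- do.call(cbind, lapply(df[, -1], one_col_to_five)))
# prepend the id column; logicals become 0/1 in the numeric matrix
(proper_df <- as.data.frame(cbind(df$id, proper_df)))

# columns are grouped by question (Q1_A ... Q1_E, Q2_A ...), hence each = 5
names(proper_df) <- c("id", paste(rep(names(df[-1]), each = 5), ABCDE, sep = "_"))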
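Because cbind keeps each question's five letter columns together (Q1_A to Q1_E, then Q2_A, and so on), the names vector has to repeat each question name five times in a row. A quick check of the two rep() forms:

```r
qs <- c("Q1", "Q2", "Q3")
ABCDE <- LETTERS[1:5]

# rep(qs, 5) cycles Q1, Q2, Q3, Q1, ... so names drift out of step with the columns
interleaved <- paste(rep(qs, 5), ABCDE, sep = "_")
# rep(qs, each = 5) gives Q1 five times, then Q2, then Q3, matching the column order
grouped <- paste(rep(qs, each = 5), ABCDE, sep = "_")

head(interleaved, 5)  # "Q1_A" "Q2_B" "Q3_C" "Q1_D" "Q2_E"
head(grouped, 5)      # "Q1_A" "Q1_B" "Q1_C" "Q1_D" "Q1_E"
```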

edited Jan 14, 2021 at 10:14

answered Jan 13, 2021 at 13:28

pietrodito
