3

I'm not sure if this is possible in R, but I have a dataframe original_data with one row and columns as follows:

A  Ar   A1   A1r   B    Br   B1   B1r   C   Cr   C1   C1r......
0  0.1  0.5  0.1  0.1  0.6  0.7  1.2   1.4  1.2  1.5  1.8.....
structure(list(A = 0L, Ar = 0.1, A1 = 0.5, A1r = 0.1, B = 0.1, 
    Br = 0.6, B1 = 0.7, B1r = 1.2, C = 1.4, Cr = 1.2, C1 = 1.5, 
    C1r = 1.8), row.names = c(NA, -1L), class = "data.frame")

To explain what A, Ar, A1, and A1r mean:

A : ID with measurement taken at Visit 1.

Ar: Same ID as A but a replicate from Visit1

A1: Same ID as A, but measurement taken at Visit 2.

A1r: Same ID as A, but a replicate of the measurement A1.

I want to transform it to a dataframe that looks as follows:

ID   Visit   Replicate   Value
A     1         1         0
A     1         2         0.1
A     2         1         0.5
A     2         2         0.1
B     1         1         0.1
B     1         2         0.6
B     2         1         0.7
B     2         2         1.2

I tried to do it in R:

new_data_frame = data.frame(ID=character(0),Visit=integer(0),Replicate=integer(0),Value=integer(0))

for(i in 1:ncol(original_data))

{   #this is for the column "ID"

    new_data_frame$ID[i]=colnames(original_data)[i]

    #this is for the column "Replicate"
    if(grepl("r",colnames(original_data)[i])==True)
     {
         new_data_frame$Replicate[i]=2
     }
    else
    {
         new_data_frame$Replicate[i]=1
    }

    #this is for the column "Visit"
   if(grepl("1",colnames(original_data)[i])==True)
    {
      new_data_frame$Visit[i]=2
    }
   else
   {
    new_data_frame$Visit[i]=1
   }

#this is for the column "Value"
new_data_frame$Value[i]=original_data[,i]

}

I get an error:

Error in `$<-.data.frame`(`*tmp*`, "ID", value = NA_integer_) : 
  replacement has 1 row, data has 0

How can I fix my code to make this work?

1
  • Is it safe to asume there can only be two visits and two replicates? (based off your code), or will that not always be true? Commented Jan 10, 2020 at 14:20

4 Answers 4

5

The ID is the first character, Visit is 1 + (the number in the name or 0 if no number), Replicate is 1 + (1 if the name ends in 'r' else 0), and Value is the value of the unlisted data.frame.

df_vec <- unlist(df)

data.frame(
  ID = substr(names(df_vec), 1, 1),
  Visit = 1 + dplyr::coalesce(readr::parse_number(names(df_vec)), 0),
  Replicate = 1 + grepl('r$', names(df_vec)),
  Value = df_vec)

#     ID Visit Replicate Value
# A    A     1         1   0.0
# Ar   A     1         2   0.1
# A1   A     2         1   0.5
# A1r  A     2         2   0.1
# B    B     1         1   0.1
# Br   B     1         2   0.6
# B1   B     2         1   0.7
# B1r  B     2         2   1.2
# C    C     1         1   1.4
# Cr   C     1         2   1.2
# C1   C     2         1   1.5
# C1r  C     2         2   1.8
Sign up to request clarification or add additional context in comments.

Comments

2

Here is one solution using tidyverse packages. This basically transforms your dataframe into long format and uses the (old) column names to extract the info that you need. Right now this assumes there can only be one replicate but there can be more than two visits. If there can only be two visits it would be easy to simplify the creation of the Visit variable:

library(tidyr)
library(dplyr)

    df1 %>%
      pivot_longer(everything()) %>%
      transmute(ID = gsub("(\\d+|r)", "", name),
                Visit = ifelse(grepl("\\d", name), 1 + as.integer(gsub("\\D", "", name)), 1),
                Replicate = ifelse(grepl("r", name, fixed = T), 2, 1))

# A tibble: 12 x 3
   ID    Visit Replicate
   <chr> <dbl>     <dbl>
 1 A         1         1
 2 A         1         2
 3 A         2         1
 4 A         2         2
 5 B         1         1
 6 B         1         2
 7 B         2         1
 8 B         2         2
 9 C         1         1
10 C         1         2
11 C         2         1
12 C         2         2

Comments

1

Here's a solution using stack to convert data into long format and then using data.table:

library(data.table)
df <- stack(df)
setDT(df)[, ID := substr(ind, 1, 1)][, Visit := ifelse(grepl("\\d", ind) == T, as.numeric(gsub("[^0-9.]", "",  ind)) + 1, 1)][, Replicate := ifelse(grepl("r", ind) == T, 2, 1)][, c("ID", "Visit", "Replicate", "values")]

#   ID Visit Replicate values
#1:  A     1         1    0.0
#2:  A     1         2    0.1
#3:  A     2         1    0.5
#4:  A     2         2    0.1
#5:  B     1         1    0.1
#6:  B     1         2    0.6
#7:  B     2         1    0.7
#8:  B     2         2    1.2
#9:  C     1         1    1.4
#10: C     1         2    1.2
#11: C     2         1    1.5
#12: C     2         2    1.8

Comments

0

I am new to it. But I tried like this and it worked for me. Yes you can do like this:

New_data <- data.frame("variable1" = old$variable1, "variable2" = old$variable2, "variable3" = old$variable3)

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.