How to create a dataframe using data from another dataframe?

Question

I'm not sure if this is possible in R, but I have a dataframe original_data with one row and columns as follows:

A  Ar   A1   A1r   B    Br   B1   B1r   C   Cr   C1   C1r......
0  0.1  0.5  0.1  0.1  0.6  0.7  1.2   1.4  1.2  1.5  1.8.....

structure(list(A = 0L, Ar = 0.1, A1 = 0.5, A1r = 0.1, B = 0.1, 
    Br = 0.6, B1 = 0.7, B1r = 1.2, C = 1.4, Cr = 1.2, C1 = 1.5, 
    C1r = 1.8), row.names = c(NA, -1L), class = "data.frame")

To explain what A, Ar, A1, and A1r mean:

A : ID with measurement taken at Visit 1.

Ar: Same ID as A, but a replicate from Visit 1.

A1: Same ID as A, but measurement taken at Visit 2.

A1r: Same ID as A, but a replicate of the measurement A1.

I want to transform it to a dataframe that looks as follows:

ID   Visit   Replicate   Value
A     1         1         0
A     1         2         0.1
A     2         1         0.5
A     2         2         0.1
B     1         1         0.1
B     1         2         0.6
B     2         1         0.7
B     2         2         1.2

I tried to do it in R:

new_data_frame = data.frame(ID=character(0),Visit=integer(0),Replicate=integer(0),Value=integer(0))

for(i in 1:ncol(original_data))

{   #this is for the column "ID"

    new_data_frame$ID[i]=colnames(original_data)[i]

    #this is for the column "Replicate"
    if(grepl("r",colnames(original_data)[i])==True)
     {
         new_data_frame$Replicate[i]=2
     }
    else
    {
         new_data_frame$Replicate[i]=1
    }

    #this is for the column "Visit"
   if(grepl("1",colnames(original_data)[i])==True)
    {
      new_data_frame$Visit[i]=2
    }
   else
   {
    new_data_frame$Visit[i]=1
   }

#this is for the column "Value"
new_data_frame$Value[i]=original_data[,i]

}

I get an error:

Error in `$<-.data.frame`(`*tmp*`, "ID", value = NA_integer_) : 
  replacement has 1 row, data has 0

How can I fix my code to make this work?

Is it safe to assume there can only be two visits and two replicates (based on your code), or will that not always be true?
– Andrew, Commented Jan 10, 2020 at 14:20

IceCreamToucan · Accepted Answer · 2020-01-10 14:32:12Z

5

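The ID is the first character of the column name, Visit is 1 + (the number in the name, or 0 if there is no number), Replicate is 1 + (1 if the name ends in 'r', else 0), and Value is the value of the unlisted data.frame. Here df is the one-row data frame from the question (original_data), and IDs are assumed to be a single character.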

df_vec <- unlist(df)

data.frame(
  ID = substr(names(df_vec), 1, 1),
  Visit = 1 + dplyr::coalesce(readr::parse_number(names(df_vec)), 0),
  Replicate = 1 + grepl('r$', names(df_vec)),
  Value = df_vec)

#     ID Visit Replicate Value
# A    A     1         1   0.0
# Ar   A     1         2   0.1
# A1   A     2         1   0.5
# A1r  A     2         2   0.1
# B    B     1         1   0.1
# Br   B     1         2   0.6
# B1   B     2         1   0.7
# B1r  B     2         2   1.2
# C    C     1         1   1.4
# Cr   C     1         2   1.2
# C1   C     2         1   1.5
# C1r  C     2         2   1.8
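
The same rules can also be written in base R only, without dplyr or readr. This is a minimal sketch rather than part of the original answer: it assumes the one-row `original_data` from the question, and the names `nm`, `visit_digits`, and `long_data` are made up for illustration. Unlike the version above, it strips digits and a trailing "r" to get the ID, so IDs longer than one character would also work as long as they contain no digits and do not end in "r".

```r
# One-row data frame from the question
original_data <- structure(list(A = 0L, Ar = 0.1, A1 = 0.5, A1r = 0.1, B = 0.1,
    Br = 0.6, B1 = 0.7, B1r = 1.2, C = 1.4, Cr = 1.2, C1 = 1.5,
    C1r = 1.8), row.names = c(NA, -1L), class = "data.frame")

# Flatten the single row into a named numeric vector
df_vec <- unlist(original_data)
nm <- names(df_vec)

# Digits embedded in each name; "" when the name has no number
visit_digits <- gsub("\\D", "", nm)

long_data <- data.frame(
  ID = gsub("[0-9]+|r$", "", nm),   # drop digits and a trailing "r"
  Visit = ifelse(visit_digits == "", 1L, as.integer(visit_digits) + 1L),
  Replicate = 1L + grepl("r$", nm), # 2 when the name ends in "r"
  Value = unname(df_vec),
  stringsAsFactors = FALSE
)

print(long_data)
```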

edited Jan 10, 2020 at 14:32

answered Jan 10, 2020 at 14:22


Andrew · Accepted Answer · 2020-01-10 14:23:22Z

Here is one solution using tidyverse packages. It pivots the data frame into long format and then parses the old column names to extract the information you need. As written, it assumes at most one replicate per measurement (marked by "r") but allows more than two visits. If there can only be two visits, the Visit expression could be simplified. Note that transmute() keeps only the columns it creates, so add Value = value to the call if you want to keep the measurements:

library(tidyr)
library(dplyr)

# df1 is the one-row data frame from the question
df1 %>%
  pivot_longer(everything()) %>%
  transmute(ID = gsub("(\\d+|r)", "", name),
            Visit = ifelse(grepl("\\d", name), 1 + as.integer(gsub("\\D", "", name)), 1),
            Replicate = ifelse(grepl("r", name, fixed = TRUE), 2, 1))

# A tibble: 12 x 3
   ID    Visit Replicate
   <chr> <dbl>     <dbl>
 1 A         1         1
 2 A         1         2
 3 A         2         1
 4 A         2         2
 5 B         1         1
 6 B         1         2
 7 B         2         1
 8 B         2         2
 9 C         1         1
10 C         1         2
11 C         2         1
12 C         2         2

sm925 · Accepted Answer · 2020-01-10 16:08:37Z

1

Here's a solution using stack to convert data into long format and then using data.table:

library(data.table)
df <- stack(df)  # long format with columns `values` and `ind`
setDT(df)[, ID := substr(ind, 1, 1)
  ][, Visit := ifelse(grepl("\\d", ind), as.numeric(gsub("[^0-9.]", "", ind)) + 1, 1)
  ][, Replicate := ifelse(grepl("r", ind), 2, 1)
  ][, c("ID", "Visit", "Replicate", "values")]

#   ID Visit Replicate values
#1:  A     1         1    0.0
#2:  A     1         2    0.1
#3:  A     2         1    0.5
#4:  A     2         2    0.1
#5:  B     1         1    0.1
#6:  B     1         2    0.6
#7:  B     2         1    0.7
#8:  B     2         2    1.2
#9:  C     1         1    1.4
#10: C     1         2    1.2
#11: C     2         1    1.5
#12: C     2         2    1.8

edited Jan 10, 2020 at 16:08

answered Jan 10, 2020 at 15:21


Rajnish kumar · Accepted Answer · 2020-01-10 14:21:46Z

0

I am new to this, but the following worked for me. Yes, you can build a new data frame from the columns of an existing one like this:

New_data <- data.frame("variable1" = old$variable1, "variable2" = old$variable2, "variable3" = old$variable3)
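
The same data.frame() constructor can also repair the loop from the question. The error there comes from assigning into a column of a zero-row data frame one element at a time. In addition, `True` is not defined in R (the logical constant is `TRUE`). Below is a minimal sketch that fills plain vectors first and builds the data frame once at the end. It assumes one-letter IDs and at most two visits, as the question's code does, and the vector names (`id`, `visit`, `replicate_no`, `value`) are made up for illustration.

```r
# One-row data frame from the question
original_data <- structure(list(A = 0L, Ar = 0.1, A1 = 0.5, A1r = 0.1, B = 0.1,
    Br = 0.6, B1 = 0.7, B1r = 1.2, C = 1.4, Cr = 1.2, C1 = 1.5,
    C1r = 1.8), row.names = c(NA, -1L), class = "data.frame")

n <- ncol(original_data)
col_names <- colnames(original_data)

# Preallocate one vector per output column instead of growing a data frame
id <- character(n)
visit <- integer(n)
replicate_no <- integer(n)
value <- numeric(n)

for (i in seq_len(n)) {
  name <- col_names[i]
  id[i] <- substr(name, 1, 1)                          # assumes one-letter IDs
  visit[i] <- if (grepl("1", name)) 2L else 1L         # assumes at most two visits
  replicate_no[i] <- if (grepl("r", name)) 2L else 1L  # "r" marks a replicate
  value[i] <- original_data[1, i]
}

# Build the result in a single call
new_data_frame <- data.frame(ID = id, Visit = visit, Replicate = replicate_no,
                             Value = value, stringsAsFactors = FALSE)

print(new_data_frame)
```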

answered Jan 10, 2020 at 14:21
