R dataframe: New variables with for-loops and regular expressions

Question

In R, I would like to create new variables in a data frame by running computations on specific existing variables. The names of the new variables, and the particular existing variables used in the computations, are (or should be) defined by a regular expression.

I know the description is kind of confusing, so here is an example with an imaginary data set where some variables (V1, V2, V3) were measured at 2 different time-points (T1, T2):

dataframe <- data.frame(matrix(rnorm(70), nrow=10))
names(dataframe) <- c("Subject", "V1_T1", "V1_T2", "V2_T1", "V2_T2", "V3_T1", "V3_T2")
dataframe$Subject <- factor(dataframe$Subject)

Now, for each subject and each "Tn" (T1, T2, T3), I would like to generate a new variable (in the same data frame) that is the result of an operation between different variables with the same "Tn". Here is some pseudo-code to try to explain my needs a bit more clearly (I hope):

for i in c(T1, T2, T3){                            #For each timepoint (& Subject)...
    dataframe$V4_*i* <- V1_*i* + V2_*i* / V3_*i*   #Compute V4 = V1 + V2 / V3
}

This should result in several new V4_n variables (V4_T1, V4_T2, V4_T3) corresponding to the result of the V1 + V2 / V3 operation for each time-point Tn and each Subject.

In short, I would like to use regular expressions and for-loops to name and compute new variables, looping a predefined operation over variables specified by something like a regular expression. (Using for-loops or regular expressions is not mandatory; if there are alternative methods to achieve what I want, I would like to hear about them.)

I have been toying a bit with the for-loop and regular expression documentation in R, but so far I have not been successful in producing the desired result. I can of course manually write down all required computations in a regular R script, one by one, but that is not efficient at all (the actual data set where I need to apply this is far more complex than this one), and it is pretty annoying to copy-paste and edit the same piece of code several times over (it is also more susceptible to typos and errors).

Any help/suggestions would be appreciated, thanks!

Steven Beaupré · Accepted Answer · 2016-05-26 16:16:00Z

Since your example didn't entirely reflect your question, I took the liberty of creating a new dataset that I think respects the spirit of your issue.

Let's assume df is:

   Subject       V1_T1       V1_T2      V2_T1       V2_T2       V3_T1       V3_T2
1        A  0.16694311  0.47190422  0.6571530  1.68428290  0.60685147  1.25383252
2        B  0.45561405  1.01849804  1.6041593 -1.40256942  1.50029772  1.34857932
3        C  0.31762739 -0.78986513 -0.8054005 -0.14714956 -0.63612792 -0.13565903
4        D  0.66536682 -0.57231682  0.1362731  0.03632215 -0.82147539  0.42349920
5        E  0.09113996  0.73319950  0.1046914 -0.75730274 -0.72833574  0.08412158
6        F  0.01751232 -0.78021331 -0.9158299 -0.68345547 -0.08848462 -0.18618554
7        G -0.96602939  1.08286247  0.6116285  0.08982368  0.12721634  0.71738577
8        H -1.06444232 -0.03971332 -0.5394623 -1.34349634 -0.76919950 -3.01150549
9        I -0.83680136 -0.54609901 -0.1261597 -1.13312110  0.23785615  0.85203224
10       J  1.98656695 -0.01522142  0.7850551  0.93551804 -0.26279470 -0.80375911

For each Subject, create two new columns, V4_T1 and V4_T2, holding the result of (V1 + V2) / V3 for their respective Tn value.

You could restructure your data into a long format using gather(), then separate() the initial column names into two distinct columns, and spread() the result back into a wide format so that mutate() can compute V4 for each Subject and Tn combination. Then gather() one last time, unite() the columns, and spread() the result back to get your desired output:

library(tidyr)
library(dplyr)

df %>%
  gather(key, value, -Subject) %>%
  separate(key, c("V", "T")) %>%
  spread(V, value) %>%
  mutate(V4 = (V1 + V2) / V3) %>%
  gather(key, value, -(Subject:T)) %>%
  unite(R, key, T) %>%
  spread(R, value)

Which gives:

   Subject       V1_T1       V1_T2      V2_T1       V2_T2       V3_T1       V3_T2
1        A  0.16694311  0.47190422  0.6571530  1.68428290  0.60685147  1.25383252
2        B  0.45561405  1.01849804  1.6041593 -1.40256942  1.50029772  1.34857932
3        C  0.31762739 -0.78986513 -0.8054005 -0.14714956 -0.63612792 -0.13565903
4        D  0.66536682 -0.57231682  0.1362731  0.03632215 -0.82147539  0.42349920
5        E  0.09113996  0.73319950  0.1046914 -0.75730274 -0.72833574  0.08412158
6        F  0.01751232 -0.78021331 -0.9158299 -0.68345547 -0.08848462 -0.18618554
7        G -0.96602939  1.08286247  0.6116285  0.08982368  0.12721634  0.71738577
8        H -1.06444232 -0.03971332 -0.5394623 -1.34349634 -0.76919950 -3.01150549
9        I -0.83680136 -0.54609901 -0.1261597 -1.13312110  0.23785615  0.85203224
10       J  1.98656695 -0.01522142  0.7850551  0.93551804 -0.26279470 -0.80375911
         V4_T1      V4_T2
1    1.3579865  1.7196771
2    1.3729097 -0.2847970
3    0.7667846  6.9071309
4   -0.9758538 -1.2656332
5   -0.2688751 -0.2865285
6   10.1522452  7.8613452
7   -2.7858123  1.6346660
8    2.0851608  0.4593084
9   -4.0485020 -1.9708410
10 -10.5467198 -1.1449906
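
In tidyr 1.0.0 and later, gather() and spread() are superseded by pivot_longer() and pivot_wider(). The following is a minimal sketch of the same reshaping idea with those functions. The toy df (random values, same column layout as above) and the column name time are illustrative assumptions, not part of the original answer:

```r
library(tidyr)
library(dplyr)

# Made-up data with the same column layout as above (values are illustrative)
set.seed(42)
df <- data.frame(
  Subject = LETTERS[1:10],
  V1_T1 = rnorm(10), V1_T2 = rnorm(10),
  V2_T1 = rnorm(10), V2_T2 = rnorm(10),
  V3_T1 = rnorm(10), V3_T2 = rnorm(10)
)

wide <- df %>%
  # Split names like "V1_T1" at "_": the first part becomes columns V1, V2, V3,
  # the second part goes into a new column called time
  pivot_longer(-Subject, names_to = c(".value", "time"), names_sep = "_") %>%
  # One row per Subject and time point, so the formula is written only once
  mutate(V4 = (V1 + V2) / V3) %>%
  # Back to one row per Subject, with columns named <variable>_<time>
  pivot_wider(names_from = time, values_from = c(V1, V2, V3, V4))
```

Because the variable part of each name is kept as its own column via ".value", the intermediate separate() and unite() steps are not needed in this variant.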

rafa.pereira · Answer · 2016-05-26 16:41:16Z

Try a data.table solution:

library(data.table)
setDT(dataframe)

# Names of the new columns to create (the example data has time points T1 and T2)
cols <- paste0("V4_T", 1:2)

# For each time point, look up the matching columns by name and compute V1 + V2 / V3
dataframe[, (cols) := lapply(1:2, function(x) get(paste0("V1_T", x)) + get(paste0("V2_T", x)) / get(paste0("V3_T", x)))]
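
Since the question asked specifically about for-loops and regular expressions, the same idea can also be sketched in base R without extra packages. This is only an illustration under assumptions: the toy df and the helper names value_cols and timepoints are made up here, and the pattern assumes every column is named like V<number>_T<number>:

```r
# Made-up data with the same column layout as in the question
set.seed(1)
df <- data.frame(
  Subject = factor(LETTERS[1:10]),
  V1_T1 = rnorm(10), V1_T2 = rnorm(10),
  V2_T1 = rnorm(10), V2_T2 = rnorm(10),
  V3_T1 = rnorm(10), V3_T2 = rnorm(10)
)

# Pick the measurement columns with a regular expression ...
value_cols <- grep("^V[0-9]+_T[0-9]+$", names(df), value = TRUE)
# ... and strip the "V<number>_" prefix to get the distinct time points
timepoints <- unique(sub("^V[0-9]+_", "", value_cols))

for (tp in timepoints) {
  # Build the column names for this time point, e.g. "V1_T1", "V2_T1", "V3_T1"
  v1 <- df[[paste0("V1_", tp)]]
  v2 <- df[[paste0("V2_", tp)]]
  v3 <- df[[paste0("V3_", tp)]]
  # Same expression as in the question: division binds tighter than addition,
  # so this is V1 + (V2 / V3); write (v1 + v2) / v3 if that is what is meant
  df[[paste0("V4_", tp)]] <- v1 + v2 / v3
}
```

The `[[` operator accepts a column name built at run time, which is what makes the loop over time points possible; `df$V4_tp` would instead create a column literally named V4_tp.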

1 Comment

user3620237 Over a year ago

Thanks as well for your response! It is neat and does the trick. I am only accepting the previous response as an answer because it was first, but this solution works just as well.
