Computing values for R data frame cells without using for loops

Question

I have an R data frame with the following:

Serial N         year         current    Average 
   B              10            14          15
   B              10            16          15
   C              12            13          12
   D              40            20          20
   B              11            15          15
   C              12            11          12
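
To make the answers below reproducible, here is a minimal sketch that rebuilds the sample data. The column name Serial_N is an assumption (the asker's loop uses df$Serial_N, while the answers use both SerialN and Serial_N). It also shows that the Average column is simply the per-group mean of current, which base R's ave() computes without a loop.

```r
# Sample data from the question; the column name Serial_N is assumed
df <- data.frame(
  Serial_N = c("B", "B", "C", "D", "B", "C"),
  year     = c(10, 10, 12, 40, 11, 12),
  current  = c(14, 16, 13, 20, 15, 11)
)

# ave() returns, for every row, the mean of `current` within that row's Serial_N group
df$Average <- ave(df$current, df$Serial_N)
df
```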

I would like to add a new column based on the average for each unique serial number, something like:

Serial N         year         current    Average      temp 
   B              10            14          15        (15+12+20)/15
   B              10            16          15        (15+12+20)/15
   C              12            13          12        (15+12+20)/12
   D              40            20          20        (15+12+20)/20
   B              11            15          15        (15+12+20)/15
   C              12            11          12        (15+12+20)/12

The temp column is the sum of the Average values for each Serial N (one each for B, C and D), divided by the Average for that row. How can I compute it without a for loop, given that rows 1, 2 and 5 (Serial N: B) have the same Average and therefore the same temp? I started with this:

for (i in unique(df$Serial_N))
   {
       .........
    }

but I got stuck, because each iteration also needs the averages of the other Serial N values. How can I do this?
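
One way past that sticking point, sketched under the same assumed Serial_N column name: first collect one Average per Serial N, then do the division for every row in a single vectorized step. The loop here runs once per group rather than once per row; the answers below show how to drop it entirely.

```r
df <- data.frame(
  Serial_N = c("B", "B", "C", "D", "B", "C"),
  Average  = c(15, 15, 12, 20, 15, 12)
)

# Collect one Average per Serial N (the "other averages" each row needs)
group_avg <- numeric(0)
for (s in unique(df$Serial_N)) {
  # every row of a group carries the same Average, so take the first one
  group_avg[s] <- df$Average[df$Serial_N == s][1]
}

# The division needs no loop: the scalar sum is divided by each row's Average
df$temp <- sum(group_avg) / df$Average
df$temp
```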

You can use the dplyr library and grouping to achieve what you want. But it is not clear to me how you are getting those numbers, (15 + 12 + 20) / 15. Can you update the question to reflect the right values from the input data? – Gopala, commented Mar 22, 2016 at 19:10
15 is the Average for Serial N (B), 12 is the Average for Serial N (C) and 20 is the Average for Serial N (D); the /15 is the Average for that row's Serial N (B). – user3841581, commented Mar 22, 2016 at 19:14
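
Put as a formula (the notation is mine, not the asker's: A_i is the Average in row i, and \bar{A}_g is the single Average shared by every row of Serial N g):

```latex
\text{temp}_i = \frac{\sum_{g \in \{B,\,C,\,D\}} \bar{A}_g}{A_i},
\qquad \text{e.g. row 1: } \frac{15 + 12 + 20}{15} = \frac{47}{15} \approx 3.1333
```

The numerator is the same constant for every row; only the denominator changes, which is why one vectorized division over the Average column is enough.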

Gopala · Accepted Answer · 2016-03-22 19:23:06Z

3

For example, you can try something like the following (assuming your computation matches):

df$temp <- sum(tapply(df$Average, df$SerialN, mean)) / df$Average

Resulting output:

  SerialN year current Average     temp
1       B   10      14      15 3.133333
2       B   10      16      15 3.133333
3       C   12      13      12 3.916667
4       D   40      20      20 2.350000
5       B   11      15      15 3.133333
6       C   12      11      12 3.916667
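
To see why this works, here is a sketch that exposes the intermediate tapply() result, assuming a column named SerialN as in this answer. Because Average is constant within each group, mean() simply returns that group's value, so the sum is 15 + 12 + 20 = 47 and each row's temp is 47 divided by its own Average.

```r
df <- data.frame(
  SerialN = c("B", "B", "C", "D", "B", "C"),
  Average = c(15, 15, 12, 20, 15, 12)
)

# One value per group, named by group: B = 15, C = 12, D = 20
group_means <- tapply(df$Average, df$SerialN, mean)
print(group_means)

# The scalar sum (47) is recycled against the whole Average column
df$temp <- sum(group_means) / df$Average
print(round(df$temp, 6))
```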

edited Mar 22, 2016 at 19:23

answered Mar 22, 2016 at 19:12

Gopala



4 Comments

alistaire Over a year ago

I think he means sum(unique(df$Average)) / df$Average (assuming the same average isn't repeated across different groups... there are more thorough ways).
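
A sketch of the caveat raised here, using hypothetical data in which two different groups happen to share the same Average: sum(unique(df$Average)) drops the duplicate value, while the per-group tapply() sum keeps one value per group.

```r
# Hypothetical data: groups B and C share the Average 15
df <- data.frame(
  Serial_N = c("B", "B", "C", "D"),
  Average  = c(15, 15, 15, 20)
)

sum(unique(df$Average))                       # 15 + 20 = 35: C's 15 is lost
sum(tapply(df$Average, df$Serial_N, unique))  # 15 + 15 + 20 = 50: one per group
```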

user3841581 Over a year ago

What happens if I want to access the Average for each unique Serial N?

alistaire Over a year ago

sum(tapply(df$Average, df$Serial_N, unique)) / df$Average, maybe, though there's got to be a simpler way.

Gopala Over a year ago

Based on the update and clarification, I agree. It is the correct way to go.

Reese · Accepted Answer · 2016-03-22 19:32:32Z

3

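Using unique.data.frame() avoids double-counting an Average value that is repeated across different groups, because it de-duplicates (Serial_N, Average) pairs rather than bare values: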

df$temp <- sum((unique.data.frame(df[c("Serial_N","Average")]))$Average) / df$Average

answered Mar 22, 2016 at 19:32

Reese


1 Comment

alistaire Over a year ago

Nice use of unique! You don't really need to specify the .data.frame method, though, as that's what will get called anyway if you pass it a data.frame. Also, you should probably have a , before c(... to show you want all the rows; it works as-is, but it's good to be thorough.
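
Applying both suggestions from the comment above, a sketch of the same idea with plain unique() and an explicit empty row index; the column names Serial_N and Average follow the answer, and the sample data is assumed from the question.

```r
df <- data.frame(
  Serial_N = c("B", "B", "C", "D", "B", "C"),
  Average  = c(15, 15, 12, 20, 15, 12)
)

# Distinct (Serial_N, Average) pairs; unique() dispatches to unique.data.frame()
distinct_avg <- unique(df[, c("Serial_N", "Average")])
print(distinct_avg)  # rows 1, 3 and 4 of df: (B, 15), (C, 12), (D, 20)

df$temp <- sum(distinct_avg$Average) / df$Average
```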

alistaire · Accepted Answer · 2016-03-22 19:27:14Z

1

In base R, you can use either

df <- transform(df, temp = sum(tapply(df$Average, df$Serial_N, unique))/df$Average)

or

df$temp <- sum(tapply(df$Average, df$Serial_N, unique))/df$Average

both of which will give you

df
#   Serial_N year current Average     temp
# 1        B   10      14      15 3.133333
# 2        B   10      16      15 3.133333
# 3        C   12      13      12 3.916667
# 4        D   40      20      20 2.350000
# 5        B   11      15      15 3.133333
# 6        C   12      11      12 3.916667

tapply splits df$Average by the levels of df$Serial_N and calls unique on each piece, giving a single average per group; you can then sum those and divide by each row's Average. transform adds a column (like dplyr::mutate).
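
Because transform() evaluates its arguments inside the data frame, the df$ prefixes are optional there. A sketch, using the sample data from the question:

```r
df <- data.frame(
  Serial_N = c("B", "B", "C", "D", "B", "C"),
  Average  = c(15, 15, 12, 20, 15, 12)
)

# Inside transform(), Average and Serial_N refer to the columns of df
df <- transform(df, temp = sum(tapply(Average, Serial_N, unique)) / Average)
```

One caveat: tapply(..., unique) only simplifies to a plain numeric vector when every group has exactly one distinct Average. If a group had two, tapply would return a list and sum() would fail.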

answered Mar 22, 2016 at 19:27

alistaire


7 Comments

user3841581 Over a year ago

What happens if I want to access the Average for each unique Serial N? Since they are identical within a group, how can I access each one (Serial N B, C and D)?

alistaire Over a year ago

That's what the tapply gives you; you can index the result to get a single average. Or use Reese's unique(df[,c("Serial_N","Average")]) if you like.
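
For example, a sketch using the question's sample data: the tapply() result is named by Serial N, so a single group's Average can be read by name, and unique() on the two columns gives the same pairs as a data frame.

```r
df <- data.frame(
  Serial_N = c("B", "B", "C", "D", "B", "C"),
  Average  = c(15, 15, 12, 20, 15, 12)
)

group_avg <- tapply(df$Average, df$Serial_N, unique)
group_avg["C"]          # the Average for Serial N C (12)
group_avg[c("B", "D")]  # several groups at once

# Same information as a two-column data frame
unique(df[, c("Serial_N", "Average")])
```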

user3841581 Over a year ago

I get your point, thank you. But let me rephrase what I mean: I would like to use each unique Average as input to another function. I do not want to loop through them, as I have about 10,000 distinct Serial N values, each with its own Average. I can get the unique Serial N and Average pairs using what you mentioned, but how can I pass each Average to another function without a loop?

alistaire Over a year ago

Save the results of tapply or whatnot to a variable, and pass that variable as the input. Or pass the tapply directly. You can almost always pass a vector of values in R; there's no need for loops for such purposes.
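
A sketch of that advice: most arithmetic and math functions in R (log() here) accept a whole vector, so the per-group averages can be passed in one call. For a function that only handles one value at a time (classify_avg() below is hypothetical, invented for illustration, with an arbitrary threshold of 14), vapply() applies it to each element without an explicit loop.

```r
df <- data.frame(
  Serial_N = c("B", "B", "C", "D", "B", "C"),
  Average  = c(15, 15, 12, 20, 15, 12)
)
group_avg <- tapply(df$Average, df$Serial_N, unique)

# Vectorized: one call returns a result for every group
log_avg <- log(group_avg)

# Hypothetical scalar-only function, used only to illustrate vapply()
classify_avg <- function(x) {
  stopifnot(length(x) == 1)
  if (x > 14) "high" else "low"
}
scores <- vapply(group_avg, classify_avg, character(1))
```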

user3841581 Over a year ago

I did this: temp=unique(df[,c("Serial_N","Average")]), then this: temp$new_set=tapply(temp$Average, function(x) { 2 * pnorm(x * sqrt(2)) - 1 }). For each unique Average value, I would like to apply that function. I got the error: Error in unique.default(x) : unique() applies only to vectors
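
A sketch of a fix, assuming the same Serial_N and Average columns. In the call above, tapply() received the function as its second (grouping INDEX) argument, and converting that function to a factor is where the unique() error comes from. No grouping is needed at that point anyway, because temp already holds one row per Serial N, and pnorm() is vectorized, so the expression can be applied to the whole column at once. (Incidentally, 2 * pnorm(x * sqrt(2)) - 1 is the error function erf(x).)

```r
df <- data.frame(
  Serial_N = c("B", "B", "C", "D", "B", "C"),
  Average  = c(15, 15, 12, 20, 15, 12)
)

# One row per Serial N, as in the comment
temp <- unique(df[, c("Serial_N", "Average")])

# The function from the comment; pnorm() is vectorized, so no tapply() or loop is needed
f <- function(x) 2 * pnorm(x * sqrt(2)) - 1
temp$new_set <- f(temp$Average)

# With averages this large the result is 1 to double precision;
# smaller inputs show the shape of the curve
f(c(0, 0.5, 1))
```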
