
I have an R data frame with the following contents:

Serial N         year         current    Average 
   B              10            14          15
   B              10            16          15
   C              12            13          12
   D              40            20          20
   B              11            15          15
   C              12            11          12

I would like to add a new column based on the Average for each unique serial number, something like:

Serial N         year         current    Average      temp 
   B              10            14          15        (15+12+20)/15
   B              10            16          15        (15+12+20)/15
   C              12            13          12        (15+12+20)/12
   D              40            20          20        (15+12+20)/20
   B              11            15          15        (15+12+20)/15
   C              12            11          12        (15+12+20)/12

The temp column is the sum of the Average values for each Serial N (B, C, and D) divided by the Average for that row. How can I compute it without using for loops, given that rows 1, 2, and 5 (Serial N: B) have the same Average and therefore the same temp? I started with this:

for (i in unique(df$Serial_N))
   {
       .........
    }     

but I got stuck because I also need the Average for the other Serial N values. How can I do this?

  • You can use the dplyr library and grouping to achieve what you want. But it is not clear to me how you are getting those numbers, (15 + 12 + 20) / 15. Can you update the question to reflect the right values from the input data? Commented Mar 22, 2016 at 19:10
  • 15 is the Average for Serial N (B), 12 is the Average for Serial N (C), and 20 is the Average for Serial N (D); the /15 is the Average for that row's Serial N (B). Commented Mar 22, 2016 at 19:14

3 Answers

3

For example, you can try something like the following (assuming your computation matches):

df$temp <- sum(tapply(df$Average, df$SerialN, mean)) / df$Average

Resulting output:

  SerialN year current Average     temp
1       B   10      14      15 3.133333
2       B   10      16      15 3.133333
3       C   12      13      12 3.916667
4       D   40      20      20 2.350000
5       B   11      15      15 3.133333
6       C   12      11      12 3.916667
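
To reproduce the output above, here is a minimal sketch that rebuilds the example data by hand. The column name SerialN follows this answer's output (the question writes it as Serial N), and group_total is a helper name introduced only for illustration. Using mean gives the right per-group value here only because Average is already constant within each serial number.

```r
# Rebuild the example data (column names follow this answer's output)
df <- data.frame(
  SerialN = c("B", "B", "C", "D", "B", "C"),
  year    = c(10, 10, 12, 40, 11, 12),
  current = c(14, 16, 13, 20, 15, 11),
  Average = c(15, 15, 12, 20, 15, 12)
)

# One value per serial number, summed: 15 + 12 + 20
group_total <- sum(tapply(df$Average, df$SerialN, mean))

# Divide the shared total by each row's own Average
df$temp <- group_total / df$Average
```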

4 Comments

I think he means sum(unique(df$Average)) / Average (assuming averages aren't repeated between different groups...there are more thorough ways)
What if I would like to access the Average for each unique Serial N?
sum(tapply(df$Average, df$Serial_N, unique)) / Average, maybe, though there's got to be a simpler way.
Based on the update and clarification, I agree. It is the correct way to go.
3

Using unique.data.frame() avoids dropping an Average that happens to be repeated across different groups:

df$temp <- sum((unique.data.frame(df[c("Serial_N","Average")]))$Average) / df$Average
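
To see why deduplicating on the (Serial_N, Average) pair matters, here is a small sketch with made-up data in which a hypothetical serial number E shares B's Average of 15. The names df2 and pairs are illustrative and not part of the question.

```r
# Hypothetical data: serial E happens to share B's Average of 15
df2 <- data.frame(
  Serial_N = c("B", "B", "C", "D", "E"),
  Average  = c(15, 15, 12, 20, 15)
)

# Deduplicating Average alone drops E's 15
sum(unique(df2$Average))
# [1] 47

# Deduplicating (Serial_N, Average) pairs keeps one row per serial number
pairs <- unique(df2[, c("Serial_N", "Average")])
sum(pairs$Average)
# [1] 62
```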

1 Comment

Nice use of unique! You don't really need to specify the .data.frame method, though, as that's what will get called anyway if you pass it a data.frame. Also, you should probably have a , before c(... to show you want all the rows; it works as-is, but it's good to be thorough.
1

In base R, you can use either

df <- transform(df, temp = sum(tapply(df$Average, df$Serial_N, unique))/df$Average)

or

df$temp <- sum(tapply(df$Average, df$Serial_N, unique))/df$Average

both of which will give you

df
#   Serial_N year current Average     temp
# 1        B   10      14      15 3.133333
# 2        B   10      16      15 3.133333
# 3        C   12      13      12 3.916667
# 4        D   40      20      20 2.350000
# 5        B   11      15      15 3.133333
# 6        C   12      11      12 3.916667

tapply splits df$Average by the levels of df$Serial_N, and then calls unique on them, which gives you a single average for each group, which you can then sum and divide. transform adds a column (equivalent to dplyr::mutate).
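
As a sketch of the intermediate step described above, the tapply result can be stored and inspected on its own. The variable name avg_by_serial is introduced here for illustration. This assumes Average is constant within each serial number; if a group held several distinct values, unique would return more than one per group and tapply would return a list instead.

```r
df <- data.frame(
  Serial_N = c("B", "B", "C", "D", "B", "C"),
  Average  = c(15, 15, 12, 20, 15, 12)
)

# One value per serial number, named by the group level
avg_by_serial <- tapply(df$Average, df$Serial_N, unique)
avg_by_serial
#  B  C  D 
# 15 12 20 

# Index by name to pull out a single group's average
avg_by_serial[["C"]]
# [1] 12
```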

7 Comments

What if I would like to access the Average for each unique Serial N? Since the values are identical within a group, how can I access each one (Serial N B, C, and D)?
That's what the tapply gives you; you can index the result to get a single average. Or use Reese's unique(df[,c("Serial_N","Average")]) if you like.
I get your point, thank you. But let me rephrase what I mean: I would like to use each unique Average as input to another function. I do not want to loop through them, as I have about 10000 distinct Serial N values, each with its own Average. I can get the unique Serial N and Average pairs using what you mentioned, but how can I pass each Average to another function without loops?
Save the results of tapply or whatnot to a variable, and pass that variable as the input. Or pass the tapply directly. You can almost always pass a vector of values in R; there's no need for loops for such purposes.
I did this: temp=unique(df[,c("Serial_N","Average")]) , and then this: temp$new_set=tapply(temp$Average, function(x) { 2 * pnorm(x * sqrt(2)) - 1} ). So for each unique value of Average, I would like to apply that function. I got the error: Error in unique.default(x) : unique() applies only to vectors
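
A likely cause of that error is that tapply expects the grouping index as its second argument (its signature is tapply(X, INDEX, FUN)), so the function was passed where the index belongs. Since pnorm and the surrounding arithmetic are vectorized, one way to apply the function to every unique Average is to call it on the whole column, as in this minimal sketch built on the example data:

```r
df <- data.frame(
  Serial_N = c("B", "B", "C", "D", "B", "C"),
  Average  = c(15, 15, 12, 20, 15, 12)
)

# One row per serial number
temp <- unique(df[, c("Serial_N", "Average")])

# pnorm() and the arithmetic are vectorized, so no tapply() or loop is needed
temp$new_set <- 2 * pnorm(temp$Average * sqrt(2)) - 1
```

For a function that is not vectorized, sapply(temp$Average, f) applies f to each element in turn and returns the results as a vector.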