I have a "big" data frame on which I need to do a calculation like the one below:

data <- data.frame( "name"=c( "Tom", "Peter", "Peter", "Peter", "Tom", "Peter" ), "goal"=c(1,-2,2,3,-1,0), "total"=0 )
for( i in 1:nrow(data) ) {
  count <- 0
  for ( j in 1:i) {
    if (data$name[j] == data$name[i]) {
      count <- count + data$goal[j]
    }
  }
  data$total[i] <- count
}

> data
   name goal total
1   Tom    1     1
2 Peter   -2    -2
3  John    2     2
4 Peter    3     1
5   Tom   -1     0
6 Peter    0     1

For each row, the "total" column should hold the running sum of the goals scored so far by the same player, including the current row.

My data frame currently has 83000 rows and the calculation takes a very long time. I would like to do it without a "for" loop. Do you have any ideas?

I saw the following post but I don't know how to adapt it.

Thanks in advance

1 Answer

If you want to avoid for loops, look for vectorized functions that do what you want (or functions that work on data frames or other multidimensional objects). For your example, you can group the data frame by name using group_by from dplyr and then apply the vectorized function cumsum (cumulative sum) within each group:

library(dplyr)
data <- data %>% group_by(name) %>% mutate(total = cumsum(goal))

Output

> data
# A tibble: 6 x 3
# Groups:   name [2]
  name   goal total
  <chr> <dbl> <dbl>
1 Tom       1     1
2 Peter    -2    -2
3 Peter     2     0
4 Peter     3     3
5 Tom      -1     0
6 Peter     0     3

I used the data frame initialization from your post (where row 3 is "Peter", not "John"), which is why my output differs from yours.
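As a sanity check, here is a minimal sketch that compares a grouped cumulative sum against the double loop from the question. It uses base R's ave() instead of dplyr, which is a substitution on my part and not the method shown above; the variable name loop_total is only illustrative.

```r
# Data frame as initialized in the question (row 3 is "Peter").
data <- data.frame(
  name = c("Tom", "Peter", "Peter", "Peter", "Tom", "Peter"),
  goal = c(1, -2, 2, 3, -1, 0)
)

# Reference result: the double loop from the question.
loop_total <- numeric(nrow(data))
for (i in seq_len(nrow(data))) {
  count <- 0
  for (j in seq_len(i)) {
    if (data$name[j] == data$name[i]) {
      count <- count + data$goal[j]
    }
  }
  loop_total[i] <- count
}

# Base R alternative: ave() applies cumsum within each name group
# and returns the values in the original row order.
data$total <- ave(data$goal, data$name, FUN = cumsum)

# Both approaches should give the same totals.
all.equal(data$total, loop_total)
```

The loop does work proportional to the square of the number of rows, while the grouped cumsum passes over each group once, which is where the speedup on 83000 rows should come from.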

If you want to drop the grouping after your manipulation, use ungroup.
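For example, ungroup can be appended to the same pipeline. This is a small sketch using the data from the question; the name result is only illustrative, and is_grouped_df from dplyr is used just to confirm that the grouping is gone.

```r
library(dplyr)

# Data frame as initialized in the question.
data <- data.frame(
  name = c("Tom", "Peter", "Peter", "Peter", "Tom", "Peter"),
  goal = c(1, -2, 2, 3, -1, 0)
)

# Running total per name, then drop the grouping so that later
# operations act on the whole table again.
result <- data %>%
  group_by(name) %>%
  mutate(total = cumsum(goal)) %>%
  ungroup()

# Check whether the result still carries grouping metadata.
is_grouped_df(result)
```

Dropping the grouping matters if you later call functions such as summarise or mutate and expect them to operate on all rows at once instead of per name.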
