Create function that calculates two columns with second calculating from first column?

Question

I am trying to do the following in data.table, or to write a function that replaces a for loop. However, I am not sure how to return two columns when one depends on the calculation of the other. The dataset contains sales and delivery units for each 'place' by month, but a starting inventory only for the first month. I need to calculate the beginning inventory of each period by first calculating the ending inventory of the previous month at that place. Ending inventory for each place is equal to the starting inventory minus sales units plus delivery units.

Here is how I am currently calculating it:

library(data.table)

data <- data.table(place = c('a','b'),
                 month = c(1,1,2,2,3,3,4,4,5,5,6,6),
                 sales = c(20,2,3,5,6,7,8,1,5,1,5,3),
                 delivery = c(1,1,1,1,1,1,1,1,1,1,1,1),
                 starting_inv = c(100,100,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA),
                 ending_inv = c(81,99,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA))

print(data)

   place month sales delivery starting_inv ending_inv
 1:     a     1    20        1          100         81
 2:     b     1     2        1          100         99
 3:     a     2     3        1           NA         NA
 4:     b     2     5        1           NA         NA
 5:     a     3     6        1           NA         NA
 6:     b     3     7        1           NA         NA
 7:     a     4     8        1           NA         NA
 8:     b     4     1        1           NA         NA
 9:     a     5     5        1           NA         NA
10:     b     5     1        1           NA         NA
11:     a     6     5        1           NA         NA
12:     b     6     3        1           NA         NA

dt <- data[order(place,month)]

print(dt)

    place month sales delivery starting_inv ending_inv
 1:     a     1    20        1          100         81
 2:     a     2     3        1           NA         NA
 3:     a     3     6        1           NA         NA
 4:     a     4     8        1           NA         NA
 5:     a     5     5        1           NA         NA
 6:     a     6     5        1           NA         NA
 7:     b     1     2        1          100         99
 8:     b     2     5        1           NA         NA
 9:     b     3     7        1           NA         NA
10:     b     4     1        1           NA         NA
11:     b     5     1        1           NA         NA
12:     b     6     3        1           NA         NA

for (i in 1:nrow(dt)) {
  # Month 1 already has its inventory; later months carry the previous row forward
  if (dt[i]$month != 1) {
    dt$starting_inv[i] <- dt[i - 1]$ending_inv
    dt$ending_inv[i] <- dt[i]$starting_inv - dt[i]$sales + dt[i]$delivery
  }
}

print(dt)

   place month sales delivery starting_inv ending_inv
 1:     a     1    20        1          100         81
 2:     a     2     3        1           81         79
 3:     a     3     6        1           79         74
 4:     a     4     8        1           74         67
 5:     a     5     5        1           67         63
 6:     a     6     5        1           63         59
 7:     b     1     2        1          100         99
 8:     b     2     5        1           99         95
 9:     b     3     7        1           95         89
10:     b     4     1        1           89         89
11:     b     5     1        1           89         89
12:     b     6     3        1           89         87

I would like to avoid the step that requires sorting the table by place and month. Running this calculation on a table with much more data also takes too long, and I am having trouble turning it into a vectorized function.

pseudospin · Accepted Answer · 2020-08-26 23:12:45Z

The iteration is captured by the cumulative sum; the rest can then be vectorised, so it should be fast.

data[, starting_inv := cumsum(shift(delivery-sales, fill = starting_inv[1])), place]
data[, ending_inv := starting_inv+delivery-sales]

data
#>     place month sales delivery starting_inv ending_inv
#>  1:     a     1    20        1          100         81
#>  2:     b     1     2        1          100         99
#>  3:     a     2     3        1           81         79
#>  4:     b     2     5        1           99         95
#>  5:     a     3     6        1           79         74
#>  6:     b     3     7        1           95         89
#>  7:     a     4     8        1           74         67
#>  8:     b     4     1        1           89         89
#>  9:     a     5     5        1           67         63
#> 10:     b     5     1        1           89         89
#> 11:     a     6     5        1           63         59
#> 12:     b     6     3        1           89         87

This assumes the actual data you are dealing with is ordered by month. If it is not, insert order(month) after the first square bracket in the first line.
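Since the question asks for a function, here is a minimal sketch of how the two lines could be wrapped so that the result does not depend on row order. The name fill_inventory and the copy() call are illustrative assumptions, not part of the original answer; the sketch also assumes that starting_inv is filled in for the earliest month of each place.

```r
library(data.table)

# Hypothetical helper: fill_inventory() is an illustrative name.
# Assumes columns place, month, sales, delivery and starting_inv, with
# starting_inv present for the earliest month of each place.
fill_inventory <- function(dt) {
  out <- copy(dt)  # work on a copy so the caller's table is not modified
  # order(month) in i makes the cumulative sum run in month order within
  # each place, whatever the physical row order is
  out[order(month),
      starting_inv := cumsum(shift(delivery - sales, fill = starting_inv[1L])),
      by = place]
  out[, ending_inv := starting_inv + delivery - sales]
  out[]
}

# Example data from the question, deliberately reversed to show that
# no pre-sorting is needed
inv <- data.table(
  place        = rep(c("a", "b"), times = 6),
  month        = rep(1:6, each = 2),
  sales        = c(20, 2, 3, 5, 6, 7, 8, 1, 5, 1, 5, 3),
  delivery     = 1,
  starting_inv = c(100, 100, rep(NA_real_, 10))
)
result <- fill_inventory(inv[12:1])
result[order(place, month)]
```

The rows keep whatever order they arrive in; the final order(place, month) is only there to make the printed result easy to compare with the loop output in the question.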

akrun · Accepted Answer · 2020-08-26 23:22:10Z

Here is one option with accumulate2 from purrr:

library(purrr)
library(dplyr)
library(tidyr)
dt %>%
     group_by(place) %>%
     # accumulate2() with .init returns n + 1 values; drop the last one
     dplyr::mutate(starting_inv = head(accumulate2(delivery, sales, 
        ~ ..1 - ..3 + ..2 , .init = first(starting_inv)), -1)) %>% 
     unnest(c(starting_inv)) %>%
     mutate(ending_inv = starting_inv - sales + delivery)
# A tibble: 12 x 6
# Groups:   place [2]
#   place month sales delivery starting_inv ending_inv
#   <chr> <dbl> <dbl>    <dbl>        <dbl>      <dbl>
# 1 a         1    20        1          100         81
# 2 a         2     3        1           81         79
# 3 a         3     6        1           79         74
# 4 a         4     8        1           74         67
# 5 a         5     5        1           67         63
# 6 a         6     5        1           63         59
# 7 b         1     2        1          100         99
# 8 b         2     5        1           99         95
# 9 b         3     7        1           95         89
#10 b         4     1        1           89         89
#11 b         5     1        1           89         89
#12 b         6     3        1           89         87

This can also be used along with data.table:

dt[, starting_inv := head(unlist(accumulate2(delivery, sales, 
     function(x, y, z) x - z + y ,
   .init = first(starting_inv))), -1), place][, ending_inv := 
         starting_inv - sales + delivery]
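
To make the shape of the accumulate2 result easier to see, here is a small sketch on plain vectors, using the figures for place 'a' from the question. The variable names (running, acc, d, s) are illustrative only. With .init supplied, the result holds the initial value plus one balance per month, so six months give seven values, and the starting and ending inventories are the same sequence offset by one.

```r
library(purrr)

# Place "a" from the question
sales    <- c(20, 3, 6, 8, 5, 5)
delivery <- rep(1, 6)

# With .init, accumulate2() returns the initial value plus one result per
# element, so n inputs give n + 1 running balances
running <- unlist(accumulate2(delivery, sales,
                              function(acc, d, s) acc - s + d,
                              .init = 100))

starting_inv <- head(running, -1)  # balance before each month
ending_inv   <- tail(running, -1)  # balance after each month
```

Dropping the last element of the running balance gives one starting inventory per row, and dropping the first gives the matching ending inventory.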
