I have a dataset of historical monthly precipitation records (1990-2020) for different locations, stored as a table with 5 attributes (lat, lon, year, month, prec). The rows are grouped by latitude and longitude, and ordered by time within each group. For example:
INPUT
lat/lon/year/month/prec
-17/18/1990/1/0.4
-17/18/1990/2/0.02
-17/18/1990/3/0.12
-17/18/1990/4/0.06
.
.
.
-17/18/2020/12/0.35
-17/20/1990/1/0.2
-17/20/1990/2/0.2
-17/20/1990/3/0.2
-17/20/1990/4/0.2
.
.
.
-17/20/2020/12/0.08
-18/20/1990/1/0.11
-18/20/1990/2/0.11
-18/20/1990/3/0.11
.
.
.
.
EXPECTED OUTPUT (accumulation period=3)
lat/lon/year/month/prec/prec_3
-17/18/1990/1/0.4/-
-17/18/1990/2/0.02/-
-17/18/1990/3/0.12/0.54
-17/18/1990/4/0.06/0.2
.
.
.
-17/18/2020/12/0.35/12.58
-17/20/1990/1/0.2/-
-17/20/1990/2/0.2/-
-17/20/1990/3/0.2/0.6
-17/20/1990/4/0.2/0.6
.
.
.
-17/20/2020/12/0.08/35.0
-18/20/1990/1/0.11/-
-18/20/1990/2/0.11/-
-18/20/1990/3/0.11/0.33
.
.
.
.
I want to analyze these time series by accumulating the precipitation variable over different periods, for example 3 and 6 months, separately for each coordinate pair, and then fitting the accumulated values to a probability distribution. Does anyone know how to compute these rolling "sums" so that each one covers only the given accumulation period and never uses data belonging to a different latitude and longitude?

Additional information: there are monthly records from 1990 to 2020. The calculation must restart whenever the latitude or longitude changes, since that indicates a different point. All the records are in CSV format. The data are already ordered and contain no NaN values.
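To make the requirement concrete, here is a minimal sketch of the computation, assuming pandas and a comma-separated file with the header lat,lon,year,month,prec. The function name add_accumulation and the inline sample (a few rows copied from the table above) are only illustrative, not part of the real dataset layout. The idea is to group by coordinate pair and take a rolling sum inside each group, so the window never crosses into another location:

```python
import io
import pandas as pd

# Illustrative excerpt of the input, assumed to be comma-separated with a header.
csv_text = """lat,lon,year,month,prec
-17,18,1990,1,0.4
-17,18,1990,2,0.02
-17,18,1990,3,0.12
-17,18,1990,4,0.06
-17,20,1990,1,0.2
-17,20,1990,2,0.2
-17,20,1990,3,0.2
-17,20,1990,4,0.2
"""

def add_accumulation(df, window):
    """Add a prec_<window> column: rolling sum of prec per (lat, lon) pair."""
    # Sort so that each location's months are contiguous and in time order.
    df = df.sort_values(["lat", "lon", "year", "month"]).reset_index(drop=True)
    # The rolling window is applied inside each group, so it restarts
    # whenever lat or lon changes; the first window-1 rows of a group are NaN.
    df[f"prec_{window}"] = df.groupby(["lat", "lon"])["prec"].transform(
        lambda s: s.rolling(window, min_periods=window).sum()
    )
    return df

df = pd.read_csv(io.StringIO(csv_text))
result = add_accumulation(df, 3)
print(result)
```

Calling add_accumulation(result, 6) would add a prec_6 column in the same way. The rows shown as "-" in the expected output come out as NaN here, and they can be dropped with dropna before fitting a distribution to the accumulated values.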