I have a time series of 3 different products that have been sold at 4 different stores over some time period. I want to fill in the missing date/store/product combinations so that the data set is complete. All missing values should be replaced with 0.
Here is the code to generate the dataset. The randomtimes function was copied from @abarnert [https://stackoverflow.com/questions/50165501/generate-random-list-of-timestamps-in-python][1]
import datetime
import random
import pandas as pd
import numpy as np
random.seed(42)
np.random.seed(42)
def randomtimes(start, end, n):
    # Return n random timestamps between start and end (both 'dd-mm-YYYY' strings).
    stime = datetime.datetime.strptime(start, '%d-%m-%Y')
    etime = datetime.datetime.strptime(end, '%d-%m-%Y')
    td = etime - stime
    print(td)
    dates = [round(random.random(), 1) * td + stime for _ in range(n)]
    return dates
# set vars
nsp = 5 # nr of sampled rows
nd = 3 # nr of days
ns = 3 # nr of stores
npr = 2 # nr of products
# generate data
total = nd*ns*npr
s = random.sample('1'*nd*ns + '2'*nd*ns + '3'*nd*ns, total)  # shuffled ids '1'-'3'
p = random.sample("a"*nd*ns+ "b"*nd*ns, total)
so = list(np.random.choice(range(20,100),total))
stime = '01-02-2000'
etime = '03-02-2000'
date = np.array(randomtimes(stime, etime, nsp)).astype('datetime64[D]')
product = []
store = []
sold = []
# Note: s holds the ids '1'-'3' and p holds 'a'/'b', so products get 1-3 and stores get a/b.
for _ in range(len(date)):
    product.append(s.pop())
    store.append(p.pop())
    sold.append(so.pop())
data = {'date': date,
        'product': product,
        'sold': sold,
        'store': store
        }
df = pd.DataFrame(data)
df
        date product  sold store
0 2000-02-02       3    95     b
1 2000-02-01       1    88     a
2 2000-02-02       1    81     a
3 2000-02-03       1    66     a
4 2000-02-02       3    88     a
The result should look like this:
         date product  sold store
0  2000-02-01       1    88     a
1  2000-02-01       2     0     a
2  2000-02-01       3     0     a
3  2000-02-01       1     0     b
4  2000-02-01       2     0     b
5  2000-02-01       3     0     b
6  2000-02-02       1    81     a
7  2000-02-02       2     0     a
8  2000-02-02       3    88     a
9  2000-02-02       1     0     b
10 2000-02-02       2     0     b
11 2000-02-02       3    95     b
12 2000-02-03       1    66     a
13 2000-02-03       2     0     a
14 2000-02-03       3     0     a
15 2000-02-03       1     0     b
16 2000-02-03       2     0     b
17 2000-02-03       3     0     b
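To make the goal concrete, here is a rough sketch of the kind of fill I am after, rebuilt from the five rows of `df` shown above. It assumes the complete lists of stores and products are known up front (product 2 never appears in the sample, so it cannot be inferred from the data). The names `full_index` and `filled` are just placeholders, and there may well be a cleaner way.

```python
import pandas as pd

# The five observed rows from the example above.
df = pd.DataFrame({
    "date": pd.to_datetime(
        ["2000-02-02", "2000-02-01", "2000-02-02", "2000-02-03", "2000-02-02"]
    ),
    "product": ["3", "1", "1", "1", "3"],
    "sold": [95, 88, 81, 66, 88],
    "store": ["b", "a", "a", "a", "a"],
})

# Assumption: these are the complete sets of values that should appear.
dates = pd.date_range(df["date"].min(), df["date"].max(), freq="D")
stores = ["a", "b"]
products = ["1", "2", "3"]

# Every (date, store, product) combination, in the desired output order.
full_index = pd.MultiIndex.from_product(
    [dates, stores, products], names=["date", "store", "product"]
)

# Sum first so duplicate combinations cannot break the reindex,
# then fill the combinations that were never observed with 0.
filled = (
    df.groupby(["date", "store", "product"])["sold"]
    .sum()
    .reindex(full_index, fill_value=0)
    .reset_index()[["date", "product", "sold", "store"]]
)
print(filled)
```

This gives one row per date/store/product combination (3 * 2 * 3 = 18 rows), with `sold` set to 0 wherever nothing was recorded.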
Also, is there a better way to generate this toy data?
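One idea I had for the toy data, as a sketch only: build the full grid of combinations first and then keep a random subset of rows, so the gaps are exactly the rows that were dropped. The variable names (`grid`, `toy`, `n_rows`) and the choice of 5 rows are my own assumptions, and unlike my code above this never produces duplicate date/store/product rows.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)

dates = pd.date_range("2000-02-01", "2000-02-03", freq="D")
stores = ["a", "b"]
products = ["1", "2", "3"]

# Every possible (date, store, product) combination: 3 * 2 * 3 = 18 rows.
grid = pd.MultiIndex.from_product(
    [dates, stores, products], names=["date", "store", "product"]
).to_frame(index=False)

# Keep a random subset of rows to simulate missing observations.
n_rows = 5
toy = grid.sample(n=n_rows, random_state=42).reset_index(drop=True)

# Random sales figures in the same 20..99 range as the original code.
toy["sold"] = rng.integers(20, 100, size=n_rows)
print(toy)
```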
I appreciate your help.