Let's say I have multiple time series, representing different features, all of length n, and I want to predict a new time series which represents another feature, without any past history for that feature. So for example, one sample might look like this, where the red and blue are the "input series" and the yellow is the "output series", which clearly has some relation to both inputs.
Now, I have hundreds of these examples, each of which is thousands of points long (though each is a different length), and each of which has about 8-10 input features. So one input might be 9 series (one for each feature), each maybe 6000 points long. For this input, I want to produce an output for the final feature which is also 6000 points long. We'll call my input features a, b, c, d and my output feature y. What would be the best model to represent this type of problem?
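For concreteness, here is roughly how I picture one sample laid out as arrays (the shapes and random values are just illustrative):

```python
import numpy as np

n_steps = 6000     # length of this particular sample (varies per sample)
n_features = 9     # input features a, b, c, ...

# One training example: inputs stacked as columns, plus the matching output series.
X_sample = np.random.randn(n_steps, n_features)  # placeholder for the input series
y_sample = np.random.randn(n_steps)              # placeholder for the output series y
```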
I've thought of perhaps applying a simple feed-forward neural network, whose inputs are:
a(t-10), a(t-9), ... a(t)
b(t-10), b(t-9), ... b(t)
..... etc for all other input features
and whose output would be a single value: the predicted y(t). One issue with this model is that it fails to consider past predicted values of y (that is, y(t-1), y(t-2), ...), which a good model should take into account. I could set up the model to take in these inputs and simply feed it the past values of y from the training samples. However, when I actually want to produce outputs on new inputs, what would I feed into these variables during the first 10 steps? I don't have any past info on the output series. A rough sketch of what I mean is below.
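Here is a minimal Keras sketch of that windowed feed-forward idea, leaving out the past-y inputs for now (the window length of 11 steps and the layer sizes are just placeholders):

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

window = 11      # a(t-10) ... a(t), i.e. 11 lags per feature
n_features = 9   # a, b, c, ...

# Feed-forward model over a flattened window of all input features.
model = keras.Sequential([
    layers.Input(shape=(window * n_features,)),  # all lagged inputs in one vector
    layers.Dense(64, activation="relu"),
    layers.Dense(64, activation="relu"),
    layers.Dense(1),                             # the predicted y(t)
])
model.compile(optimizer="adam", loss="mse")

def make_windows(X, y, window):
    """Slice one sample (X: [n_steps, n_features], y: [n_steps]) into flattened windows."""
    Xw, yw = [], []
    for t in range(window - 1, len(y)):
        Xw.append(X[t - window + 1 : t + 1].ravel())
        yw.append(y[t])
    return np.array(Xw), np.array(yw)
```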
Another idea I had was to apply an LSTM, so that the model can look further back in each series, with more recent values carrying more weight. But again, an LSTM typically uses past values of y to predict the current y, and my model must start without any past values of y.
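The sequence-to-sequence shape I have in mind is roughly the following (again, the layer size is arbitrary, and whether or how to feed past y back in is exactly the part I'm unsure about):

```python
from tensorflow import keras
from tensorflow.keras import layers

n_features = 9  # a, b, c, ... observed at every timestep

# LSTM that reads the whole input sequence and emits a prediction for y at every step.
model = keras.Sequential([
    layers.Input(shape=(None, n_features)),    # variable-length sequences
    layers.LSTM(64, return_sequences=True),
    layers.TimeDistributed(layers.Dense(1)),   # one predicted y per timestep
])
model.compile(optimizer="adam", loss="mse")
```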
Does anyone have any suggestions for what type of model would best suit this problem?
It's worth noting that my actual features are much less directly related (correlation coefficients closer to .2 or .3 for each feature with y). Additionally, the past history of one input can influence the output quite a bit, meaning that including as much history as possible would probably be good. (For example, it might be that when input a has a sharp decrease early on, the output tends to increase in the later half of the series; this isn't actually true in my data, just an example.) For reference, I'm building this in Keras, and I have a pretty good understanding of basic models; I just don't know which one applies to this situation.
