Hello all! I would like to seek the community's guidance on the following aspect of feature engineering for time series forecasting.
Suppose I want to create 2 exogenous variables that indicate, for each time point in my historical data, whether that point is a missing value (recorded as 0) and whether it is an outlier: missing value (1: yes, 0: no) and outlier (1: yes, 0: no).
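A minimal sketch of how the two flags could be built for a single series with pandas. It assumes, as described above, that a missing value is stored as exactly 0. The outlier rule is my own assumption, not something from the post: a non-missing value outside the IQR fence [Q1 - k*IQR, Q3 + k*IQR], with quartiles computed on the non-missing values only. The function name `add_flags` and the sample data are hypothetical.

```python
import pandas as pd


def add_flags(y: pd.Series, k: float = 1.5) -> pd.DataFrame:
    """Return the target plus two 0/1 indicator columns (hypothetical helper).

    Assumptions:
    - a missing value is recorded as exactly 0;
    - an outlier is a non-missing value outside [Q1 - k*IQR, Q3 + k*IQR],
      with quartiles computed on the non-missing values only.
    """
    is_missing = y == 0
    observed = y[~is_missing]
    q1, q3 = observed.quantile([0.25, 0.75])
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    # Missing points are never flagged as outliers, so the two flags stay distinct.
    is_outlier = ~is_missing & ((y < lower) | (y > upper))
    return pd.DataFrame({
        "y": y,
        "missing_flag": is_missing.astype(int),
        "outlier_flag": is_outlier.astype(int),
    })


# Illustrative data: a monthly series with two zeros (missing) and one spike.
idx = pd.date_range("2022-01-01", periods=8, freq="MS")
y = pd.Series([10, 12, 0, 11, 95, 13, 12, 0], index=idx, dtype=float)
flags = add_flags(y)
# missing_flag -> [0, 0, 1, 0, 0, 0, 0, 1], outlier_flag -> [0, 0, 0, 0, 1, 0, 0, 0]
```

Excluding the zeros before computing the quartiles keeps the missing points from dragging the fence down; any other outlier detector could be swapped in at that step.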
The purpose of doing this is to let a tree-based model learn which periods in the history contained missing or outlier data points. I do not want to remove these points because I do not have enough data for training, and treating the outliers may lead the model to learn false events.
When generating a future forecast (say, 12 months ahead), I also need to provide these 2 exogenous variables for the model to make predictions. Is it safe to assume that all 12 future data points are neither missing values nor outliers, i.e. to set both flags to 0 for the whole horizon?
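A sketch of what "both flags are 0 for the whole horizon" would look like in practice, assuming monthly data with month-start timestamps (`freq="MS"`). The helper name `make_future_exog` is hypothetical. Setting the flags to 0 means asking the model for the "normal" behaviour it learned from rows where neither flag was set.

```python
import pandas as pd
from pandas.tseries.frequencies import to_offset


def make_future_exog(last_date: pd.Timestamp, horizon: int = 12,
                     freq: str = "MS") -> pd.DataFrame:
    """Build exogenous rows for the forecast horizon (hypothetical helper).

    Assumption: every future point is treated as observed (not missing)
    and not an outlier, so both flags are 0.
    """
    # Start one period after the last observed date.
    start = last_date + to_offset(freq)
    future_index = pd.date_range(start, periods=horizon, freq=freq)
    return pd.DataFrame({"missing_flag": 0, "outlier_flag": 0}, index=future_index)


future = make_future_exog(pd.Timestamp("2022-08-01"), horizon=12)
# 12 rows from 2022-09-01 to 2023-08-01, both flags 0
```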
Another problem is that I'm forecasting at scale, with more than a thousand time series, so visualizing them one by one is not feasible. Has anybody faced a similar situation and could enlighten me? Thanks!!
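One way to avoid inspecting a thousand plots is to compute the flags automatically per series and then rank the series by the share of flagged points, so only the worst ones get a manual look. The sketch below assumes a long-format frame with hypothetical columns `series_id`, `date` and `y`, and reuses the same assumed rules as a single series (0 means missing, an IQR fence per series for outliers). The helper names are also hypothetical.

```python
import pandas as pd


def flag_panel(df: pd.DataFrame, k: float = 1.5) -> pd.DataFrame:
    """Add missing/outlier flags to a long-format panel, per series.

    Assumed columns: 'series_id', 'date', 'y'. 0 means missing; outliers
    use an IQR fence computed per series on its non-missing values.
    """
    df = df.sort_values(["series_id", "date"]).copy()
    df["missing_flag"] = (df["y"] == 0).astype(int)
    observed = df["y"].where(df["missing_flag"] == 0)  # NaN where missing
    grouped = observed.groupby(df["series_id"])
    # Series.quantile skips NaN, so the fences ignore missing points.
    q1 = grouped.transform(lambda s: s.quantile(0.25))
    q3 = grouped.transform(lambda s: s.quantile(0.75))
    iqr = q3 - q1
    outside = (df["y"] < q1 - k * iqr) | (df["y"] > q3 + k * iqr)
    df["outlier_flag"] = (outside & (df["missing_flag"] == 0)).astype(int)
    return df


def flag_summary(flagged: pd.DataFrame) -> pd.DataFrame:
    """Share of flagged points per series, worst first, to pick series to plot."""
    summary = flagged.groupby("series_id")[["missing_flag", "outlier_flag"]].mean()
    return summary.sort_values(["missing_flag", "outlier_flag"], ascending=False)


# Illustrative panel with two short monthly series.
panel = pd.DataFrame({
    "series_id": ["a"] * 6 + ["b"] * 6,
    "date": list(pd.date_range("2022-01-01", periods=6, freq="MS")) * 2,
    "y": [10, 11, 0, 12, 13, 90, 5, 6, 5, 0, 0, 6],
})
flagged = flag_panel(panel)
summary = flag_summary(flagged)
```

Because everything is vectorised through `groupby`, the same code runs on one series or thousands; the summary table (or a threshold on it) then decides which series are worth plotting.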