# mlforecast
Hi Nixtla team, can MLForecast work with time series that have missing values? For example, say I am predicting a solar irradiance process that only takes values during the day and is zero otherwise.
• I would like to completely omit the nighttime timesteps.
What I hope for:
• The model should focus on the daytime hours; the nighttime I can safely zero out.
• I do not want my error metrics to look amazing just because the model is biased towards zero.
Thanks for the tips and have a great day : ) M.
Hey. I think you can build a dataset containing only the daytime hours, use an integer as the time column and set freq=1. Something like:
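from mlforecast import MLForecast
# Keep only the daytime rows and replace the timestamps with a per-series integer index.
df = df.sort_values([id_col, time_col])
daytime_df = df[df[time_col].dt.hour.between(hour_start, hour_end)].copy()
daytime_df[time_col] = daytime_df.groupby(id_col).cumcount()
# With an integer time column, freq=1 makes each forecast step advance the index by one.
fcst = MLForecast(freq=1, ...)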
thanks, will do 🫡
One more question: will the lag features work?
freq = '15T',
target_transforms = [LocalStandardScaler()],
lags = np.arange(1, 10).tolist()
Meaning, if the lagged values are not there, will no lags be added for that particular timestep? Every day the morning hours would be missing their lags.
If you use only daytime hours in your dataset, the lag1 for the first hour in the morning will be the value of the last daytime hour of the previous day.
I see, that is perhaps fine. I thought it would then treat the lags as integer offsets on the time column as well, so that:
I have timesteps 1 - 12, timesteps 12 - 24 are missing, and hence the 25th timestep won't have lag_1.
But the behaviour you explained makes sense as well.
The lags are taken by position; the time column isn't checked. So you could have something like this:
time  value  lag1
11    1      nan
12    2      1
24    3      2
25    4      3
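The positional behaviour described here can be reproduced with plain pandas. This is only a minimal sketch of the idea, not mlforecast's internal implementation; the `unique_id` column name and the toy values are assumptions made for illustration.

```python
import pandas as pd

# Toy series with a gap in the time column (steps 13..23 are absent).
df = pd.DataFrame(
    {
        "unique_id": ["a"] * 4,
        "time": [11, 12, 24, 25],
        "value": [1, 2, 3, 4],
    }
)

# Positional lag: shift by one row within each series.
# The values in the time column are never consulted.
df["lag1"] = df.groupby("unique_id")["value"].shift(1)
print(df)
```

The row at time 24 gets the value from time 12 as its lag1, even though the two are 12 steps apart in the time column.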
@José Morales, thanks for your suggestion. I have actually tried this; the small issue is that after removing the nighttime hours the algorithm has trouble with gaps, specifically in the test set (see screenshot). In the train set the gaps seem to be fine. For example, in the screenshot, if I predict the 2 consistent steps forward it works. However, if a step is missing, e.g. I predict 3 steps forward, it throws an obvious error. Are there any tips and tricks for irregular-timestep time series? Thanks and have a nice day : )
I think you can use integer timestamps as I suggested above. mlforecast uses the freq that you specify to create the timestamps in the predictions. If you specify integers and freq=1 you should be able to work around it (you'd need to translate those integers back to your timestamps for analysis)
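Translating the integer steps back to real timestamps could look roughly like the sketch below. It assumes hourly data and a fixed daytime window; `HOUR_START`, `HOUR_END`, the helper `future_daytime_stamps`, the column names, and the `preds` frame standing in for the forecast output are all hypothetical.

```python
import pandas as pd

HOUR_START, HOUR_END = 5, 18  # hypothetical daytime window, inclusive


def future_daytime_stamps(last_stamp, h):
    """Return the next `h` hourly timestamps after `last_stamp` that fall in the daytime window."""
    stamps = []
    ts = last_stamp
    while len(stamps) < h:
        ts += pd.Timedelta(hours=1)
        if HOUR_START <= ts.hour <= HOUR_END:
            stamps.append(ts)
    return stamps


# Assumed state at the end of training: last daytime hour and its integer step.
last_stamp = pd.Timestamp("2024-06-01 18:00")
last_step = 13
h = 3

# Stand-in for the frame returned by predict (integer `ds`, one row per step).
preds = pd.DataFrame(
    {
        "unique_id": ["a"] * h,
        "ds": list(range(last_step + 1, last_step + 1 + h)),
    }
)

# Attach the real timestamps: the night hours are skipped automatically.
preds["timestamp"] = future_daytime_stamps(last_stamp, h)
print(preds)
```

If sunrise and sunset move over the year, a fixed window is too crude and the filter would need the same daytime rule that was used to build the training set.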
Yes, I have tried your suggestion and I do specify freq=1, but for this to work I again need a consistent time series and hence have to throw out the nighttime hours beforehand.
• The process previously had almost perfect daily seasonality.
• If I throw out the night, the seasonality becomes slightly more tricky.
So yes, freq=1 does work for me, but the ds cannot be inconsistent in the test predictions. Example: the sun sets at 18 and rises at 4 the next day (10 hrs missing out of 24):
ds = 1 ... 18,28,29,30, ...
• Interestingly, this "sparse" timestep works in the MLForecast.fit method, but not in predict or in cross_validation.
• In predict, I obviously have to provide X_df for all the timesteps of the specified horizon.
• Example: I can't specify horizon h = 3 with the df_test from the screenshot, but I can specify h = 2.
The timestep has to become:
ds = 1 ... 18,19,20,21, ...
• This means that I am currently trying to do the seasonal transformations and lags before removing the night.
• Then I recreate the freq=1 index using np.arange so that it is consistent, and hence the dataset is usable in the predict and CV methods.
• But perhaps it is all way simpler and I am just approaching it wrong 😄
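A rough sketch of that workflow in pandas, assuming hourly data for a single series and a fixed daytime window. The toy data, the `HOUR_START`/`HOUR_END` constants, and the `lag24`, `hour` and `timestamp` column names are made up for illustration; how the precomputed columns are then handed to mlforecast is left out.

```python
import numpy as np
import pandas as pd

HOUR_START, HOUR_END = 5, 18  # hypothetical daytime window, inclusive

# Two days of hourly toy data: the target equals the hour by day and zero at night.
ds = pd.date_range("2024-06-01", periods=48, freq=pd.Timedelta(hours=1))
is_day = (ds.hour >= HOUR_START) & (ds.hour <= HOUR_END)
df = pd.DataFrame(
    {
        "unique_id": ["a"] * len(ds),
        "ds": ds,
        "y": np.where(is_day, ds.hour.to_numpy(), 0),
    }
)

# 1) Build calendar-aware features while the hourly grid is still complete.
df = df.sort_values(["unique_id", "ds"])
df["lag24"] = df.groupby("unique_id")["y"].shift(24)  # same hour, previous day
df["hour"] = df["ds"].dt.hour

# 2) Drop the night rows.
day = df[df["ds"].dt.hour.between(HOUR_START, HOUR_END)].copy()

# 3) Keep the real timestamp for later and replace `ds` with a gap-free integer index.
day["timestamp"] = day["ds"]
day["ds"] = day.groupby("unique_id").cumcount()
print(day.head())
```

Because `lag24` is computed before the filter, it still refers to the same clock hour on the previous day, which a positional lag on the filtered frame would only do if every day had the same number of daytime rows.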
So you want to use the values from night but not forecast them?
I believe what you're doing is the only way at the moment if you want to achieve that. It's not a really common use case so we don't have any support for that right now