# neural-forecast
s
This message was deleted.
j
Hey. I think this could be due to missing dates in your training df. Can you verify this with the fill_gaps function? e.g.
from utilsforecast.preprocessing import fill_gaps

filled = fill_gaps(HCPCS_Grouped_ts_mlf, start='per_serie', end='per_serie', freq='MS')
assert filled.shape[0] == HCPCS_Grouped_ts_mlf.shape[0]
If this fails it means some dates are missing, and you could provide the filled df instead (after filling the missing target values).
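To see which series have gaps, and not only whether the row counts differ, a minimal pandas sketch could look like the following. It assumes the usual unique_id / ds / y column names and month-start dates; the toy frame and the helper name missing_month_starts are made up for illustration and are not part of utilsforecast.

```python
import pandas as pd


def missing_month_starts(df, id_col="unique_id", time_col="ds"):
    """Return {series id: [missing month-start timestamps]} for series with gaps.

    Hypothetical helper, assuming every timestamp falls on a month start.
    """
    gaps = {}
    for uid, grp in df.groupby(id_col):
        # Every month start between this series' first and last observation.
        expected = pd.date_range(grp[time_col].min(), grp[time_col].max(), freq="MS")
        missing = expected.difference(pd.DatetimeIndex(grp[time_col]))
        if len(missing) > 0:
            gaps[uid] = list(missing)
    return gaps


# Toy data: series "a" is missing 2023-03-01, series "b" is complete.
toy = pd.DataFrame(
    {
        "unique_id": ["a", "a", "a", "b", "b", "b"],
        "ds": pd.to_datetime(
            ["2023-01-01", "2023-02-01", "2023-04-01",
             "2023-02-01", "2023-03-01", "2023-04-01"]
        ),
        "y": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    }
)
gaps = missing_month_starts(toy)
print(gaps)
```

An empty dict would mean no dates are missing between each series' own first and last timestamp.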
👀 1
b
I have
ts2 = fill_gaps(ts2, freq='MS')
in a previous step. Went ahead and ran the code above and it ran successfully. I didn't have the start and end arguments in mine, but I tested and I get the same shape either way.
j
Are you using 1.6.4?
b
Name: neuralforecast
Version: 1.6.4
Summary: Time series forecasting suite using deep learning models
Home-page: https://github.com/Nixtla/neuralforecast/
Author: Nixtla
Author-email: business@nixtla.io
License: Apache Software License 2.0
Location: c:\programdata\miniconda3\envs\py310env\lib\site-packages
Requires: numba, numpy, optuna, pandas, pytorch-lightning, ray, torch, utilsforecast
Required-by: 
Note: you may need to restart the kernel to use updated packages.
this is what pip show provides
j
Do any of your series have fewer than 4 samples? I think the val_size=3 could be removing some series
b
Smallest number of observations in any of the series is 48
Is that what you meant?
I also tried dropping val_size to 1 and still get same issue
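A quick way to check series lengths against val_size is sketched below. It assumes the unique_id / ds / y column convention and uses a made-up toy frame; the threshold (length <= val_size) simply mirrors the "fewer than 4 samples with val_size=3" question above and is not taken from the neuralforecast source.

```python
import pandas as pd

val_size = 3

# Toy data: series "a" has 5 observations, series "b" has only 3.
toy = pd.DataFrame(
    {
        "unique_id": ["a"] * 5 + ["b"] * 3,
        "ds": list(pd.date_range("2023-01-01", periods=5, freq="MS"))
        + list(pd.date_range("2023-01-01", periods=3, freq="MS")),
        "y": [float(i) for i in range(8)],
    }
)

# Number of observations per series.
lengths = toy.groupby("unique_id").size()

# Series with no observations left for training once val_size rows are held out.
too_short = lengths[lengths <= val_size]
print(lengths.min(), list(too_short.index))
```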
j
The error happens here, right?
b
I may not be following what you mean. That link takes me to around line 614 of the core code. I'm not sure whether that's it or not. The problem arises with this
forecasts_nf_df_fits = nf.predict_insample(step_size=1)
of my code.
j
But you should see that line in your stacktrace. Can you paste it here?
b
ValueError                                Traceback (most recent call last)
Cell In[182], line 1
----> 1 forecasts_nf_df_fits = nf.predict_insample(step_size=1)

File C:\ProgramData\miniconda3\envs\py310env\lib\site-packages\neuralforecast\core.py:622, in NeuralForecast.predict_insample(self, step_size)
    620 # Append predictions in memory placeholder
    621 output_length = len(model.loss.output_names)
--> 622 fcsts[:, col_idx : (col_idx + output_length)] = model_fcsts
    623 col_idx += output_length
    624 model.set_test_size(test_size=test_size)  # Set original test_size

ValueError: could not broadcast input array from shape (159510,9) into shape (158616,9)
Looks like you were correct
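For anyone reading along, the same kind of ValueError can be reproduced in isolation with NumPy. This is only a sketch with made-up small shapes (6 expected rows, 8 returned rows); it imitates the slice assignment shown in the traceback and does not use neuralforecast.

```python
import numpy as np

# Hypothetical sizes: the placeholder expects 6 rows, the model returns 8.
n_expected_rows, n_returned_rows, n_outputs = 6, 8, 2

fcsts = np.full((n_expected_rows, n_outputs), np.nan)  # in-memory placeholder
model_fcsts = np.ones((n_returned_rows, n_outputs))    # too many forecast rows

try:
    # Same pattern as the failing line: assign into a column slice.
    fcsts[:, 0:n_outputs] = model_fcsts
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```

The column count matches in both arrays, so the mismatch is purely in the number of rows, just as in the traceback above.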
j
They're different numbers now. Is it because of the val_size?
Pinging @Cristian (Nixtla) in case you may have an idea of what's going on because apart from the dates I don't know where this shape mismatch could be coming from
b
The initial size from my initial post was the full dataframe. I subsampled 10% to speed up for testing. I've also played with the val_size to see if it mattered. I now have it back to val_size=3.
j
But the numbers changed a bit (not 90% less)
• original msg: (159120,9) into shape (158232,9)
• latest err: (159510,9) into shape (158616,9)
I'm just trying to figure out where the difference comes from, that might help us track where the mismatch happens
Do you have 130 ids?
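The arithmetic behind the two error messages can be worked out directly. The sketch below only uses the four row counts quoted above. Reading 390 as 130 × 3 (130 ids times val_size=3) is a guess at what prompted the question, not something confirmed in the thread.

```python
# Row counts copied from the two error messages.
original = {"returned": 159120, "expected": 158232}
latest = {"returned": 159510, "expected": 158616}

# How many more rows the model returned than the placeholder could hold.
extra_original = original["returned"] - original["expected"]  # 888
extra_latest = latest["returned"] - latest["expected"]        # 894

# How much each side moved between the two runs.
shift_returned = latest["returned"] - original["returned"]    # 390
shift_expected = latest["expected"] - original["expected"]    # 384

val_size = 3
print(extra_original, extra_latest)
print(shift_returned // val_size, shift_expected // val_size)  # 130 128
```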
b
Thanks, @José Morales. Appreciate all of the help. I'll get back with you later this afternoon with that. I got pulled into a meeting.
c
Hi @Brian Head. For some reason the model is returning more forecasts than needed, but I don't know why that is the case yet. To pinpoint the issue I suggest simplifying your pipeline to the absolute minimum, then adding components back one at a time to see where it fails. For example:
1. Start with only NHITS, with no historic or future variables. Maybe even keep just 1 time series.
2. Add all series.
3. Add exogenous covariates.
4. Include more models.
Let us know where it fails.
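The incremental approach in the steps above can be organized as a small harness. This is a generic sketch: the stage functions are stubs standing in for real fit + predict_insample runs, and the stage that fails here is arbitrary, chosen only to show the control flow.

```python
def first_failing_stage(stages):
    """Run (name, callable) stages in order; return (name, exception) for the first failure."""
    for name, run in stages:
        try:
            run()
        except Exception as exc:
            return name, exc
    return None, None


def ok_stage():
    # Placeholder for a configuration that fits and predicts without error.
    return None


def failing_stage():
    # Placeholder for a configuration that raises; the message is a stand-in.
    raise ValueError("could not broadcast input array (stand-in error)")


stages = [
    ("NHITS, one series, no exogenous variables", ok_stage),
    ("NHITS, all series", ok_stage),
    ("NHITS, all series, exogenous covariates", failing_stage),
    ("all models", ok_stage),
]

failed_name, failed_exc = first_failing_stage(stages)
print(failed_name)
```

In practice each stub would be replaced by a function that builds the corresponding configuration and calls predict_insample on it.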
b
Thank you for that suggestion, @Cristian (Nixtla). I will work through that. @José Morales it looks like the difference in numbers is because I updated my code to use the
filled = fill_gaps(HCPCS_Grouped_ts_mlf, start='per_serie', end='per_serie', freq='MS')
which returns a different DF shape. I was following this. Now I'm wondering whether the first part of what you sent (including the start and end for the fill_gaps function) was only for testing, or whether I should use it going forward. I still get the error either way, but I want to make sure I'm getting the right gaps filled. BTW, I also checked and it is filling gaps with both approaches, but more with the version you provided.
j
It depends, sometimes you want the series to start at the same time, or end at the same time. In this case we want to keep the boundaries but make sure there aren't any missing dates between the start and end. Also make sure you fill the target with appropriate values, since that function just includes the missing rows with NaN in the target
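As an illustration of filling the target after the gap rows have been added, here is a minimal pandas sketch. The toy frame stands in for the output of fill_gaps (rows present, NaN in y), and both fill strategies are only examples. Which one is appropriate depends on what a missing month means in the data.

```python
import numpy as np
import pandas as pd

# Stand-in for a gap-filled frame: the added rows carry NaN in the target.
filled = pd.DataFrame(
    {
        "unique_id": ["a", "a", "a", "b", "b", "b"],
        "ds": list(pd.date_range("2023-01-01", periods=3, freq="MS")) * 2,
        "y": [1.0, np.nan, 3.0, 4.0, np.nan, 6.0],
    }
)

n_missing = int(filled["y"].isna().sum())

# Option 1: treat a missing month as zero activity.
filled["y_zero"] = filled["y"].fillna(0.0)

# Option 2: carry the last observed value forward within each series,
# so values never leak from one series into another.
filled["y_ffill"] = filled.groupby("unique_id")["y"].ffill()

print(n_missing)
print(filled)
```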
b
Gotcha. Yeah, I have a mix of start dates. They should all have the same end date. So, in that case would I use only the end? I don't see this covered in the documentation, but maybe I missed it. And, when I had this working it was actually on a different, albeit related, dataframe, so that might also be why I haven't experienced this problem with any of SF, MLF, or NF yet. Maybe there's something going on in this data that wasn't in the other DF. I am filling the gaps created by fill_gaps with a .fillna afterwards.
j
I believe all cases are covered here. If you want the same end you can set end='global'
👍 1
Can you try with the dev version of neuralforecast? We recently changed that function and that could fix the error.
pip install git+https://github.com/nixtla/neuralforecast.git
Otherwise it may be more efficient if you can just give us the sizes of your series so that we try to replicate it on our side
b
Following your (@Cristian (Nixtla) & @José Morales) advice, I think I've narrowed the issue down to what is happening with the fill_gaps function prior to model fit and predictions. In a previous run (using similar data on a slightly older version of the Nixtla packages) I used the default settings of fill_gaps, not realizing there were options for the start and end dates of filling. That worked correctly. However, it wasn't working now, so I played with the options for start and end. The only way any of the models I've tried (e.g., NBEATS, NBEATSx, RNN, DilatedRNN, NHITS, LSTM, MLP) will successfully run predict_insample is when I set both start and end to 'global'. However, this produces odd results for some of the series that have a later start date: at the beginning of the series they have a major spike in the insample predictions when nothing actually occurred there. Is there any workaround for this?
Note: here are the current versions of the packages I'm using:
• Neuralforecast 1.6.4
• Statsforecast 1.6.0 (for deriving season and trend)
• Utilsforecast 0.0.21
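To make the effect of the start and end choices concrete, the sketch below emulates them with plain pandas on a toy frame. It is an emulation under the unique_id / ds / y convention, not the utilsforecast implementation, and pad_to_range is a hypothetical helper. It shows that a shared ('global') start adds leading rows with a missing target to series that begin later, and those rows then have to be filled with something before training.

```python
import pandas as pd


def pad_to_range(df, start, end, freq="MS"):
    """Emulate 'per_serie' / 'global' boundaries (hypothetical helper)."""
    global_start, global_end = df["ds"].min(), df["ds"].max()
    parts = []
    for uid, grp in df.groupby("unique_id"):
        lo = global_start if start == "global" else grp["ds"].min()
        hi = global_end if end == "global" else grp["ds"].max()
        full = pd.DataFrame({"unique_id": uid, "ds": pd.date_range(lo, hi, freq=freq)})
        # Left merge keeps every date in the range; absent rows get NaN in y.
        parts.append(full.merge(grp, on=["unique_id", "ds"], how="left"))
    return pd.concat(parts, ignore_index=True)


# Series "a" starts in January, series "b" only in March; both end in April.
toy = pd.DataFrame(
    {
        "unique_id": ["a"] * 4 + ["b"] * 2,
        "ds": list(pd.date_range("2023-01-01", periods=4, freq="MS"))
        + list(pd.date_range("2023-03-01", periods=2, freq="MS")),
        "y": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    }
)

both_global = pad_to_range(toy, start="global", end="global")
per_serie_start = pad_to_range(toy, start="per_serie", end="global")

print(len(both_global), int(both_global["y"].isna().sum()))
print(len(per_serie_start), int(per_serie_start["y"].isna().sum()))
```

Whether those filled leading rows are what produces the spikes described above is not established in this thread. The sketch only shows where the extra rows come from.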
j
Were you able to install neuralforecast from github? We're not sure if it's a bug in the previous implementation or in the development one
Never mind, the error is in both versions. We're looking into it. Thanks for reporting it!
b
Great. Thanks, @José Morales!