# neural-forecast

    Aravind Karunakaran

    02/11/2025, 9:36 AM
```python
from neuralforecast import NeuralForecast
from neuralforecast.models import NHITS

h = 100
nf = NeuralForecast(models=[NHITS(h=h, input_size=2*h, max_steps=500, enable_progress_bar=False, logger=False)], freq=1)
cv_df = nf.cross_validation(Y_df, n_windows=4, step_size=h, verbose=0)
cv_df.head()
```
In the Nixtla documentation for cross-validation, it's mentioned that for this CV the model trains on the first 300 steps (i.e. 0-299), but the input size is 200, so why does it train up to step 300?

    Aravind Karunakaran

    02/12/2025, 10:37 AM
Hey, I need some clarity on the usage of static exogenous variables. In the examples with the AirPassengers dataset, the `stat_exog_list` is initialized with 'airlines1'. In AirPassengersStatic, however, there are airlines 1 and 2, so why is only airlines1 used here? I'm asking because I'm trying to use NeuralForecast to forecast the velocity of 5 projects, so I want to know if I have to include all 5 project names in the `stat_exog_list` in addition to passing them in a static_df during fit. Another question I have: if I am trying to forecast velocity based on a couple of other metrics in the data, does the model capture dependencies on those other metrics automatically, or do I have to include them in `hist_exog_list`?
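Not from the thread, but a minimal sketch of how static and historical exogenous variables are usually wired up in neuralforecast; all data, column names (team_size, metric_1, metric_2) and hyperparameters here are made up for illustration:
```python
import numpy as np
import pandas as pd
from neuralforecast import NeuralForecast
from neuralforecast.models import NHITS

# Toy panel: 5 projects, weekly velocity plus two other observed metrics.
ids = [f"proj_{i}" for i in range(1, 6)]
dates = pd.date_range("2023-01-01", periods=60, freq="W")
df = pd.concat([
    pd.DataFrame({
        "unique_id": uid,
        "ds": dates,
        "y": np.random.rand(len(dates)),          # target, e.g. velocity
        "metric_1": np.random.rand(len(dates)),   # other observed metrics
        "metric_2": np.random.rand(len(dates)),
    })
    for uid in ids
])

# One row per series; every non-id column is a static feature.
static_df = pd.DataFrame({"unique_id": ids, "team_size": [5, 8, 3, 6, 4]})

model = NHITS(
    h=12,
    input_size=24,
    stat_exog_list=["team_size"],              # column names of static_df
    hist_exog_list=["metric_1", "metric_2"],   # past-observed columns of df
    max_steps=100,
)
nf = NeuralForecast(models=[model], freq="W")
nf.fit(df=df, static_df=static_df)
```
As far as I understand, `stat_exog_list` names columns of the static_df (not series names), and other metrics only enter the model if they are listed in `hist_exog_list` (or `futr_exog_list` when their future values are known).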

    Antonio

    02/13/2025, 11:46 AM
Hello everyone, I have a pre-trained TSMixerx model that I would like to load and train on the updated dataset, starting from the already-trained parameter configuration and weights. The pre-trained model was defined as: {'input_size': 36, 'max_steps': 1500, 'val_check_steps': 75, 'early_stop_patience_steps': 5, 'learning_rate': 0.001, 'n_block': 4, 'dropout': 0.0, 'ff_dim': 90, 'scaler_type': 'robust', 'revin': False, 'logger': True, 'futr_exog_list': ('exog1', 'exog2', 'exog3', 'exog4', 'exog5'), 'hist_exog_list': ('lag_24',), 'n_series': 5, 'h': 168, 'loss': MAE(), 'valid_loss': MAE()}
```python
Y_hat_df = nf.cross_validation(df=final_series[(final_series.ds >= train_start) &
                                               (final_series.ds <= test_end)],
                               val_size=horizon*k_fold,
                               test_size=horizon,
                               n_windows=None,
                               step_size=horizon,
                               verbose=1,
                               prediction_intervals=PredictionIntervals(h=horizon,
                                                                        method='conformal_distribution'),
                               level=[90],
                               refit=True)
```
In the following way, I load and retrain the model:
```python
nf2 = NeuralForecast.load(path='./quarto_modello/test_run/')
nf2.fit(df=final_series[(final_series.ds >= train_start) &
                        (final_series.ds <= '2024-12-31 23:00:00')],
        val_size=24*7,
        use_init_models=True,
        # prediction_intervals=PredictionIntervals(n_windows=2,
        #                                          h=24*7,
        #                                          method='conformal_error'),
        )
```
I can successfully run `.fit()` when I don't use `prediction_intervals`, but when I try to include it, the training process stops at `val_check_steps` with the following error message: `RuntimeError: Early stopping conditioned on metric ptl/val_loss which is not available. Pass in or modify your EarlyStopping callback to use any of the following: train_loss, train_loss_step, train_loss_epoch`. I need to compute the confidence intervals, so I want to perform training with `prediction_intervals`. Thanks in advance to anyone who can help.

    田口天晴

    02/19/2025, 6:35 AM
Hello community, I have a question about using iTransformer. Is there a way to visualize how the loss evolves over epochs, for example with a graph?
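Not an official answer, just one common pattern, assuming extra keyword arguments on the model (like `logger` in the snippets above) are forwarded to the underlying PyTorch Lightning Trainer: attach a `CSVLogger` and plot the logged training loss afterwards. The dataset, hyperparameters and exact metric column names (e.g. `train_loss_epoch`) are assumptions and may differ by version:
```python
import pandas as pd
import matplotlib.pyplot as plt
from pytorch_lightning.loggers import CSVLogger
from neuralforecast import NeuralForecast
from neuralforecast.models import iTransformer
from neuralforecast.utils import AirPassengersDF

model = iTransformer(
    h=12,
    input_size=24,
    n_series=1,                # iTransformer is a multivariate model
    max_steps=200,
    logger=CSVLogger(save_dir="logs", name="itransformer"),  # forwarded to the Trainer
)
nf = NeuralForecast(models=[model], freq="ME")
nf.fit(df=AirPassengersDF)

# Lightning's CSVLogger writes a metrics.csv we can read back and plot
# (the version_* folder increments on every run).
metrics = pd.read_csv("logs/itransformer/version_0/metrics.csv")
loss = metrics.dropna(subset=["train_loss_epoch"])
plt.plot(loss["epoch"], loss["train_loss_epoch"])
plt.xlabel("epoch")
plt.ylabel("train loss")
plt.show()
```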

    Stephany Paredes

    02/19/2025, 1:15 PM
Hi again! I was wondering about the Auto models and the refit=True setting, and which of these is true: 1. The hyperparameter tuning is run `num_samples` times on each validation window, so if I have 90 windows the AutoModel will search `num_samples * n_windows` times? Or 2. Is it a nested implementation, i.e. the refit is done on all windows, but the best hyperparameters are searched after the cross-validation is complete, keeping whichever set worked best across all the windows? I guess it is the second, but I wanted to double check.

    Aravind Karunakaran

    02/20/2025, 11:35 AM
Hey, so let's say I'm forecasting some sales based on biweekly data (the timestamps are the end of the two weeks), and I have a column in the dataframe called `holidays_in_2_weeks`, which is basically the count of holidays in the time span of each entry. Would I pass this in `hist_exog_list` or `futr_exog_list`? And can you give me the reasoning too? Thanks!
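For context, holiday counts for an upcoming period are usually known in advance, so if the column were treated as a future exogenous feature the wiring might look like this sketch; the frequency, horizon and dataframe names are placeholders, not a statement about what is right for this dataset:
```python
from neuralforecast import NeuralForecast
from neuralforecast.models import NHITS

model = NHITS(
    h=4,                                      # e.g. four biweekly periods ahead
    input_size=12,
    futr_exog_list=["holidays_in_2_weeks"],   # values must also be supplied for future dates
)
nf = NeuralForecast(models=[model], freq="2W")

# train_df: unique_id, ds, y, holidays_in_2_weeks (historical rows)
nf.fit(df=train_df)

# future_df: unique_id, ds, holidays_in_2_weeks for the h future dates of each series
forecasts = nf.predict(futr_df=future_df)
```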

    Joaquin FERNANDEZ

    02/21/2025, 8:56 AM
Hello, I need to make forecasts where I do not have today's data at the time I need to forecast tomorrow. To do so I see two options: 1. forecast two days ahead and keep only the second one; 2. push yesterday's forecast in as today's data and forecast one day ahead. The first option is easily done in NF cross-validation but it is computationally heavy. Is there a way to do the second option in the library's current state? Any ideas? Best, and thanks for your work.
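Not from the thread, but a rough sketch of what the second option could look like by hand: treat yesterday's one-day-ahead forecast as today's observation, then forecast one more day. `history_df`, `yesterday_forecast` and the `NHITS` forecast column name are placeholders:
```python
import pandas as pd

# Pretend yesterday's prediction is today's actual value.
today_pseudo = yesterday_forecast.rename(columns={"NHITS": "y"})[["unique_id", "ds", "y"]]
df_extended = pd.concat([history_df, today_pseudo], ignore_index=True)

# With a model trained with h=1, this yields tomorrow's forecast
# from the history extended by the pseudo-observation for today.
tomorrow = nf.predict(df=df_extended)
```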

    田口天晴

    02/24/2025, 9:53 AM
(two untitled image attachments)

    田口天晴

    02/24/2025, 9:53 AM
Hello. I have a question about the iTransformer in neuralforecast. When I use the model with the same data, I always encounter the following error at the same point. Upon checking, it turns out that the values in a 32×90×32 tensor are all NaN. What could be the possible reasons for this? In particular, I would like to know what in `_base_multivariate.py` could cause it.
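NaNs inside a batch often trace back to the input data or to scaling of constant or very short series rather than to `_base_multivariate.py` itself; a quick, library-agnostic sanity check on the long-format frame (here called `df`) might look like this sketch:
```python
import numpy as np

# Missing or non-finite targets?
print(df["y"].isna().sum(), np.isinf(df["y"]).sum())

# Constant (zero-variance) series can blow up under some scalers.
stds = df.groupby("unique_id")["y"].std()
print(stds[stds == 0])

# Very short series are problematic for large input_size values.
print(df.groupby("unique_id").size().sort_values().head())
```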

    Amin A

    02/24/2025, 5:41 PM
Hello, in NHITS, how can I specify the number of CPUs to use for training?
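Not an official answer, but one common approach, assuming extra keyword arguments (like `accelerator`) are forwarded to the PyTorch Lightning Trainer just as `enable_progress_bar` and `logger` are in earlier snippets, combined with torch's thread cap:
```python
import torch
from neuralforecast import NeuralForecast
from neuralforecast.models import NHITS

torch.set_num_threads(4)          # cap the intra-op CPU threads PyTorch uses

model = NHITS(
    h=12,
    input_size=24,
    max_steps=500,
    accelerator="cpu",            # assumed to be forwarded to the Lightning Trainer
)
nf = NeuralForecast(models=[model], freq="ME")
```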

    Zac Pullar-Strecker

    02/24/2025, 8:37 PM
Hey all, I'm interested in doing multivariate->multivariate forecasting, at least at first primarily with TSMixerx. I saw this thread that provided some advice (removing `insample_y`), but I assume there's a decent amount on the nf/dataset/loss side that would need to be modified to support multiple targets as well. I'd love to hear how difficult you think this modification would be and any tips for diving into it.

    Marco

    02/28/2025, 4:05 PM
Hello #C031M8RLC66! We just made a new release of neuralforecast with some pretty important changes:

New features
- New model: TimeXer, a Transformer-based model specifically designed to handle exogenous features.
- All losses are compatible with all types of models (e.g. univariate/multivariate, direct/recurrent), or appropriate protection has been added.
- `DistributionLoss` now supports the use of `quantiles` in `predict`, allowing easy quantile retrieval for all `DistributionLosses`.
- Mixture losses (GMM, PMM and NBMM) now support learned weights for weighted mixture distribution outputs.
- Mixture losses now support the use of `quantiles` in `predict`, allowing easy quantile retrieval.
- Improved stability of `ISQF` by adding softplus protection around some parameters instead of using `.abs`.
- Unified API for any quantile or any confidence level during `predict` for both point and distribution losses.

Enhancements
- Improved docstrings of all models.
- Minor bug fix in TFT: we can omit specifying an RNN type and the static covariate encoder will still work.
- Fitting with an invalid validation size now prints a nice error message.
- Added bfloat16 support.
- Recurrent models can now produce forecasts recursively or directly.
- IQLoss now gives monotonic quantiles.
- MASE loss now works.

Breaking Changes
- Unified API.
- RMoK uses the `revin_affine` parameter instead of `revine_affine`; this was a typo in the previous version.
- All models now inherit the `BaseModel` class. This changes how we implement new models in neuralforecast.
- Recurrent models now require an `input_size` parameter.
- `TCN` and `DRNN` are now window models, not recurrent models.
- A recurrent model saved with a previous version cannot be loaded into v3.0.0.

Bug Fixes
- Multivariate models no longer error when predicting when `n_series` > `batch_size`.
- Insample prediction works with series of varying lengths.

Documentation
- Big overhaul of the documentation to remove old and deprecated code.
- Added an example of modifying the default `configure_optimizers()` behavior (using the `ReduceLROnPlateau` scheduler).

This release solves many of your pain points and adds features that have been requested for a long time. Big thanks to @Olivier for his amazing contribution to this release, as well as to all our users for taking the time to raise issues and ask questions. We'll keep working on improving neuralforecast!
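As a small illustration of the quantile retrieval described above, a hedged sketch; the `quantiles` argument to `predict` follows the release notes, while the data and hyperparameters are placeholders:
```python
from neuralforecast import NeuralForecast
from neuralforecast.models import NHITS
from neuralforecast.losses.pytorch import DistributionLoss
from neuralforecast.utils import AirPassengersDF

model = NHITS(h=12, input_size=24, max_steps=200,
              loss=DistributionLoss(distribution="Normal"))
nf = NeuralForecast(models=[model], freq="ME")
nf.fit(df=AirPassengersDF)

# Per the release notes, quantiles (or a confidence level) can be requested at predict time.
fcst = nf.predict(quantiles=[0.1, 0.5, 0.9])
print(fcst.columns)
```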

    Md Atiqur Rahaman

    02/28/2025, 6:41 PM
I have some questions:
1. Can we save the model fitted by cross-validation (which does fit and predict)? My point is that I want to use the fitted model to predict on new data. I know we can just use fit() and then predict(), but cross-validation is super useful because it can be trained on the train set while the validation and test sets are also given. So if the fitted model from cross-validation could be saved, that would be great.
2. As far as I can see from the documentation, we can't predict the whole test df if the horizon is smaller than it. It can't do rolling prediction, so we need to do that manually, right? I found that this takes a lot of time for my work. Suppose I have train data with 20,000 rows with historical and future exogenous features. For a test set of 5,000 rows, I have to supply the futr_df and test data accordingly, since, as I understand it, predict uses the last date of the dataframe given to it. Is that a correct assumption? And we have to give a rolling window of length horizon for both the test data and futr_df to the predict method, correct? Is there any other way?
3. Another thing: since we can use a robust scaler or standard scaler, do we need to inverse-scale, or does Nixtla do it by itself? Seeing that predict() or cross_validation() returns a dataframe, can we see what data it returns, and what model weights it has?
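On the rolling-prediction part of question 2, a manual walk-forward loop is the usual pattern I would sketch; `history_df`, `test_df`, `futr_df`, `test_dates` and `h` are all placeholders, not names from the original message:
```python
import pandas as pd

history = history_df.copy()       # training portion: unique_id, ds, y, exog columns
preds = []

# Walk forward over the test period in steps of the model horizon h.
for start in range(0, len(test_dates), h):
    window_dates = test_dates[start:start + h]
    futr_window = futr_df[futr_df["ds"].isin(window_dates)]   # future exog for this window

    step_pred = nf.predict(df=history, futr_df=futr_window)
    preds.append(step_pred)

    # Append the actuals for this window before forecasting the next one.
    history = pd.concat([history, test_df[test_df["ds"].isin(window_dates)]])

rolling_preds = pd.concat(preds)
```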

    田口天晴

    03/07/2025, 2:49 PM
Hello, community! How is the number of epochs determined in iTransformer?

    田口天晴

    03/08/2025, 7:09 AM
Also, I am using MSE as the loss function, but an error occurs; the logs are in the attached (untitled) screenshot. Could someone tell me how to solve this problem?

    Tony Gottschalg

    03/14/2025, 10:20 AM
Hey everyone, I have a question regarding an implementation of sample weighting inside the Nixtla loss for NeuralForecast (or rather the neural models). Let me clarify what I mean by this:
1. Introducing a weight for each observation coming from a specific unique ID;
2. The weight is applied in the loss calculation such that losses for a specific unique ID are weighted more strongly than losses for data points coming from a different unique ID.
Why? When training a "global" model, in the sense that it is trained on multiple different time series, I want to be able to put more emphasis on a certain target ID during training. I already implemented oversampling, which should be equivalent, but introducing a sample weight would give one more control.
What I already tried / am aware of: I would need to introduce a custom loss via the BasePointLoss class, but the main issue I see is that I don't see how to pass a weight tensor to this loss function, as the model fit methods don't accept additional arguments (which could be hacky anyway given how the input is likely transformed into batches). I also thought about utilizing the mask argument, but since it is also set internally in the BaseModel class, I don't see how I could use it. I'm also aware of sample weighting for MLForecast models, but I would like to enable this for NeuralForecast models in our use case.
Does anyone have an idea how to enable this (without having to modify the source code)? Thank you very much in advance, and please tell me if something is unclear.

    Bersu T

    03/18/2025, 11:31 AM
```python
from neuralforecast import NeuralForecast
from neuralforecast.auto import AutoLSTM
from neuralforecast.losses.pytorch import MAE

modelLSTM = AutoLSTM(h=h,
                     loss=MAE(),
                     backend='optuna',
                     num_samples=10)

nf = NeuralForecast(models=[modelLSTM], freq='ME')
nf.fit(df=df_encoded, val_size=18)
```
Hi. When I do this I get an error after running for a while stating `Exception: Time series is too short for training, consider setting a smaller input size or set start_padding_enabled=True`. Where are we expected to put the `start_padding_enabled` argument?
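`start_padding_enabled` is a constructor argument of the underlying model (it is the name quoted in the error message), so for an Auto model it would typically be injected through the search config rather than the Auto wrapper itself; a sketch with the same optuna backend:
```python
from neuralforecast.auto import AutoLSTM
from neuralforecast.losses.pytorch import MAE

base_config = AutoLSTM.get_default_config(h=h, backend="optuna")

def config_with_padding(trial):
    config = {**base_config(trial)}
    config["start_padding_enabled"] = True   # pad short series at the start
    return config

modelLSTM = AutoLSTM(h=h,
                     loss=MAE(),
                     backend='optuna',
                     config=config_with_padding,
                     num_samples=10)
```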

    Bersu T

    03/18/2025, 12:31 PM
Also, another question: when trying to update the configuration as follows
```python
lstm_config = AutoLSTM.get_default_config(h=h, backend="optuna")

def config_lstm(trial):
    config = {**lstm_config(trial)}
    config.update({
        "input_size": trial.suggest_int("input_size", 2, 18),
    })
    return config
```
```python
modelLSTM = AutoLSTM(h=h,
                     config=config_lstm,
                     backend='optuna',
                     loss=MAE(),
                     num_samples=3)
```
during fitting I get the following error: `ValueError: Cannot set different distribution kind to the same parameter name.` `[W 2025-03-18 12:28:42,864] Trial 0 failed with value None.`
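Not a confirmed diagnosis, but this optuna error usually means the same parameter name was registered with two different distribution kinds, for example if the default config already samples `input_size` with `suggest_categorical` and the override re-registers it with `suggest_int`. One way around it, sketched below, is to sample under a new trial parameter name and assign the result to the config key:
```python
lstm_config = AutoLSTM.get_default_config(h=h, backend="optuna")

def config_lstm(trial):
    config = {**lstm_config(trial)}
    # Sample under a new name so the existing "input_size" distribution is untouched,
    # then overwrite the config entry the model actually reads.
    config["input_size"] = trial.suggest_int("input_size_override", 2, 18)
    return config
```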

    Sapna Mishra

    03/20/2025, 10:18 PM
Hello Nixtla Team, I hope you are doing well. I would like to find out whether any of the following models support cross-learning when multiple time series are passed in long format: NBEATSx, NHITS, TSMixerx, TiDE, BiTCN, LSTM, and RNN. If they do, could you please let me know how to disable that feature? Is there a parameter or any other mechanism to turn it off? Thank you! Best regards, Sapna
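If the goal is to avoid sharing weights across series, one workaround (a sketch, not an official switch) is to fit a separate NeuralForecast object per unique_id; `Y_df` is a placeholder long-format frame, and the model and hyperparameters are illustrative:
```python
import pandas as pd
from neuralforecast import NeuralForecast
from neuralforecast.models import NHITS

per_series = []
for uid, df_uid in Y_df.groupby("unique_id"):
    nf = NeuralForecast(models=[NHITS(h=12, input_size=24, max_steps=200)], freq="ME")
    nf.fit(df=df_uid)                # each model only ever sees one series
    per_series.append(nf.predict())

forecasts = pd.concat(per_series)
```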

    Ankit Hemant Lade

    03/21/2025, 2:31 PM
Hello @Marco, does Nixtla support feature importance for TiDE?

    Aditya Limaye

    03/22/2025, 12:50 AM
Question: do NeuralForecast model objects take into account "past" values of future exogenous features? In the training dataframe (`df`), I have values of the `future_exogenous_cols` for datetimes in the past, so the model has access to these values in the training pass, and at inference time I include the `future_exogenous_cols` in the "past" dataframe (`df`) when I call `nf.predict()`. But is the model actually using these values? Thanks in advance!

    Ankit Hemant Lade

    03/24/2025, 11:07 PM
Hey @Marco @Olivier, to generate the feature_importances for TFT, do we need to create predictions every time?

    Sapna Mishra

    03/25/2025, 11:23 PM
Hey Team, for TiDE and the other neuralforecast models, is there a way to show the importance of each of the exog features passed into the model? Thanks!

    Bersu T

    03/26/2025, 8:33 AM
Hi, I have a question regarding training times. NeuralForecast is supposed to train global models, but even after simplifying the model significantly, using only 2 `num_samples` and selecting just 4 out of the 176 unique IDs, training still takes a very long time (about 30 minutes). This becomes even more problematic with the complete dataset. In contrast, when using MLForecast, training is significantly faster, taking only a few seconds. Could you please clarify why this happens and what I could do to mitigate it?

    Jelte Bottema

    03/26/2025, 1:05 PM
Hi guys, about feature importance for the NHITS model: is this on the roadmap (or maybe already there and I missed it)? And how does the new TimeXer model compare to NHITS?

    Sarah Unterseher

    03/27/2025, 3:50 PM
Hi everyone, I have a question where I can't get any further with the documentation. My training dataset consists of three columns: unique_id, ds and y. I have 192 rows per unique_id, and I want to pass 96 as input_size and 96 as horizon to my model. It looks like this:
```python
lstm_config = AutoLSTM.get_default_config(h=96, backend="ray")
lstm_config["input_size"] = 96
lstm_config["context_size"] = 96
levels = [80, 90]
model = AutoLSTM(h=96,
                 loss=MQLoss(level=[80, 90]),
                 config=lstm_config,
                 gpus=1,
                 search_alg=HyperOptSearch(),
                 backend='ray',
                 num_samples=32)
loaded_nf = NeuralForecast(models=[model], freq='15min')
train_data, test_data = load_and_preprocess_data(file_path)
loaded_nf.fit(df=train_data, val_size=96)
```
With this setup I get the error 'No window available for training', which I don't understand, as there are exactly the right number of rows per unique_id for input_size + horizon. I have now realised that I can prevent the error if I set the parameter 'start_padding_enabled' to True. I could actually be happy with this, but I'm worried that any padding that is carried out will severely degrade my training data. I therefore have the following question: why do I have to set 'start_padding_enabled' to True in my setup for it to work, and what might be padded here?
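A hedged back-of-the-envelope check of why the window count can end up at zero here, assuming the validation split is carved off the end of each series (purely illustrative, not a statement about the internals):
```python
rows_per_id = 192
val_size    = 96
input_size  = 96
horizon     = 96

train_rows = rows_per_id - val_size    # 96 rows left for building training windows
window_len = input_size + horizon      # 192 rows needed for one full window

print(train_rows >= window_len)        # False -> "No window available for training"
```
Under that reading, `start_padding_enabled=True` would left-pad each series so that at least one full training window fits.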

    Bersu T

    03/31/2025, 10:15 AM
Hi, I'm using NHITS with a configuration that performs well (tuned via Optuna). When I run cross-validation without prediction intervals, the results look good. However, as soon as I add prediction intervals, I have to set `refit=True`. At that point model performance drops drastically; the forecasts become flat lines even though I'm using the exact same config. Why is this, and what can I do to mitigate it?

    Jonghyun Yun

    04/09/2025, 5:14 PM
Hi Team, I have multiple time series with different scales and different seasonalities. Which neuralforecast model would be ideal for this purpose?

    Raj Puneeth

    04/10/2025, 9:19 PM
Hi Team, I'm new to neuralforecast and am trying to do a POC on how effective global models are for our use case. I have around 400 TS in my POC at monthly grain, with ~10 yrs of history for most. The scale is large (50K to 100K) for a small set of time series, and the scale of the remaining TS ranges from the 100s to a few thousands. I'm exploring NBEATS and NBEATSx (with static exogs only), and below are a few things I have tried to improve performance against a baseline of a robust ensemble of multiple statistical models. The results are promising and on par with the baseline; in particular, NBEATS/x are doing well with the trend. The issue I'm having is with seasonality. The TS I have are quite volatile, and the seasonality produced by NBEATS is very muted even in cases where the seasonality is consistent and very evident to the naked eye, and this leads to poor performance, especially on the TS with larger scale. Things I tried so far:
- Optuna for hparam tuning with the TPE sampler.
- Optimizer: AdamW seems to work well.
- Loss: HuberMQLoss, using the median loss with 5-fold CV; for delta I've tried a range of values between 0 and 1 and a few other values like 5, 10, etc.
- Normalization: RevIN and minmax were helpful in improving accuracy.
- Stacks: seeing improved performance with trend and seasonality stacks, so only sticking with those.
- MLP units: range from 32 to 256 units per layer (1 to 5 layers) per block; 3 to 5 layers with 32 units are picked mostly by Optuna.
- Number of blocks: range from 2 to 8; 4 to 7 mostly picked in tuning.
- Number of stacks: range from 1 to 6 (used an identity stack as the last stack for odd-numbered stack counts); 4 and above mostly picked by Optuna.
- Harmonics: range from 2 to 18 (Optuna picks 10 and above most times).
- Polynomials: 1 to 3.
- Backcast length: 2x is mostly picked by Optuna for a 12-month forecast horizon.
- batch_size: tried 32, 64, 128.
- max_steps: 500.
- shared_weights: True is picked by Optuna mostly.
- dropout_prob_theta: errors out; I don't think the param is implemented for NBEATS/x?

Let me know what else I can try to improve seasonality or generalization. Should I try any other models? Thanks in advance!!
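For reference, a hedged sketch of how interpretable trend and seasonality stacks are typically configured in neuralforecast's NBEATSx; the values and the static feature name are placeholders, not recommendations:
```python
from neuralforecast import NeuralForecast
from neuralforecast.models import NBEATSx

model = NBEATSx(
    h=12,
    input_size=24,                                    # 2x the horizon ("backcast length")
    stack_types=["trend", "seasonality", "identity"],
    n_blocks=[3, 3, 3],                               # one entry per stack
    n_harmonics=10,                                   # richness of the seasonality basis
    n_polynomials=2,                                  # degree of the trend basis
    stat_exog_list=["static_feature"],                # placeholder static exog column
    scaler_type="robust",
    max_steps=500,
)
nf = NeuralForecast(models=[model], freq="MS")
```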

    Jan

    04/11/2025, 11:34 PM
I have a question about how I should be thinking about `step_size` when using the LSTM. Say I need to predict the next 24 hours every hour, I want to use the last 48 hours to do so, and I have future exogenous features that change every hour (for example weather forecasts) and turn into actuals once time passes beyond the present. My data frame right now consists of non-overlapping windows of 72 steps, where the first 48 steps are mostly duplicates, as the actual values of the exogenous features change only one step at a time. So I'm basically using `input_size=48`, `horizon=24` and `step_size=72` when training an LSTM. However, I'm not sure that I'm doing this right, as it seems like the model trains very poorly even though there's a lot of data (for example, the forecasted values rarely start from the last known values), and the predictions on a future hold-out set are very poor. Am I doing the windowing correctly? Or should I be feeding only 25-hour windows to the model (so `input_size=1`, `horizon=24` and `step_size=25`), where the first row contains the latest actuals, and have the LSTM do the tracking of the past? And is this different for other architectures such as NHITS?
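Not from the thread, but for contrast, a hedged sketch of the more common setup, where the long-format frame contains each timestamp exactly once (no pre-built, duplicated windows) and the library slides the training windows itself; `df` and the `weather_forecast` column are placeholders:
```python
from neuralforecast import NeuralForecast
from neuralforecast.models import LSTM

model = LSTM(
    h=24,                                  # predict the next 24 hours
    input_size=48,                         # condition on the last 48 hours
    futr_exog_list=["weather_forecast"],   # hourly, future-known exogenous feature
    max_steps=1000,
)
nf = NeuralForecast(models=[model], freq="h")

# df holds each (unique_id, ds) pair exactly once; windows are sampled internally.
nf.fit(df=df)

# A rolling evaluation that re-forecasts every hour would use step_size=1 here.
cv = nf.cross_validation(df=df, n_windows=24, step_size=1)
```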