# neural-forecast
  • z

    Zac Pullar-Strecker

    02/24/2025, 8:37 PM
    Hey all, I'm interested in doing multivariate->multivariate forecasting, at least at first primarily with TSMixerx. I saw this thread that provided some advice (removing insample_y), but I assume a decent amount on the nf/dataset/loss side would also need to be modified to support multiple targets. I'd love to hear how difficult you think this modification would be, and any tips for diving into it.
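    For reference, a hedged sketch of how TSMixerx is set up today (inputs can already be multivariate via n_series and the exogenous lists, but each series still has a single target column y; the exogenous column names below are placeholders), which is the part the thread discusses extending:

    from neuralforecast import NeuralForecast
    from neuralforecast.models import TSMixerx

    model = TSMixerx(h=12, input_size=24, n_series=3,
                     hist_exog_list=["sensor_a"],
                     futr_exog_list=["calendar_flag"])
    nf = NeuralForecast(models=[model], freq="D")
    nf.fit(df=Y_df)   # long format: unique_id, ds, y plus the exogenous columns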
  • m

    Marco

    02/28/2025, 4:05 PM
    Hello #C031M8RLC66! We just made a new release of neuralforecast with some pretty important changes:

    New features
    - New model: TimeXer, a Transformer-based model specifically designed to handle exogenous features.
    - All losses are compatible with all types of models (e.g. univariate/multivariate, direct/recurrent), or appropriate protection has been added.
    - DistributionLoss now supports the use of `quantiles` in `predict`, allowing for easy quantile retrieval for all DistributionLosses.
    - Mixture losses (GMM, PMM and NBMM) now support learned weights for weighted mixture distribution outputs.
    - Mixture losses now support the use of `quantiles` in `predict`, allowing for easy quantile retrieval.
    - Improved stability of `ISQF` by adding softplus protection around some parameters instead of using `.abs`.
    - Unified API for any quantile or any confidence level during `predict` for both point and distribution losses.

    Enhancements
    - Improved docstrings of all models.
    - Minor bug fix in TFT: we can omit specifying an RNN type and the static covariate encoder will still work.
    - Fitting with an invalid validation size now prints a nice error message.
    - Added bfloat16 support.
    - Recurrent models can now produce forecasts recursively or directly.
    - IQLoss now gives monotonic quantiles.
    - MASE loss now works.

    Breaking Changes
    - Unified API.
    - RMoK uses the `revin_affine` parameter instead of `revine_affine`; the latter was a typo in the previous version.
    - All models now inherit the `BaseModel` class. This changes how we implement new models in neuralforecast.
    - Recurrent models now require an `input_size` parameter.
    - `TCN` and `DRNN` are now window models, not recurrent models.
    - A recurrent model saved with a previous version cannot be loaded in v3.0.0.

    Bug Fixes
    - Multivariate models no longer error when predicting with `n_series` > `batch_size`.
    - Insample prediction works with series of varying lengths.

    Documentation
    - Big overhaul of the documentation to remove old and deprecated code.
    - Added an example of modifying the default `configure_optimizers()` behavior (use of the `ReduceLROnPlateau` scheduler).

    This release solves many of your pain points and adds features that were asked for a long time. Big thanks to @Olivier for his amazing contribution to this release, as well as to all our users for taking the time to raise issues and ask questions. We'll keep working on improving neuralforecast!
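    For example, the new quantile retrieval at predict time should look roughly like this (a sketch based on the notes above, not exact reference code; model choice and data names are placeholders):

    from neuralforecast import NeuralForecast
    from neuralforecast.models import NHITS
    from neuralforecast.losses.pytorch import DistributionLoss

    model = NHITS(h=24, input_size=48,
                  loss=DistributionLoss(distribution="Normal"))
    nf = NeuralForecast(models=[model], freq="h")
    nf.fit(df=Y_df)
    # Per the notes, quantiles can now be requested directly when predicting.
    forecasts = nf.predict(quantiles=[0.1, 0.5, 0.9])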
  • m

    Md Atiqur Rahaman

    02/28/2025, 6:41 PM
    I have some questions:
    1. Can we save the model that cross_validation fits and predicts with? My point is that I want to use the fitted model to predict on new data. I know we can just use fit() and then predict(), but cross-validation is super useful since it trains on the train set while the validation and test sets can also be given. So it would be great if the model fitted by cross_validation could be saved.
    2. As far as I can see from the documentation, we can't predict the whole test df if the horizon is smaller than it; it can't do rolling prediction, so we need to do that manually, right? I found that this takes a lot of time in my work. Suppose I have train data with 20,000 rows with historical and future exogenous features. For a test set of 5,000 rows I have to pass futr_df and the test data accordingly, since, as I understand it, predict forecasts from the last date of the dataframe it is given. Is that a correct assumption? And we have to feed predict a rolling window of length horizon for both the test data and futr_df, correct? Is there any other way?
    3. Also, since we can use a robust scaler or standard scaler, do we need to inverse-scale, or does Nixtla do it by itself? Given that predict() or cross_validation() returns a dataframe, can we see what data it returns and what model weights it has?
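    A hedged sketch for points 1 and 2 (nf is an already constructed NeuralForecast object; train_df, test_df and futr_df stand for the user's own frames; a single series is assumed for simplicity, and cross_validation already covers the case where the actuals are available up front):

    import pandas as pd
    from neuralforecast import NeuralForecast

    nf.fit(df=train_df)
    nf.save(path="checkpoints/run1", overwrite=True)     # hypothetical path
    nf2 = NeuralForecast.load(path="checkpoints/run1")   # reuse the fitted models later

    h = 24                                               # horizon the models were built with
    history, preds = train_df.copy(), []
    for start in range(0, len(test_df), h):
        futr_window = futr_df.iloc[start:start + h]      # future exogenous rows for this window
        preds.append(nf2.predict(df=history, futr_df=futr_window))
        history = pd.concat([history, test_df.iloc[start:start + h]])   # roll the actuals forward
    rolling_fcst = pd.concat(preds)

    On point 3, to the best of my understanding the library inverts any scaling it applied itself, so the returned forecast dataframes are already on the original scale.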
  • u

    田口天晴

    03/07/2025, 2:49 PM
    Hello, community! How is the number of epochs determined in iTransformer?
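    If it helps, neuralforecast models are generally trained for a fixed number of gradient steps rather than epochs; a hedged sketch (defaults may differ by version):

    from neuralforecast.models import iTransformer

    # max_steps bounds training; validation is checked every val_check_steps, and
    # early_stop_patience_steps can stop training earlier based on validation loss.
    model = iTransformer(h=24, input_size=48, n_series=7,
                         max_steps=1000,
                         val_check_steps=100,
                         early_stop_patience_steps=3)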
  • u

    田口天晴

    03/08/2025, 7:09 AM
    Also, I use MSE as the loss function, but an error occurs; these are its logs. Please tell me how to solve this problem.
    (attachment: Untitled)
  • t

    Tony Gottschalg

    03/14/2025, 10:20 AM
    Hey everyone, I have a question about implementing sample weighting inside the loss for NeuralForecast (or rather the neural models). Let me clarify what I mean by this:
    1. Introduce a weight for each observation coming from a specific unique_id.
    2. Apply that weight in the loss calculation so that losses for a specific unique_id count more strongly than losses for data points coming from a different unique_id.
    Why? When training a "global" model, in the sense that it is trained on multiple different time series, I want to be able to put more emphasis on a certain target ID during training. I already implemented oversampling, which should be equivalent, but introducing a sample weight would give more control.
    What I already tried / am aware of: I need to introduce a custom loss via the BasePointLoss class, but the main issue is that I don't see how to pass a weight tensor to this loss function, since the model's fit doesn't accept additional arguments (which could be hacky anyway given how the input is transformed into batches). I also thought about using the mask argument, but since it is set internally in the BaseModel class, I don't see how I could use it. I'm also aware of sample weighting for MLForecast models, but I would like to enable this for NeuralForecast models in our use case. Does anyone have an idea how to enable this (without having to modify the source code)? Thank you very much in advance, and please tell me if something is unclear.
  • b

    Bersu T

    03/18/2025, 11:31 AM
    from neuralforecast import NeuralForecast
    from neuralforecast.auto import AutoLSTM
    from neuralforecast.losses.pytorch import MAE

    modelLSTM = AutoLSTM(h=h,
                         loss=MAE(),
                         backend='optuna',
                         num_samples=10)

    nf = NeuralForecast(models=[modelLSTM], freq='ME')
    nf.fit(df=df_encoded, val_size=18)
    Hi. When I do this, after running for a while I get an error stating Exception: Time series is too short for training, consider setting a smaller input size or set start_padding_enabled=True. Where are we expected to put the start_padding_enabled argument?
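    One hedged way to pass it for an Auto model is through the model config, since start_padding_enabled is a hyperparameter of the underlying LSTM rather than of AutoLSTM itself (a sketch, not verified against every version):

    base_config = AutoLSTM.get_default_config(h=h, backend="optuna")

    def config_lstm(trial):
        config = {**base_config(trial)}
        config["start_padding_enabled"] = True   # pad short series at the start
        return config

    modelLSTM = AutoLSTM(h=h, loss=MAE(), backend='optuna',
                         config=config_lstm, num_samples=10)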
  • b

    Bersu T

    03/18/2025, 12:31 PM
    Also, another question: when trying to update the configuration as follows
    lstm_config = AutoLSTM.get_default_config(h=h, backend="optuna")

    def config_lstm(trial):
        config = {**lstm_config(trial)}
        config.update({
            "input_size": trial.suggest_int("input_size", 2, 18),
        })
        return config

    modelLSTM = AutoLSTM(h=h,
                         config=config_lstm,
                         backend='optuna',
                         loss=MAE(),
                         num_samples=3)
    During fitting I get the following error: ValueError: Cannot set different distribution kind to the same parameter name. [W 2025-03-18 12:28:42,864] Trial 0 failed with value None.
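    A possible cause (hedged): the default config returned by get_default_config most likely already suggests "input_size" with a different Optuna distribution (e.g. a categorical choice), so suggesting the same parameter name again with suggest_int clashes. One workaround is to suggest under a fresh trial-parameter name, for example:

    def config_lstm(trial):
        config = {**lstm_config(trial)}   # the default suggestions happen here
        # Re-suggesting "input_size" with a different distribution kind raises the
        # error above; a new name avoids the clash while still overriding the value.
        config["input_size"] = trial.suggest_int("input_size_override", 2, 18)
        return config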
  • s

    Sapna Mishra

    03/20/2025, 10:18 PM
    Hello Nixtla Team, I hope you are doing well. I would like to find out whether any of the following models support cross-learning when multiple time series are passed in long format: NBEATSx, NHITS, TSMixerx, TiDE, BiTCN, LSTM, and RNN. If they do, could you please let me know how to disable that feature? Is there a parameter or any other mechanism to turn it off? Thank you! Best regards, Sapna
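    All of these are global models, so cross-learning is implicit in fitting on several series at once; a hedged sketch of the usual way to avoid it is simply fitting one model per series (names below are placeholders):

    from neuralforecast import NeuralForecast
    from neuralforecast.models import NHITS

    per_series_forecasts = []
    for uid, df_uid in Y_df.groupby("unique_id"):
        nf = NeuralForecast(models=[NHITS(h=12, input_size=24)], freq="ME")
        nf.fit(df=df_uid)                         # trained on a single series only
        per_series_forecasts.append(nf.predict())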
  • a

    Ankit Hemant Lade

    03/21/2025, 2:31 PM
    Hello @Marco, does Nixtla support feature importance for TiDE?
  • a

    Aditya Limaye

    03/22/2025, 12:50 AM
    Question: do NeuralForecast model objects take into account "past" values of future exogenous features? In the training dataframe (df), I have values of the future_exogenous_cols for datetimes in the past, so the model has access to these values in the training pass, and at inference time I include the future_exogenous_cols in the "past" dataframe (df) when I call nf.predict() -- but is the model actually using these values? Thanks in advance!
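    For reference, a hedged sketch of how the two kinds of values are usually supplied: past values of a declared future exogenous column travel in df, and only the horizon's values go in futr_df (column and frame names are placeholders):

    from neuralforecast import NeuralForecast
    from neuralforecast.models import NHITS

    model = NHITS(h=24, input_size=48,
                  futr_exog_list=["future_exogenous_col"])
    nf = NeuralForecast(models=[model], freq="h")
    nf.fit(df=train_df)                  # past values of the column live in df
    fcst = nf.predict(futr_df=futr_df)   # unique_id, ds and the column for the next h steps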
  • a

    Ankit Hemant Lade

    03/24/2025, 11:07 PM
    Hey @Marco @Olivier, to generate the feature_importances for TFT, do we need to create predictions every time?
  • s

    Sapna Mishra

    03/25/2025, 11:23 PM
    Hey Team, for TiDE and the other neuralforecast models, is there a way to show the importance of each of the exogenous features passed into the model? Thanks!
  • b

    Bersu T

    03/26/2025, 8:33 AM
    Hi, I have a question regarding training times. NeuralForecast is supposed to train global models, but even after simplifying the model significantly (using only 2 num_samples and selecting just 4 of the 176 unique IDs), training still takes a very long time (about 30 minutes). This becomes even more problematic with the complete dataset. In contrast, when using MLForecast, training is significantly faster, taking only a few seconds. Could you please clarify why this happens and what I could do to mitigate it?
  • j

    Jelte Bottema

    03/26/2025, 1:05 PM
    Hi guys, about feature importance for the NHITS model: is this on the roadmap (or maybe already there and I missed it)? And how does the new TimeXer model compare to NHITS?
  • s

    Sarah Unterseher

    03/27/2025, 3:50 PM
    Hi everyone, I have a question where I can't get any further with the documentation. My training data set consists of three columns: unique_id, ds and y. I have 192 rows per unique_id and I want to pass 96 as input_size and 96 as horizon to my model. It looks like this:

    lstm_config = AutoLSTM.get_default_config(h=96, backend="ray")
    lstm_config["input_size"] = 96
    lstm_config["context_size"] = 96
    levels = [80, 90]
    model = AutoLSTM(h=96,
                     loss=MQLoss(level=[80, 90]),
                     config=lstm_config,
                     gpus=1,
                     search_alg=HyperOptSearch(),
                     backend='ray',
                     num_samples=32)
    loaded_nf = NeuralForecast(models=[model], freq='15min')
    train_data, test_data = load_and_preprocess_data(file_path)
    loaded_nf.fit(df=train_data, val_size=96)

    With this setup I get the error 'No window available for training', which I don't understand, since there are exactly the right number of rows per unique_id for input_size + horizon. I have since realised that I can prevent the error by setting the parameter 'start_padding_enabled' to True. I could live with this, but I'm worried that whatever padding is carried out will severely degrade my training data. So my question is: why do I have to set 'start_padding_enabled' to True in my setup for it to work, and what might be padded here?
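    A hedged back-of-the-envelope for why the error appears, assuming val_size is carved out of the end of each series before training windows are built:

    rows_per_id = 192
    val_size    = 96
    input_size  = 96
    h           = 96

    train_rows = rows_per_id - val_size   # 96 points left to build training windows
    window_len = input_size + h           # 192 points needed for one full window
    print(train_rows >= window_len)       # False -> "No window available for training"
    # start_padding_enabled=True pads the start of each series so that a full window
    # exists, at the cost of training on partially padded history.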
  • b

    Bersu T

    03/31/2025, 10:15 AM
    Hi, I'm using NHITS with a configuration that performs well (tuned via Optuna). When I run cross-validation without prediction intervals, the results look good. However, as soon as I add prediction intervals I have to set refit=True, and at that point the model performance drops drastically: the forecasts become flat lines even though I'm using the exact same config. Why is this, and what can I do to mitigate it?
  • j

    Jonghyun Yun

    04/09/2025, 5:14 PM
    Hi Team, I have multiple time series with different scales and different seasonalities. Which neuralforecast model would be ideal for this purpose?
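    Whichever model is chosen, scale differences are often handled with per-series or per-window scaling; a hedged sketch:

    from neuralforecast import NeuralForecast
    from neuralforecast.models import NHITS

    # local_scaler_type scales each series before fitting and inverts the scaling on
    # the returned forecasts; scaler_type does something similar per training window.
    model = NHITS(h=24, input_size=48, scaler_type="robust")
    nf = NeuralForecast(models=[model], freq="D", local_scaler_type="robust")
    nf.fit(df=Y_df)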
  • r

    Raj Puneeth

    04/10/2025, 9:19 PM
    Hi Team, I'm new to neuralforecast and am trying to do a POC on how effective global models are for our use case. I have around 400 time series in my POC at monthly grain, with ~10 years of history for most. The scale is large (50K to 100K) for a small set of series, and the remaining series range from hundreds to a few thousand. I'm exploring NBEATS and NBEATSx (with static exogs only), and below are a few things I have tried to improve performance against a baseline of a robust ensemble of multiple statistical models. The results are promising and on par with the baseline; in particular NBEATS/x do well with the trend. The issue I'm having is with seasonality. My series are quite volatile, and the seasonality produced by NBEATS is very muted even in cases where it is consistent and evident to the naked eye, which leads to poor performance especially on the series with larger scale. Things I tried:
    - Optuna for hyperparameter tuning, TPE sampler.
    - Optimizer: AdamW seems to work well.
    - Loss: HuberMQLoss, using the median, with 5-fold CV; for delta I've tried a range of values between 0 and 1 and a few others like 5, 10, etc.
    - Normalization: RevIN and minmax were helpful in improving accuracy.
    - Stacks: seeing improved performance with trend and seasonality stacks, so only sticking with those.
    - MLP units: range from 32 to 256 units per layer (1 to 5 layers) per block; 3 to 5 layers with 32 units are picked mostly by Optuna.
    - Number of blocks: range from 2 to 8; 4 to 7 mostly picked in tuning.
    - Number of stacks: range from 1 to 6 (identity stack as the last stack for odd numbers); 4 and above mostly picked by Optuna.
    - Harmonics: range from 2 to 18 (Optuna picks 10 and above most times).
    - Polynomials: 1 to 3.
    - Backcast length: 2x is mostly picked by Optuna for a 12-month forecast horizon.
    - batch_size: tried 32, 64, 128.
    - max_steps: 500.
    - shared_weights: True is picked by Optuna mostly.
    - dropout_prob_theta: errors out; I don't think the param is implemented for NBEATS/x?
    Let me know what else I can try to improve seasonality or generalization. Should I try any other models? Thanks in advance!
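    For concreteness, a hedged sketch of the kind of interpretable NBEATS configuration being described (all values illustrative, not a recommendation):

    from neuralforecast.models import NBEATS
    from neuralforecast.losses.pytorch import HuberMQLoss

    model = NBEATS(h=12, input_size=24,                    # 2x backcast for a 12-month horizon
                   stack_types=["trend", "seasonality"],   # interpretable basis stacks only
                   n_blocks=[3, 3],
                   n_harmonics=2,                          # seasonality basis resolution
                   n_polynomials=2,                        # trend basis degree
                   loss=HuberMQLoss(level=[80, 90]),
                   scaler_type="robust",
                   max_steps=500)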
  • j

    Jan

    04/11/2025, 11:34 PM
    I have a question about how I should be thinking about step_size when using the LSTM. Say I need to predict the next 24 hours every hour, I want to use the last 48 hours to do so, and I have future exogenous features that change every hour (for example weather forecasts) and turn into actuals once the time passes beyond the present. My data frame right now consists of non-overlapping windows of 72 steps, where the first 48 steps are mostly duplicates, since the actual values of the exogenous features change only one step at a time. So I'm basically using input_size=48, horizon=24 and step_size=72 when training an LSTM. However, I'm not sure I'm doing this right: the model seems to train very poorly even though there's a lot of data (for example, the forecasted values rarely start from the last known values), and the predictions on a future hold-out set are very poor. Am I doing the windowing correctly? Or should I be feeding only 25-hour windows to the model (so input_size=1, horizon=24 and step_size=25) where the first row is the latest actuals, and have the LSTM do the tracking of the past? And is this different for other architectures such as NHITS?
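    A hedged sketch of the more common setup: pass the raw, continuous hourly series once and let the library build overlapping training windows itself, instead of pre-building non-overlapping 72-step blocks (column names are placeholders):

    from neuralforecast import NeuralForecast
    from neuralforecast.models import LSTM

    model = LSTM(h=24, input_size=48,
                 futr_exog_list=["weather_forecast"])
    nf = NeuralForecast(models=[model], freq="h")
    nf.fit(df=hourly_df)   # one row per hour per series; no manual window duplication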
  • b

    Bersu T

    04/15/2025, 7:59 AM
    Can I add time series specific features to NN configurations, like with ML models, or do NNs mostly depend on their own architecture and hyperparameters to learn those patterns?
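    A hedged sketch of one way to do it: engineered, known-in-advance features can be added as exogenous columns that the network consumes alongside y (the same columns must then be supplied for the horizon via futr_df at predict time):

    from neuralforecast import NeuralForecast
    from neuralforecast.models import NHITS

    df["month"] = df["ds"].dt.month           # assumes ds is already a datetime column
    df["dayofweek"] = df["ds"].dt.dayofweek

    model = NHITS(h=24, input_size=48,
                  futr_exog_list=["month", "dayofweek"])
    nf = NeuralForecast(models=[model], freq="h")
    nf.fit(df=df)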
  • j

    Jelte Bottema

    04/15/2025, 11:21 AM
    Hi, I would like to run my NHITS model, which currently runs on a CPU, on a GPU instead. Are there settings I need to adjust, or other things to think/worry about?
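    A hedged sketch, assuming the Lightning trainer keyword arguments are forwarded by the model:

    from neuralforecast.models import NHITS

    model = NHITS(h=24, input_size=48,
                  accelerator="gpu",   # forwarded to the underlying Lightning Trainer
                  devices=1)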
  • c

    Christiaan

    04/22/2025, 7:48 AM
    Hi, I have a question: is it possible to have implicit quantile networks in neuralforecast, or to implement them myself without too much hassle? The idea is that I use a (custom) distribution to sample tau, our quantile level, multiply it with a cosine function, feed it into a linear embedding, feed that into an activation function, and then concatenate it to my input sequences. The loss function is a quantile or expectile loss with tau as the quantile level. Why do I want it? It's much more parameter efficient. Assume I have an LSTM model forecasting a week of hourly values ahead, that's 168 values, and I want 10 to 20 quantiles. With a normal MQ loss this blows up my network parameters; with this it won't. Then at runtime I want to conformalize each of these quantiles using a separate calibration set that I update with the test set. If this is possible with Nixtla I can use your incredibly optimized codebase and don't even need a GPU. Thanks in advance.
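    The library does ship an implicit-quantile-style loss (IQLoss, mentioned in the release notes above); a hedged sketch of using it so that parameters do not grow with the number of quantiles:

    from neuralforecast import NeuralForecast
    from neuralforecast.models import LSTM
    from neuralforecast.losses.pytorch import IQLoss

    model = LSTM(h=168, input_size=336, loss=IQLoss())   # single head, conditioned on a sampled quantile level
    nf = NeuralForecast(models=[model], freq="h")
    nf.fit(df=train_df)
    # Per the release notes, quantiles can be requested at predict time:
    fcst = nf.predict(quantiles=[0.05, 0.5, 0.95])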
  • b

    Bethany Earnest

    04/23/2025, 4:09 PM
    Is there a built-in callback that can automatically save the best-performing model checkpoints during long training runs? Looking for something like ModelCheckpoint in TensorFlow.
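    A hedged sketch, assuming Lightning callbacks can be passed through the model's trainer kwargs (the monitored metric name below is an assumption):

    from pytorch_lightning.callbacks import ModelCheckpoint
    from neuralforecast.models import NHITS

    ckpt = ModelCheckpoint(monitor="valid_loss",     # assumed name of the logged validation metric
                           mode="min", save_top_k=1,
                           dirpath="checkpoints/")   # hypothetical path
    model = NHITS(h=24, input_size=48,
                  val_check_steps=100,
                  callbacks=[ckpt])                  # forwarded to the Lightning Trainer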
  • j

    Jonathan Mackenzie

    04/24/2025, 4:49 AM
    when training a NHITS model via nf.core.NeuralForecast.fit(), is there a reason we cannot set the size of the test set?
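    For what it's worth, a hedged sketch: fit itself only takes val_size, while a test span is usually expressed through cross_validation instead:

    # fit: only a validation split is carved out of the end of each series
    nf.fit(df=Y_df, val_size=96)

    # cross_validation: an explicit test span (test_size) or a number of windows
    cv_df = nf.cross_validation(df=Y_df, val_size=96, test_size=96, n_windows=None)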
  • r

    Renan Avila

    04/24/2025, 10:37 PM
    Hello! First of all, thanks for supporting open source, your library is awesome. Secondly, I am facing a GPU out-of-memory issue; I will describe it and would appreciate any help. I am trying to avoid the need for GPU scaling. Setup:
    • I am using the nf module with 8 auto models on initialization (almost all of the univariate ones with historical exogenous feature capabilities, according to the forecasting models doc page).
    • I am using optuna as the backend for hyperparameter optimization and mostly default configs, except for hist_exog, early_stop_patience_steps, input_size, val_check_steps and batch_size, which I provide as fixed values added to the default configs.
    • I get the 4 full ETT datasets from datasetsforecast LongHorizon2 and pivot them to use the exogenous features as columns instead of using them in raw format as different time series before feeding them to nf. At least this is what I understand as the correct way to handle historical exogenous features from the docs.
    • I also vertically double the size of the dataset by adding another time series, the result of a data augmentation method applied to the original datasets; I add it as a different time series with a separate unique_id.
    Running:
    • With horizon set to 24 and input_size 72, the 22GB of RAM on an L4 GPU are enough to run all the models across all 4 datasets with cross-validation exactly as in the docs, and that's perfect.
    • With horizon 96 and input_size 96, the 22GB are no longer enough for the ETTm datasets, which have more data points than the ETTh datasets. The ETTh datasets still run fine. It seems to work for some models before crashing (observing nvtop), probably on a larger model such as TFT (I'm not sure which one from the nf training logs).
    Things I tried:
    • First, I tried to reduce the batch size, but it did not help. Since some datasets work and others don't, it is most probably related to how the size of the dataset maps to GPU memory.
    • Second, I followed the "large dataset handling" doc page and preprocessed the datasets, generating a parquet in the specified folder structure for each of the 2 unique_id time series within each ETT dataset.
    ◦ Then I noticed that cross_validation from the nf module is not compatible with a files_list as the df parameter; only the fit method is.
    ◦ So I decided to implement cross-validation outside the nf module using the fit and predict methods. I generated prediction windows for the test dataset (previously separated from the train and validation data) and provided it to nf.predict as a full df and not as a files_list, as I understood from the docs. But processing them sequentially takes a lot of time, since I am using a step_size of 1 and the test set has 2000 data points.
    ◦ So I needed a way to process the windows on the GPU, but nf.predict does not expose a step_size parameter. One solution I found is to iterate over nf.models and run model.predict with the step_size and test_size (an apparently unused parameter inside model.predict) parameters specified; after all, the results need to be grouped in order to evaluate.
    • Third, inspired by the "large dataset handling" experience, I decided to use the large-dataset mode in nf.fit and then run nf.cross_validation with only the previously separated, smaller test set instead of the full dataset, hoping the GPU memory used during cross_validation would decrease.
    ◦ But the problem is that, as far as I understand, nf.cross_validation forces the models to be fit again, both with refit=False (once) and refit=True (once per window). This would be a kind of transfer learning since I already called nf.fit, which makes me think that a flag preventing any additional training when the internal variable _fitted is True could be a solution. Maybe I could try to develop it and submit a PR.
    Am I using the library as it is intended? What is the suggested approach in this case? Is it expected to face these GPU memory issues, given that the literature commonly uses horizons up to around 800 on long-horizon datasets, and the ETT datasets are the ones with the fewest possible historical exogenous features?
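    One more hedged lever besides batch_size: for windowed models, the number of windows materialized at once is controlled separately and often dominates GPU memory (values below are illustrative):

    from neuralforecast.models import TFT

    model = TFT(h=96, input_size=96,
                batch_size=8,                      # series sampled per batch
                windows_batch_size=256,            # training windows processed per step
                inference_windows_batch_size=256,  # windows processed per step at predict time
                scaler_type="robust")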
  • j

    Jonathan Mackenzie

    04/29/2025, 3:04 AM
    how can I make multiple predictions at once? It seems that calling predict with a dataframe with multiple rows only predicts on the last row:
  • j

    Joaquin FERNANDEZ

    05/06/2025, 3:46 PM
    Hello. I'm using a machine with 4 GPUs controlled via Slurm jobs (it's an HPC). When trying to allocate large models I'm getting OOM errors in torch CUDA memory because it's trying to allocate everything to the first GPU. I see the four devices in the torch printouts. Is there a way to use multiple GPUs on a single node? Best
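    A hedged sketch, assuming the models forward Lightning trainer kwargs, which would let DDP spread work across the node's GPUs:

    from neuralforecast.models import NHITS

    model = NHITS(h=24, input_size=48,
                  accelerator="gpu",
                  devices=4,          # all four GPUs on the node
                  strategy="ddp")     # forwarded to the Lightning Trainer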
  • r

    Rodrigo Sodré

    05/07/2025, 12:32 PM
    Greetings everybody! I'm studying the neuralforecast cross-validation tutorial: https://nixtlaverse.nixtla.io/neuralforecast/docs/capabilities/cross_validation.html I changed Y_df so that it is not filtered and considers all series (H1...H99), commenting out the following cell:
    # Y_df = Y_df.query("unique_id == 'H1'")[:700]
    # Y_df
    Then I got the attached images. Does anyone know what those crossed lines are?
  • c

    Christiaan

    05/08/2025, 7:40 AM
    Hey, maybe I've overlooked something, but you don't have an RNN encoder with an RNN decoder, right? That is: encode the historic + categorical data with lstm_1 to obtain h and c, then run lstm_2 initialized with the encoder's h and c on the future + categorical data. Possibly also mix the encoded h into lstm_2 at the input, perhaps with a horizon time embedding, and possibly use GRU/LSTM or the recent sLSTM with improved gating.