# neural-forecast
  • z

    Zac Pullar-Strecker

    02/24/2025, 8:37 PM
    Hey all, I'm interested in doing multivariate->multivariate forecasting, at least at first primarily with TSMixerx. I saw this thread that provided some advice (removing insample_y), but I assume a decent amount on the nf/dataset/loss side would also need to be modified to support multiple targets. I'd love to hear how difficult you think this modification would be, and any tips for diving into it.
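    For reference, a hedged sketch of how TSMixerx is set up today (inputs can already be multivariate via n_series and the exogenous lists, but each series still has a single target column y; the exogenous column names below are placeholders), which is the part the thread discusses extending:

    from neuralforecast import NeuralForecast
    from neuralforecast.models import TSMixerx

    model = TSMixerx(h=12, input_size=24, n_series=3,
                     hist_exog_list=["sensor_a"],
                     futr_exog_list=["calendar_flag"])
    nf = NeuralForecast(models=[model], freq="D")
    nf.fit(df=Y_df)   # long format: unique_id, ds, y plus the exogenous columns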
  • m

    Marco

    02/28/2025, 4:05 PM
    Hello #C031M8RLC66! We just made a new release of neuralforecast with some pretty important changes:

    New features
    - New model: TimeXer, a Transformer-based model specifically designed to handle exogenous features.
    - All losses are compatible with all types of models (e.g. univariate/multivariate, direct/recurrent), or appropriate protection has been added.
    - DistributionLoss now supports the use of `quantiles` in `predict`, allowing for easy quantile retrieval for all DistributionLosses.
    - Mixture losses (GMM, PMM and NBMM) now support learned weights for weighted mixture distribution outputs.
    - Mixture losses now support the use of `quantiles` in `predict`, allowing for easy quantile retrieval.
    - Improved stability of `ISQF` by adding softplus protection around some parameters instead of using `.abs`.
    - Unified API for any quantile or any confidence level during `predict` for both point and distribution losses.

    Enhancements
    - Improved docstrings of all models.
    - Minor bug fix in TFT: we can omit specifying an RNN type and the static covariate encoder will still work.
    - Fitting with an invalid validation size now prints a nice error message.
    - Added bfloat16 support.
    - Recurrent models can now produce forecasts recursively or directly.
    - IQLoss now gives monotonic quantiles.
    - MASE loss now works.

    Breaking Changes
    - Unified API.
    - RMoK uses the `revin_affine` parameter instead of `revine_affine`; the latter was a typo in the previous version.
    - All models now inherit the `BaseModel` class. This changes how we implement new models in neuralforecast.
    - Recurrent models now require an `input_size` parameter.
    - `TCN` and `DRNN` are now window models, not recurrent models.
    - A recurrent model saved with a previous version cannot be loaded in v3.0.0.

    Bug Fixes
    - Multivariate models no longer error when predicting with `n_series` > `batch_size`.
    - Insample prediction works with series of varying lengths.

    Documentation
    - Big overhaul of the documentation to remove old and deprecated code.
    - Added an example of modifying the default `configure_optimizers()` behavior (use of the `ReduceLROnPlateau` scheduler).

    This release solves many of your pain points and adds features that were asked for a long time. Big thanks to @Olivier for his amazing contribution to this release, as well as to all our users for taking the time to raise issues and ask questions. We'll keep working on improving neuralforecast!
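    For example, the new quantile retrieval at predict time should look roughly like this (a sketch based on the notes above, not exact reference code; model choice and data names are placeholders):

    from neuralforecast import NeuralForecast
    from neuralforecast.models import NHITS
    from neuralforecast.losses.pytorch import DistributionLoss

    model = NHITS(h=24, input_size=48,
                  loss=DistributionLoss(distribution="Normal"))
    nf = NeuralForecast(models=[model], freq="h")
    nf.fit(df=Y_df)
    # Per the notes, quantiles can now be requested directly when predicting.
    forecasts = nf.predict(quantiles=[0.1, 0.5, 0.9])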
  • m

    Md Atiqur Rahaman

    02/28/2025, 6:41 PM
    I have some questions:
    1. Can we save the model that cross_validation fits and predicts with? My point is that I want to use the fitted model to predict on new data. I know we can just use fit() and then predict(), but cross-validation is super useful since it trains on the train set while the validation and test sets can also be given. So it would be great if the model fitted by cross_validation could be saved.
    2. As far as I can see from the documentation, we can't predict the whole test df if the horizon is smaller than it; it can't do rolling prediction, so we need to do that manually, right? I found that this takes a lot of time in my work. Suppose I have train data with 20,000 rows with historical and future exogenous features. For a test set of 5,000 rows I have to pass futr_df and the test data accordingly, since, as I understand it, predict forecasts from the last date of the dataframe it is given. Is that a correct assumption? And we have to feed predict a rolling window of length horizon for both the test data and futr_df, correct? Is there any other way?
    3. Also, since we can use a robust scaler or standard scaler, do we need to inverse-scale, or does Nixtla do it by itself? Given that predict() or cross_validation() returns a dataframe, can we see what data it returns and what model weights it has?
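    A hedged sketch for points 1 and 2 (nf is an already constructed NeuralForecast object; train_df, test_df and futr_df stand for the user's own frames; a single series is assumed for simplicity, and cross_validation already covers the case where the actuals are available up front):

    import pandas as pd
    from neuralforecast import NeuralForecast

    nf.fit(df=train_df)
    nf.save(path="checkpoints/run1", overwrite=True)     # hypothetical path
    nf2 = NeuralForecast.load(path="checkpoints/run1")   # reuse the fitted models later

    h = 24                                               # horizon the models were built with
    history, preds = train_df.copy(), []
    for start in range(0, len(test_df), h):
        futr_window = futr_df.iloc[start:start + h]      # future exogenous rows for this window
        preds.append(nf2.predict(df=history, futr_df=futr_window))
        history = pd.concat([history, test_df.iloc[start:start + h]])   # roll the actuals forward
    rolling_fcst = pd.concat(preds)

    On point 3, to the best of my understanding the library inverts any scaling it applied itself, so the returned forecast dataframes are already on the original scale.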
  • u

    田口天晴

    03/07/2025, 2:49 PM
    Hello, community! How is the number of epochs determined in iTransformer?
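    If it helps, neuralforecast models are generally trained for a fixed number of gradient steps rather than epochs; a hedged sketch (defaults may differ by version):

    from neuralforecast.models import iTransformer

    # max_steps bounds training; validation is checked every val_check_steps, and
    # early_stop_patience_steps can stop training earlier based on validation loss.
    model = iTransformer(h=24, input_size=48, n_series=7,
                         max_steps=1000,
                         val_check_steps=100,
                         early_stop_patience_steps=3)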
  • u

    田口天晴

    03/08/2025, 7:09 AM
    Also, I use MSE as the loss function, but an error occurs; these are its logs. Please tell me how to solve this problem.
    (attachment: Untitled)
  • t

    Tony Gottschalg

    03/14/2025, 10:20 AM
    Hey everyone, I have a question about implementing sample weighting inside the loss for NeuralForecast (or rather the neural models). Let me clarify what I mean by this:
    1. Introduce a weight for each observation coming from a specific unique_id.
    2. Apply that weight in the loss calculation so that losses for a specific unique_id count more strongly than losses for data points coming from a different unique_id.
    Why? When training a "global" model, in the sense that it is trained on multiple different time series, I want to be able to put more emphasis on a certain target ID during training. I already implemented oversampling, which should be equivalent, but introducing a sample weight would give more control.
    What I already tried / am aware of: I need to introduce a custom loss via the BasePointLoss class, but the main issue is that I don't see how to pass a weight tensor to this loss function, since the model's fit doesn't accept additional arguments (which could be hacky anyway given how the input is transformed into batches). I also thought about using the mask argument, but since it is set internally in the BaseModel class, I don't see how I could use it. I'm also aware of sample weighting for MLForecast models, but I would like to enable this for NeuralForecast models in our use case. Does anyone have an idea how to enable this (without having to modify the source code)? Thank you very much in advance, and please tell me if something is unclear.
  • b

    Bersu T

    03/18/2025, 11:31 AM
    from neuralforecast import NeuralForecast
    from neuralforecast.auto import AutoLSTM
    from neuralforecast.losses.pytorch import MAE

    modelLSTM = AutoLSTM(h=h,
                         loss=MAE(),
                         backend='optuna',
                         num_samples=10)

    nf = NeuralForecast(models=[modelLSTM], freq='ME')
    nf.fit(df=df_encoded, val_size=18)
    Hi. When I do this, after running for a while I get an error stating Exception: Time series is too short for training, consider setting a smaller input size or set start_padding_enabled=True. Where are we expected to put the start_padding_enabled argument?
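    One hedged way to pass it for an Auto model is through the model config, since start_padding_enabled is a hyperparameter of the underlying LSTM rather than of AutoLSTM itself (a sketch, not verified against every version):

    base_config = AutoLSTM.get_default_config(h=h, backend="optuna")

    def config_lstm(trial):
        config = {**base_config(trial)}
        config["start_padding_enabled"] = True   # pad short series at the start
        return config

    modelLSTM = AutoLSTM(h=h, loss=MAE(), backend='optuna',
                         config=config_lstm, num_samples=10)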
  • b

    Bersu T

    03/18/2025, 12:31 PM
    Also, another question: when trying to update the configuration as follows
    lstm_config = AutoLSTM.get_default_config(h=h, backend="optuna")

    def config_lstm(trial):
        config = {**lstm_config(trial)}
        config.update({
            "input_size": trial.suggest_int("input_size", 2, 18),
        })
        return config

    modelLSTM = AutoLSTM(h=h,
                         config=config_lstm,
                         backend='optuna',
                         loss=MAE(),
                         num_samples=3)
    During fitting I get the following error: ValueError: Cannot set different distribution kind to the same parameter name. [W 2025-03-18 12:28:42,864] Trial 0 failed with value None.
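    A possible cause (hedged): the default config returned by get_default_config most likely already suggests "input_size" with a different Optuna distribution (e.g. a categorical choice), so suggesting the same parameter name again with suggest_int clashes. One workaround is to suggest under a fresh trial-parameter name, for example:

    def config_lstm(trial):
        config = {**lstm_config(trial)}   # the default suggestions happen here
        # Re-suggesting "input_size" with a different distribution kind raises the
        # error above; a new name avoids the clash while still overriding the value.
        config["input_size"] = trial.suggest_int("input_size_override", 2, 18)
        return config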
  • s

    Sapna Mishra

    03/20/2025, 10:18 PM
    Hello Nixtla Team, I hope you are doing well. I would like to find out whether any of the following models support cross-learning when multiple time series are passed in long format: NBEATSx, NHITS, TSMixerx, TiDE, BiTCN, LSTM, and RNN. If they do, could you please let me know how to disable that feature? Is there a parameter or any other mechanism to turn it off? Thank you! Best regards, Sapna
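    All of these are global models, so cross-learning is implicit in fitting on several series at once; a hedged sketch of the usual way to avoid it is simply fitting one model per series (names below are placeholders):

    from neuralforecast import NeuralForecast
    from neuralforecast.models import NHITS

    per_series_forecasts = []
    for uid, df_uid in Y_df.groupby("unique_id"):
        nf = NeuralForecast(models=[NHITS(h=12, input_size=24)], freq="ME")
        nf.fit(df=df_uid)                         # trained on a single series only
        per_series_forecasts.append(nf.predict())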
  • a

    Ankit Hemant Lade

    03/21/2025, 2:31 PM
    Hello @Marco, does Nixtla support feature importance for TiDE?
  • a

    Aditya Limaye

    03/22/2025, 12:50 AM
    Question: do NeuralForecast model objects take into account "past" values of future exogenous features? In the training dataframe (df), I have values of the future_exogenous_cols for datetimes in the past, so the model has access to these values in the training pass, and at inference time I include the future_exogenous_cols in the "past" dataframe (df) when I call nf.predict() -- but is the model actually using these values? Thanks in advance!
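    For reference, a hedged sketch of how the two kinds of values are usually supplied: past values of a declared future exogenous column travel in df, and only the horizon's values go in futr_df (column and frame names are placeholders):

    from neuralforecast import NeuralForecast
    from neuralforecast.models import NHITS

    model = NHITS(h=24, input_size=48,
                  futr_exog_list=["future_exogenous_col"])
    nf = NeuralForecast(models=[model], freq="h")
    nf.fit(df=train_df)                  # past values of the column live in df
    fcst = nf.predict(futr_df=futr_df)   # unique_id, ds and the column for the next h steps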
  • a

    Ankit Hemant Lade

    03/24/2025, 11:07 PM
    Hey @Marco @Olivier, to generate the feature_importances for TFT, do we need to create predictions every time?
  • s

    Sapna Mishra

    03/25/2025, 11:23 PM
    Hey Team, for TiDE and the other neuralforecast models, is there a way to show the importance of each of the exogenous features passed into the model? Thanks!
  • b

    Bersu T

    03/26/2025, 8:33 AM
    Hi, I have a question regarding training times. NeuralForecast is supposed to train global models, but even after simplifying the model significantly (using only 2 num_samples and selecting just 4 of the 176 unique IDs), training still takes a very long time (about 30 minutes). This becomes even more problematic with the complete dataset. In contrast, when using MLForecast, training is significantly faster, taking only a few seconds. Could you please clarify why this happens and what I could do to mitigate it?
  • j

    Jelte Bottema

    03/26/2025, 1:05 PM
    Hi guys, about feature importance for the NHITS model: is this on the roadmap (or maybe already there and I missed it)? And how does the new TimeXer model compare to NHITS?
  • s

    Sarah Unterseher

    03/27/2025, 3:50 PM
    Hi everyone, I have a question where I can't get any further with the documentation. My training data set consists of three columns: unique_id, ds and y. I have 192 rows per unique_id and I want to pass 96 as input_size and 96 as horizon to my model. It looks like this:

    lstm_config = AutoLSTM.get_default_config(h=96, backend="ray")
    lstm_config["input_size"] = 96
    lstm_config["context_size"] = 96
    levels = [80, 90]
    model = AutoLSTM(h=96,
                     loss=MQLoss(level=[80, 90]),
                     config=lstm_config,
                     gpus=1,
                     search_alg=HyperOptSearch(),
                     backend='ray',
                     num_samples=32)
    loaded_nf = NeuralForecast(models=[model], freq='15min')
    train_data, test_data = load_and_preprocess_data(file_path)
    loaded_nf.fit(df=train_data, val_size=96)

    With this setup I get the error 'No window available for training', which I don't understand, since there are exactly the right number of rows per unique_id for input_size + horizon. I have since realised that I can prevent the error by setting the parameter 'start_padding_enabled' to True. I could live with this, but I'm worried that whatever padding is carried out will severely degrade my training data. So my question is: why do I have to set 'start_padding_enabled' to True in my setup for it to work, and what might be padded here?
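    A hedged back-of-the-envelope for why the error appears, assuming val_size is carved out of the end of each series before training windows are built:

    rows_per_id = 192
    val_size    = 96
    input_size  = 96
    h           = 96

    train_rows = rows_per_id - val_size   # 96 points left to build training windows
    window_len = input_size + h           # 192 points needed for one full window
    print(train_rows >= window_len)       # False -> "No window available for training"
    # start_padding_enabled=True pads the start of each series so that a full window
    # exists, at the cost of training on partially padded history.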
  • b

    Bersu T

    03/31/2025, 10:15 AM
    Hi, I'm using NHITS with a configuration that performs well (tuned via Optuna). When I run cross-validation without prediction intervals, the results look good. However, as soon as I add prediction intervals I have to set refit=True, and at that point the model performance drops drastically: the forecasts become flat lines even though I'm using the exact same config. Why is this, and what can I do to mitigate it?
  • j

    Jonghyun Yun

    04/09/2025, 5:14 PM
    Hi Team, I have multiple time series with different scales and different seasonalities. Which neuralforecast model would be ideal for this purpose?
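    Whichever model is chosen, scale differences are often handled with per-series or per-window scaling; a hedged sketch:

    from neuralforecast import NeuralForecast
    from neuralforecast.models import NHITS

    # local_scaler_type scales each series before fitting and inverts the scaling on
    # the returned forecasts; scaler_type does something similar per training window.
    model = NHITS(h=24, input_size=48, scaler_type="robust")
    nf = NeuralForecast(models=[model], freq="D", local_scaler_type="robust")
    nf.fit(df=Y_df)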
  • r

    Raj Puneeth

    04/10/2025, 9:19 PM
    Hi Team, I'm new to neuralforecast and am trying to do a POC on how effective global models are for our use case. I have around 400 time series in my POC at monthly grain, with ~10 years of history for most. The scale is large (50K to 100K) for a small set of series, and the remaining series range from hundreds to a few thousand. I'm exploring NBEATS and NBEATSx (with static exogs only), and below are a few things I have tried to improve performance against a baseline of a robust ensemble of multiple statistical models. The results are promising and on par with the baseline; in particular NBEATS/x do well with the trend. The issue I'm having is with seasonality. My series are quite volatile, and the seasonality produced by NBEATS is very muted even in cases where it is consistent and evident to the naked eye, which leads to poor performance especially on the series with larger scale. Things I tried:
    - Optuna for hyperparameter tuning, TPE sampler.
    - Optimizer: AdamW seems to work well.
    - Loss: HuberMQLoss, using the median, with 5-fold CV; for delta I've tried a range of values between 0 and 1 and a few others like 5, 10, etc.
    - Normalization: RevIN and minmax were helpful in improving accuracy.
    - Stacks: seeing improved performance with trend and seasonality stacks, so only sticking with those.
    - MLP units: range from 32 to 256 units per layer (1 to 5 layers) per block; 3 to 5 layers with 32 units are picked mostly by Optuna.
    - Number of blocks: range from 2 to 8; 4 to 7 mostly picked in tuning.
    - Number of stacks: range from 1 to 6 (identity stack as the last stack for odd numbers); 4 and above mostly picked by Optuna.
    - Harmonics: range from 2 to 18 (Optuna picks 10 and above most times).
    - Polynomials: 1 to 3.
    - Backcast length: 2x is mostly picked by Optuna for a 12-month forecast horizon.
    - batch_size: tried 32, 64, 128.
    - max_steps: 500.
    - shared_weights: True is picked by Optuna mostly.
    - dropout_prob_theta: errors out; I don't think the param is implemented for NBEATS/x?
    Let me know what else I can try to improve seasonality or generalization. Should I try any other models? Thanks in advance!
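    For concreteness, a hedged sketch of the kind of interpretable NBEATS configuration being described (all values illustrative, not a recommendation):

    from neuralforecast.models import NBEATS
    from neuralforecast.losses.pytorch import HuberMQLoss

    model = NBEATS(h=12, input_size=24,                    # 2x backcast for a 12-month horizon
                   stack_types=["trend", "seasonality"],   # interpretable basis stacks only
                   n_blocks=[3, 3],
                   n_harmonics=2,                          # seasonality basis resolution
                   n_polynomials=2,                        # trend basis degree
                   loss=HuberMQLoss(level=[80, 90]),
                   scaler_type="robust",
                   max_steps=500)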
  • j

    Jan

    04/11/2025, 11:34 PM
    I have a question about how I should be thinking about step_size when using the LSTM. Say I need to predict the next 24 hours every hour, I want to use the last 48 hours to do so, and I have future exogenous features that change every hour (for example weather forecasts) and turn into actuals once the time passes beyond the present. My data frame right now consists of non-overlapping windows of 72 steps, where the first 48 steps are mostly duplicates, since the actual values of the exogenous features change only one step at a time. So I'm basically using input_size=48, horizon=24 and step_size=72 when training an LSTM. However, I'm not sure I'm doing this right: the model seems to train very poorly even though there's a lot of data (for example, the forecasted values rarely start from the last known values), and the predictions on a future hold-out set are very poor. Am I doing the windowing correctly? Or should I be feeding only 25-hour windows to the model (so input_size=1, horizon=24 and step_size=25) where the first row is the latest actuals, and have the LSTM do the tracking of the past? And is this different for other architectures such as NHITS?
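    A hedged sketch of the more common setup: pass the raw, continuous hourly series once and let the library build overlapping training windows itself, instead of pre-building non-overlapping 72-step blocks (column names are placeholders):

    from neuralforecast import NeuralForecast
    from neuralforecast.models import LSTM

    model = LSTM(h=24, input_size=48,
                 futr_exog_list=["weather_forecast"])
    nf = NeuralForecast(models=[model], freq="h")
    nf.fit(df=hourly_df)   # one row per hour per series; no manual window duplication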
  • b

    Bersu T

    04/15/2025, 7:59 AM
    Can I add time series specific features to NN configurations, like with ML models, or do NNs mostly depend on their own architecture and hyperparameters to learn those patterns?
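    A hedged sketch of one way to do it: engineered, known-in-advance features can be added as exogenous columns that the network consumes alongside y (the same columns must then be supplied for the horizon via futr_df at predict time):

    from neuralforecast import NeuralForecast
    from neuralforecast.models import NHITS

    df["month"] = df["ds"].dt.month           # assumes ds is already a datetime column
    df["dayofweek"] = df["ds"].dt.dayofweek

    model = NHITS(h=24, input_size=48,
                  futr_exog_list=["month", "dayofweek"])
    nf = NeuralForecast(models=[model], freq="h")
    nf.fit(df=df)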
  • j

    Jelte Bottema

    04/15/2025, 11:21 AM
    Hi, I would like to run my NHITS model, which currently runs on a CPU, on a GPU instead. Are there settings I need to adjust, or other things to think/worry about?
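    A hedged sketch, assuming the Lightning trainer keyword arguments are forwarded by the model:

    from neuralforecast.models import NHITS

    model = NHITS(h=24, input_size=48,
                  accelerator="gpu",   # forwarded to the underlying Lightning Trainer
                  devices=1)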
  • c

    Christiaan

    04/22/2025, 7:48 AM
    Hi, I have a question: is it possible to have implicit quantile networks in neuralforecast, or to implement them myself without too much hassle? The idea is that I use a (custom) distribution to sample tau, our quantile level, multiply it with a cosine function, feed it into a linear embedding, feed that into an activation function, and then concatenate it to my input sequences. The loss function is a quantile or expectile loss with tau as the quantile level. Why do I want it? It's much more parameter efficient. Assume I have an LSTM model forecasting a week of hourly values ahead, that's 168 values, and I want 10 to 20 quantiles. With a normal MQ loss this blows up my network parameters; with this it won't. Then at runtime I want to conformalize each of these quantiles using a separate calibration set that I update with the test set. If this is possible with Nixtla I can use your incredibly optimized codebase and don't even need a GPU. Thanks in advance.
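    The library does ship an implicit-quantile-style loss (IQLoss, mentioned in the release notes above); a hedged sketch of using it so that parameters do not grow with the number of quantiles:

    from neuralforecast import NeuralForecast
    from neuralforecast.models import LSTM
    from neuralforecast.losses.pytorch import IQLoss

    model = LSTM(h=168, input_size=336, loss=IQLoss())   # single head, conditioned on a sampled quantile level
    nf = NeuralForecast(models=[model], freq="h")
    nf.fit(df=train_df)
    # Per the release notes, quantiles can be requested at predict time:
    fcst = nf.predict(quantiles=[0.05, 0.5, 0.95])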
  • b

    Bethany Earnest

    04/23/2025, 4:09 PM
    Is there a built-in callback that can automatically save the best-performing model checkpoints during long training runs? Looking for something like ModelCheckpoint in TensorFlow.
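    A hedged sketch, assuming Lightning callbacks can be passed through the model's trainer kwargs (the monitored metric name below is an assumption):

    from pytorch_lightning.callbacks import ModelCheckpoint
    from neuralforecast.models import NHITS

    ckpt = ModelCheckpoint(monitor="valid_loss",     # assumed name of the logged validation metric
                           mode="min", save_top_k=1,
                           dirpath="checkpoints/")   # hypothetical path
    model = NHITS(h=24, input_size=48,
                  val_check_steps=100,
                  callbacks=[ckpt])                  # forwarded to the Lightning Trainer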
  • j

    Jonathan Mackenzie

    04/24/2025, 4:49 AM
    when training a NHITS model via nf.core.NeuralForecast.fit(), is there a reason we cannot set the size of the test set?
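    For what it's worth, a hedged sketch: fit itself only takes val_size, while a test span is usually expressed through cross_validation instead:

    # fit: only a validation split is carved out of the end of each series
    nf.fit(df=Y_df, val_size=96)

    # cross_validation: an explicit test span (test_size) or a number of windows
    cv_df = nf.cross_validation(df=Y_df, val_size=96, test_size=96, n_windows=None)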
  • r

    Renan Avila

    04/24/2025, 10:37 PM
    Hello! First of all, thanks for supporting open source, your library is awesome. Secondly, I am facing a GPU out-of-memory issue; I will describe it and would appreciate any help. I am trying to avoid the need for GPU scaling. Setup:
    • I am using the nf module with 8 auto models on initialization (almost all of the univariate ones with historical exogenous feature capabilities, according to the forecasting models doc page).
    • I am using optuna as the backend for hyperparameter optimization and mostly default configs, except for hist_exog, early_stop_patience_steps, input_size, val_check_steps and batch_size, which I provide as fixed values added to the default configs.
    • I get the 4 full ETT datasets from datasetsforecast LongHorizon2 and pivot them to use the exogenous features as columns instead of using them in raw format as different time series before feeding them to nf. At least this is what I understand as the correct way to handle historical exogenous features from the docs.
    • I also vertically double the size of the dataset by adding another time series, the result of a data augmentation method applied to the original datasets; I add it as a different time series with a separate unique_id.
    Running:
    • With horizon set to 24 and input_size 72, the 22GB of RAM on an L4 GPU are enough to run all the models across all 4 datasets with cross-validation exactly as in the docs, and that's perfect.
    • With horizon 96 and input_size 96, the 22GB are no longer enough for the ETTm datasets, which have more data points than the ETTh datasets. The ETTh datasets still run fine. It seems to work for some models before crashing (observing nvtop), probably on a larger model such as TFT (I'm not sure which one from the nf training logs).
    Things I tried:
    • First, I tried to reduce the batch size, but it did not help. Since some datasets work and others don't, it is most probably related to how the size of the dataset maps to GPU memory.
    • Second, I followed the "large dataset handling" doc page and preprocessed the datasets, generating a parquet in the specified folder structure for each of the 2 unique_id time series within each ETT dataset.
    ◦ Then I noticed that cross_validation from the nf module is not compatible with a files_list as the df parameter; only the fit method is.
    ◦ So I decided to implement cross-validation outside the nf module using the fit and predict methods. I generated prediction windows for the test dataset (previously separated from the train and validation data) and provided it to nf.predict as a full df and not as a files_list, as I understood from the docs. But processing them sequentially takes a lot of time, since I am using a step_size of 1 and the test set has 2000 data points.
    ◦ So I needed a way to process the windows on the GPU, but nf.predict does not expose a step_size parameter. One solution I found is to iterate over nf.models and run model.predict with the step_size and test_size (an apparently unused parameter inside model.predict) parameters specified; after all, the results need to be grouped in order to evaluate.
    • Third, inspired by the "large dataset handling" experience, I decided to use the large-dataset mode in nf.fit and then run nf.cross_validation with only the previously separated, smaller test set instead of the full dataset, hoping the GPU memory used during cross_validation would decrease.
    ◦ But the problem is that, as far as I understand, nf.cross_validation forces the models to be fit again, both with refit=False (once) and refit=True (once per window). This would be a kind of transfer learning since I already called nf.fit, which makes me think that a flag preventing any additional training when the internal variable _fitted is True could be a solution. Maybe I could try to develop it and submit a PR.
    Am I using the library as it is intended? What is the suggested approach in this case? Is it expected to face these GPU memory issues, given that the literature commonly uses horizons up to around 800 on long-horizon datasets, and the ETT datasets are the ones with the fewest possible historical exogenous features?
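    One more hedged lever besides batch_size: for windowed models, the number of windows materialized at once is controlled separately and often dominates GPU memory (values below are illustrative):

    from neuralforecast.models import TFT

    model = TFT(h=96, input_size=96,
                batch_size=8,                      # series sampled per batch
                windows_batch_size=256,            # training windows processed per step
                inference_windows_batch_size=256,  # windows processed per step at predict time
                scaler_type="robust")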
  • j

    Jonathan Mackenzie

    04/29/2025, 3:04 AM
    how can I make multiple predictions at once? It seems that calling predict with a dataframe with multiple rows only predicts on the last row:
  • j

    Joaquin FERNANDEZ

    05/06/2025, 3:46 PM
    Hello. I'm using a machine with 4 GPUs controlled via Slurm jobs (it's an HPC). When trying to allocate large models I'm getting OOM errors in torch CUDA memory because it's trying to allocate everything to the first GPU. I see the four devices in the torch printouts. Is there a way to use multiple GPUs on a single node? Best
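    A hedged sketch, assuming the models forward Lightning trainer kwargs, which would let DDP spread work across the node's GPUs:

    from neuralforecast.models import NHITS

    model = NHITS(h=24, input_size=48,
                  accelerator="gpu",
                  devices=4,          # all four GPUs on the node
                  strategy="ddp")     # forwarded to the Lightning Trainer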
  • r

    Rodrigo Sodré

    05/07/2025, 12:32 PM
    Greetings everybody! I'm studying the neuralforecast cross-validation tutorial: https://nixtlaverse.nixtla.io/neuralforecast/docs/capabilities/cross_validation.html I changed Y_df so that it is not filtered and considers all series (H1...H99), commenting out the following cell:
    # Y_df = Y_df.query("unique_id == 'H1'")[:700]
    # Y_df
    Then I got the attached images. Does anyone know what those crossed lines are?
  • c

    Christiaan

    05/08/2025, 7:40 AM
    Hey, maybe I've overlooked something, but you don't have an RNN encoder with an RNN decoder, right? That is: encode the historic + categorical data with lstm_1 to obtain h and c, then run lstm_2 initialized with the encoder's h and c on the future + categorical data. Possibly also mix the encoded h into lstm_2 at the input, perhaps with a horizon time embedding, and possibly use GRU/LSTM or the recent sLSTM with improved gating.