Nixtla Community #general

# general

Slackbot

05/07/2023, 3:48 PM

This message was deleted.

Max (Nixtla)

05/08/2023, 8:01 PM

Thanks for the very detailed question. I think this is a great outline for a tutorial that we should write. In the meantime, I would recommend the following.
• Follow this End to End Walkthrough: _Model training, evaluation and selection for multiple time series._
Some brief answers to your points:
1. a.) Nixtla does not support exploratory analysis per se. (Here is a tutorial using pandas profiling.) TIP: Speak to your colleagues from business and operations to find important categorical variables. For example, if there was a particular month when the business ran a promotion, you could create a categorical variable such as promotion = 1 or 0. The same goes for special events like COVID.
1. b.) Short: don't use MAPE; maybe use MAE.
   a. Long read:
      i. Forecast KPIs: RMSE, MAE, MAPE & Bias
      ii. Time Series Forecast Error Metrics You Should Know
2. Yes, start with statsforecast. Then follow: https://nixtla.github.io/statsforecast/examples/statisticalneuralmethods.html
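To make the MAPE warning concrete, here is a small NumPy sketch that computes both metrics by hand. The numbers are toy values, not from the thread, and one month deliberately has an actual value close to zero:

```python
import numpy as np


def mae(y_true, y_pred):
    """Mean absolute error, in the units of the series."""
    y_true, y_pred = np.asarray(y_true, dtype=float), np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y_true - y_pred)))


def mape(y_true, y_pred):
    """Mean absolute percentage error in percent; undefined if any actual is 0."""
    y_true, y_pred = np.asarray(y_true, dtype=float), np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs((y_true - y_pred) / y_true)) * 100)


# Toy monthly actuals; the last month is close to zero (e.g. a stock-out).
y_true = np.array([100.0, 120.0, 110.0, 1.0])
y_pred = y_true + 5.0  # every month is over-forecast by exactly 5 units

print("MAE :", mae(y_true, y_pred))   # 5.0
print("MAPE:", mape(y_true, y_pred))  # dominated by the last month
```

Every month is missed by exactly 5 units, so MAE is 5.0. The single month with an actual of 1, however, contributes a 500% error and pushes MAPE above 100%. Low or zero actuals are common in sales-type data, which is one reason MAPE is hard to judge.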

i. Should I try it with all models and see the result with a crossval? Is it appropriate to tune these models? (I have not seen how to perform it in the documentation)

Start with the auto models. These models find the best parameters for you: AutoTheta, AutoMSTL, etc.
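On the question of trying all models and comparing them with cross-validation, here is a minimal pandas sketch of the comparison step. It assumes the cross-validation output is a long-format frame with a unique_id column, the actuals in y, and one forecast column per model, mirroring the shape StatsForecast.cross_validation is assumed to return. The numbers are made up, and the helper best_model_per_series is hypothetical:

```python
import pandas as pd


def best_model_per_series(cv_df: pd.DataFrame, model_cols: list) -> pd.DataFrame:
    """Mean absolute error of each model per series, plus the lowest-error model name."""
    # Absolute error of every model column against the actuals, row by row.
    errors = cv_df[model_cols].sub(cv_df["y"], axis=0).abs()
    errors["unique_id"] = cv_df["unique_id"].to_numpy()
    # Average over all cross-validation windows and horizons for each series.
    scores = errors.groupby("unique_id")[model_cols].mean()
    scores["best_model"] = scores[model_cols].idxmin(axis=1)
    return scores


# Made-up cross-validation output for two series and two models.
cv_df = pd.DataFrame({
    "unique_id": ["A"] * 4 + ["B"] * 4,
    "y":         [10.0, 12.0, 11.0, 13.0, 50.0, 52.0, 51.0, 49.0],
    "AutoETS":   [11.0, 12.0, 12.0, 13.0, 45.0, 47.0, 46.0, 44.0],
    "AutoTheta": [9.0, 14.0, 10.0, 15.0, 50.0, 51.0, 52.0, 49.0],
})
scores = best_model_per_series(cv_df, ["AutoETS", "AutoTheta"])
print(scores)
```

Picking the winner per series (rather than one model for the whole panel) is a design choice. With few series, you may prefer the model with the lowest average score across all of them.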

Nasreddine D

05/09/2023, 1:26 PM

Hi Max, thanks for the answer 🙂. It will help me get started. I confirm that a full tutorial on handling a time series forecasting project with all the Nixtla tools (and others) would be great and helpful. I did a first iteration with StatsForecast using the Auto models and got first results (which seem good). I quickly tried pandas-profiling; it is very simple for now. I am now going to follow this tutorial and see the results: https://nixtla.github.io/statsforecast/examples/statisticalneuralmethods.html

fede (nixtla) (they/them)

05/09/2023, 6:44 PM

hey @Nasreddine D! Thanks for using the nixtlaverse. Complementing @Max (Nixtla), here are a couple of ideas regarding your questions:
1. a. Yes, the best practice is to start the pipeline with a proper EDA. A good approach to finding trends and seasonalities is to use the MSTL model (https://nixtla.github.io/statsforecast/examples/multipleseasonalities.html) to decompose the series and find relevant patterns. You could also use tsfeatures (https://github.com/Nixtla/tsfeatures) to extract relevant information about your series (such as sparsity, entropy, and autocorrelation strength, among others). Currently, handling missing values is out of the scope of our libraries (we are working on that), but in this tutorial (https://github.com/Nixtla/statsforecast/blob/main/experiments/bigquery/src/statsforecast-fugue-citibikes-trips.ipynb) you can find one approach to filling them.
b. The MAPE metric is usually extremely hard to interpret (https://blog.blueyonder.com/mean-absolute-percentage-error-mape-has-served-its-duty-and-should-now-retire/). Scaled metrics such as MASE or RMSSE might be a better option.
c. Using more than one cross-validation window is usually better practice. The exact number of windows depends on how much data you have and on your use case. If you have long time series and are willing to wait, we suggest using all possible windows.
2. a. Yes, that's a good approach. StatsForecast contains the simplest models (Naive and SeasonalNaive), so after creating a benchmark with those models, you should start building more complex ones.
b. In our experience, the automated models (AutoARIMA, AutoETS, AutoTheta, and AutoCES) produce good benchmarks, and the tuning is performed inside them, so there is no need to tune them.
c. i. This is usually an empirical question. In our experience, an iterative process where you start with the simplest features and transformations (for example, adding differences and lags) and then increase complexity leads to good results. Transformations such as scaling (MinMax, Standardize) and BoxCox are good options to test.
ii. Yes, the three libraries can handle external regressors (StatsForecast through the AutoARIMA model). To evaluate whether external data is relevant, perform cross-validation with and without that data and compare the models' performance.
e. HierarchicalForecast assumes that you have a hierarchical structure (for example, you want to predict national-level and state-level sales and want them to be coherent, meaning that if you add up the state-level forecasts you get the national-level forecast). The algorithms in the library are agnostic: you can use any algorithm to produce forecasts and then reconcile them using the library. Here's an introduction to the topic: https://nixtla.github.io/hierarchicalforecast/examples/introduction.html. Please let us know if you have any further questions.
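The benchmark-plus-scaled-metric idea from points 1.b and 2.a can be sketched in plain NumPy. This is a hand-rolled SeasonalNaive forecast and MASE, not the statsforecast implementation. The toy monthly series and the helper names seasonal_naive and mase are assumptions for illustration:

```python
import numpy as np


def seasonal_naive(y_train, h, season_length):
    """Repeat the last observed season forward for h steps."""
    y_train = np.asarray(y_train, dtype=float)
    last_season = y_train[-season_length:]
    reps = int(np.ceil(h / season_length))
    return np.tile(last_season, reps)[:h]


def mase(y_true, y_pred, y_train, season_length):
    """Forecast MAE divided by the in-sample MAE of the seasonal naive method."""
    y_true, y_pred, y_train = (np.asarray(a, dtype=float) for a in (y_true, y_pred, y_train))
    scale = np.mean(np.abs(y_train[season_length:] - y_train[:-season_length]))
    return float(np.mean(np.abs(y_true - y_pred)) / scale)


# Toy monthly series: yearly seasonality plus a linear trend (4 years).
t = np.arange(48)
y = 100 + 10 * np.sin(2 * np.pi * t / 12) + 0.5 * t

fc = seasonal_naive(y[:36], h=12, season_length=12)   # benchmark forecast
score = mase(y[36:], fc, y[:36], season_length=12)
print(round(score, 6))  # 1.0
```

A MASE below 1 means a model's average error is smaller than the in-sample seasonal naive error. In this toy series the seasonal naive forecast scores exactly 1 because the trend adds the same 6 units every year, both in the training data and in the test window.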

Nasreddine D

05/10/2023, 2:31 PM

Hi, thanks again for these details. I will take all these comments into account. I am trying to add exogenous data to the AutoARIMA model and use cross-validation to evaluate whether performance is better than with the target alone. I checked this tutorial, but it only uses a train/test split: https://nixtla.github.io/statsforecast/examples/exogenous.html#train-model I can't find how to adapt it, since there is no "X_df" argument in StatsForecast.cross_validation(). Can you suggest something I can use? Another question, regarding this statement: "If the future values of the exogenous regressors are not available, then they must be forecasted or the regressors need to be eliminated from the model. Without them, it is not possible to generate the forecast." My exogenous data is not known in the future. Does this mean I have to forecast it before using any model? Could this add bias? Thanks again for your help.

Nasreddine D

05/10/2023, 3:45 PM

I am trying to create features with tsfeatures, but I don't understand how the features it creates can be used in my dataset. When I run it, I get the result below. What should I do with it? Thank you very much.

fede (nixtla) (they/them)

05/10/2023, 8:18 PM

Hey @Nasreddine D! Those features are usually used to explore the characteristics of the time series, to plan in advance which models would perform better, and to cluster the series. Here's a reference on the topic: https://otexts.com/fpp3/features.html.
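As an illustration of how features like these feed exploration and grouping, here is a hand-rolled sketch. It computes three simple per-series features (lag-1 autocorrelation, share of zeros, and coefficient of variation) and tags series by sparsity. It does not call tsfeatures, and the function names, the 0.5 threshold, and the toy series are assumptions. Much like tsfeatures' output, the result has one row per series. It is meant for comparing and grouping series, not for joining back as model inputs:

```python
import numpy as np
import pandas as pd


def acf1(y):
    """Lag-1 autocorrelation of a 1-D series."""
    x = np.asarray(y, dtype=float) - np.mean(y)
    return float(np.sum(x[1:] * x[:-1]) / np.sum(x * x))


def simple_features(df):
    """One row per unique_id with a few descriptive features."""
    rows = {}
    for uid, g in df.groupby("unique_id"):
        y = g.sort_values("ds")["y"].to_numpy(dtype=float)
        mean = y.mean()
        rows[uid] = {
            "acf1": acf1(y),                         # persistence / smoothness
            "sparsity": float(np.mean(y == 0)),      # share of zero observations
            "cv": float(y.std() / mean) if mean != 0 else np.nan,  # relative volatility
        }
    return pd.DataFrame.from_dict(rows, orient="index")


# Toy panel: one smooth trending series and one intermittent series.
ds = pd.date_range("2020-01-01", periods=24, freq="MS")
intermittent = np.zeros(24)
intermittent[[2, 9, 15, 21]] = [5, 3, 4, 6]
df = pd.concat([
    pd.DataFrame({"unique_id": "smooth", "ds": ds, "y": np.arange(1.0, 25.0)}),
    pd.DataFrame({"unique_id": "intermittent", "ds": ds, "y": intermittent}),
], ignore_index=True)

feats = simple_features(df)
# Simple rule-based grouping; a clustering algorithm could replace this step.
feats["group"] = np.where(feats["sparsity"] > 0.5, "intermittent", "regular")
print(feats)
```

A table like this can suggest, for example, that intermittent series need different models from smooth, persistent ones before any forecasting is run.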

fede (nixtla) (they/them)

05/10/2023, 8:52 PM

Sorry @Nasreddine D, I missed the previous questions. Here are some ideas:
• The cross_validation method automatically handles the exogenous variables. If your data has more columns after the target column, they will be treated as exogenous variables and used by the models that allow it. Since the future values of the exogenous variables are available for each window in cross-validation, they are handled automatically.
• Yes, one approach to using unknown exogenous variables is to forecast them separately and then use the forecasted values to produce forecasts of the target variable.
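The two-stage approach for unknown regressors can be sketched with NumPy: forecast the regressor first, then plug that forecast into a model of the target. Here a seasonal naive forecast of x and an ordinary least squares fit of y on x stand in for whatever models you would actually use (for example AutoARIMA with exogenous variables). The toy data and all variable names are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
n, h, m = 60, 12, 12          # 5 years of history, 1-year horizon, monthly season
t = np.arange(n + h)

# Toy data: a seasonal exogenous driver x, and a target y that depends on it.
x_full = 20 + 5 * np.sin(2 * np.pi * t / m) + rng.normal(0, 1, n + h)
y_full = 3.0 + 2.0 * x_full + rng.normal(0, 0.5, n + h)
x_hist, y_hist = x_full[:n], y_full[:n]   # only the history is observed

# Step 1: forecast the regressor itself (seasonal naive as a stand-in).
x_future_hat = np.tile(x_hist[-m:], h // m + 1)[:h]

# Step 2: fit the target on the regressor using history only (OLS line).
b, a = np.polyfit(x_hist, y_hist, deg=1)   # polyfit returns [slope, intercept]

# Step 3: feed the forecasted regressor into the fitted model.
y_future_hat = a + b * x_future_hat

# Compare with the (normally unknown) actuals to see the propagated error.
print("slope:", round(b, 3))
print("MAE on horizon:", np.mean(np.abs(y_full[n:] - y_future_hat)))
```

Any error in x_future_hat is multiplied by the slope b and carried into y_future_hat. This is one concrete way forecasting the regressor can add error or bias. Comparing this setup in cross-validation against a model without the regressor shows whether the regressor still pays off.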

Nasreddine D

05/12/2023, 9:45 AM

Hi @fede (nixtla) (they/them), thank you again, I just learned new things about features 🙂! Regarding cross_validation:
• Rolling window vs expanding window:
◦ Is there a way to choose one or the other? Or which approach is best?
• This is the configuration I've used to test different models from statsforecast. I would like to do the same with N-HITS, but I want it to be comparable (same number of windows). I am not sure how that will work, because doesn't N-HITS need a validation set to tune the model, as well as a test set? If I set n_windows=100 with N-HITS, what will happen to the validation and test sets? I hope my question is clear.

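```python
# sf: a StatsForecast object created earlier (not shown); Y_ts: long-format data (unique_id, ds, y)
crossvalidation_df = sf.cross_validation(
    df=Y_ts,
    h=24,
    step_size=1,
    n_windows=100
)
```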
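To see what this configuration implies for a series of about 180 monthly observations, the hypothetical helper below lays out the windows backward from the end of the series, so the last test window ends at the last observation. It is a sketch of the indexing only, not the library's internals. Here input_size=None models an expanding training window, and an integer models a rolling window of fixed length:

```python
def cv_windows(n_obs, h, step_size, n_windows, input_size=None):
    """Return (train_start, cutoff, test_end) index triples (end-exclusive).

    Windows are placed backward from the end, so the last test window ends at
    the last observation. input_size=None: expanding window (train from index 0).
    input_size=int: rolling window with at most input_size training points.
    """
    windows = []
    for i in range(n_windows):
        cutoff = n_obs - h - step_size * (n_windows - 1 - i)
        if cutoff <= 0:
            raise ValueError("not enough observations for these settings")
        start = 0 if input_size is None else max(0, cutoff - input_size)
        windows.append((start, cutoff, cutoff + h))
    return windows


# The configuration from the message, on ~180 monthly observations.
expanding = cv_windows(180, h=24, step_size=1, n_windows=100)
rolling = cv_windows(180, h=24, step_size=1, n_windows=100, input_size=48)

print(expanding[0], expanding[-1])  # (0, 57, 81) (0, 156, 180)
print(rolling[0], rolling[-1])      # (9, 57, 81) (108, 156, 180)
```

Under this layout, with h=24, step_size=1, and n_windows=100, the earliest window trains on only 57 of the 180 months. Consecutive test windows also overlap in 23 of their 24 months, so the 100 scores are far from independent.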

I am going to start with NeuralForecast or MLForecast:
• Which one should I start with? (Remember my time series has around 180 months of history, so it is not very long.)
• For these two libraries, should I create features? Or is that only for MLForecast?
• Feature engineering: what is the best strategy? I read that I could create a bunch of features and then select the best ones (lasso...). Do you know of any resource that explains that with code?
• Do I need to normalize the data for N-HITS? And for the other models in NeuralForecast?
Thanks again for your valuable time.

fede (nixtla) (they/them)

05/17/2023, 8:40 PM

hey @Nasreddine D! Yes, you can choose between a rolling window (default behavior) and an expanding window (statsforecast and mlforecast) using refit=False in cross_validation (currently this option is only available for the Auto models in statsforecast). If you set n_windows=100, those windows will be treated as a test set (the model will not use those values during training). Usually, a good workflow starts with statsforecast, then mlforecast, and then neuralforecast (from less complex to more complex models). About features: neuralforecast does not need them, but you'll need to specify them when using mlforecast. The best strategy for feature engineering is to start with lags and simple transformations and then add more to see whether the cross-validation signal improves. It is always best practice to scale (or normalize) the data when using global models (neuralforecast or mlforecast). The models included in the neuralforecast library accept the scaler_type argument to apply different scaling strategies; here's an example: https://nixtla.github.io/neuralforecast/examples/longhorizon_probabilistic.html
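Here is a minimal pandas sketch of per-series standardization followed by lag features, the kind of preprocessing a global model needs. It is not the mlforecast or neuralforecast API (inside neuralforecast, scaler_type handles scaling). The helper names and the two toy series with very different levels are assumptions:

```python
import numpy as np
import pandas as pd


def standardize_per_series(df):
    """Scale y to zero mean / unit std within each series; return the stats too."""
    stats = df.groupby("unique_id")["y"].agg(["mean", "std"])
    out = df.join(stats, on="unique_id")
    out["y_scaled"] = (out["y"] - out["mean"]) / out["std"]
    return out.drop(columns=["mean", "std"]), stats


def add_lags(df, col, lags):
    """Add col_lag{k} columns, shifting within each series so lags never cross series."""
    df = df.sort_values(["unique_id", "ds"]).copy()
    for k in lags:
        df[f"{col}_lag{k}"] = df.groupby("unique_id")[col].shift(k)
    return df


# Toy panel: same monthly pattern at very different levels.
ds = pd.date_range("2010-01-01", periods=36, freq="MS")
season = np.arange(36) % 12
df = pd.concat([
    pd.DataFrame({"unique_id": "A", "ds": ds, "y": 10.0 + season}),
    pd.DataFrame({"unique_id": "B", "ds": ds, "y": 1000.0 + 50.0 * season}),
], ignore_index=True)

scaled, stats = standardize_per_series(df)
feats = add_lags(scaled, "y_scaled", lags=[1, 12])
print(stats)
print(feats.head(14))
```

After scaling, the two series are identical in shape and scale, so a single global model can learn from both. In a real cross-validation, the mean and standard deviation should be computed on each window's training part only; this toy computes them on the full history for brevity. The returned stats are what you would use to map predictions back to the original scale.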

Nasreddine D

05/21/2023, 2:06 PM

Hi @fede (nixtla) (they/them), thank you for your feedback. I am following your advice and process. Regarding feature engineering with cross-validation: let's say I start with lag 1 and check the score, then add lag 2 and check the score, and so on until lag 12, and then select the best features based on the scores I have seen. Then I start the same process with window features... It will take forever. Is there a way to automate this process? Or is this how it should be done? Thank you.
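One common way to automate that loop is greedy forward selection. At each step you try every remaining candidate, keep the one that improves the validation score most, and stop when nothing helps. The sketch below is a generic NumPy illustration, not a Nixtla feature: an ordinary least squares model on lags and a single holdout stand in for mlforecast and its cross_validation. The toy monthly series, tolerance, and function names are assumptions:

```python
import numpy as np


def lag_matrix(y, lags, max_lag):
    """Design matrix with an intercept and one column y[t-k] per lag k, for t >= max_lag."""
    rows = len(y) - max_lag
    cols = [np.ones(rows)] + [y[max_lag - k: len(y) - k] for k in lags]
    return np.column_stack(cols), y[max_lag:]


def holdout_mae(y, lags, max_lag, n_test):
    """Fit OLS on all but the last n_test rows; return MAE on them (lags use observed values)."""
    X, target = lag_matrix(y, lags, max_lag)
    coef, *_ = np.linalg.lstsq(X[:-n_test], target[:-n_test], rcond=None)
    pred = X[-n_test:] @ coef
    return float(np.mean(np.abs(target[-n_test:] - pred)))


def forward_select(y, candidates, n_test, tol=1e-3):
    """Greedily add the lag that lowers the holdout MAE most; stop when none helps."""
    max_lag = max(candidates)            # fixed, so every subset is scored on the same rows
    selected, remaining = [], list(candidates)
    best = holdout_mae(y, selected, max_lag, n_test)   # intercept-only baseline
    while remaining:
        scores = {k: holdout_mae(y, selected + [k], max_lag, n_test) for k in remaining}
        k_best = min(scores, key=scores.get)
        if scores[k_best] > best - tol:  # no meaningful improvement: stop
            break
        selected.append(k_best)
        remaining.remove(k_best)
        best = scores[k_best]
    return selected, best


# Toy monthly series: a fixed yearly pattern plus noise, 15 years (180 months).
rng = np.random.default_rng(42)
pattern = np.array([5, 3, 8, 12, 20, 25, 30, 28, 18, 10, 6, 4], dtype=float)
y = np.tile(pattern, 15) + rng.normal(0, 1, 180)

selected, best = forward_select(y, candidates=list(range(1, 13)), n_test=24)
print("selected lags:", selected, "holdout MAE:", round(best, 3))
```

With K candidates, the loop runs at most K(K+1)/2 score evaluations (plus one baseline). Trying every subset would instead take 2^K evaluations. When the score function is a full multi-window cross-validation this is still expensive but bounded. A single holdout, as here, is cheaper but noisier.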
