# general
Marc
Hi all! I have some experience with sktime but now want to try out your stats/ml/hier forecasting packages. I have two quick questions:
1. I know statsforecast can use fugue as a backend to work with Spark, which connects nicely with Databricks, my cloud environment of choice. Do the others (ml and hier) also support the fugue/Spark backend?
2. What would be the equivalent of the sktime evaluate() function, where you can input a df, a model, and a CV strategy and get back some kind of results df?
Thanks!
k
Hi @Marc, thanks for trying out our libraries. Answering your questions:
1. Regarding distributed computation in the rest of the libraries:
   a. MLForecast supports distributed training; here is a tutorial: https://nixtla.github.io/mlforecast/docs/quick_start_distributed.html (see the sketch after this message).
   b. NeuralForecast allows for efficient GPU execution, inherited from PyTorch Lightning. For the moment, Spark/distributed execution for NeuralForecast is not available.
   c. HierarchicalForecast by definition uses all the series in the hierarchy simultaneously. For the moment we are not parallelizing its operations; there is work to be done to improve its efficiency.
2. I don't believe we have an evaluate function of that nature. Would you be able to provide more information on it?
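A minimal sketch of what that distributed path looks like on Spark, assuming the classes shown in the linked tutorial (`DistributedMLForecast`, `SparkLGBMForecast`); exact module paths can vary by mlforecast version, and the data path here is hypothetical:
```python
# Sketch of distributed training with mlforecast on Spark.
# Assumes the API shown in the linked tutorial; paths may vary by version.
from pyspark.sql import SparkSession
from mlforecast.distributed import DistributedMLForecast
from mlforecast.distributed.models.spark.lgb import SparkLGBMForecast

spark = SparkSession.builder.getOrCreate()

# Long-format series with columns unique_id, ds, y (hypothetical path)
series = spark.read.parquet("s3://my-bucket/train.parquet")

fcst = DistributedMLForecast(
    models=[SparkLGBMForecast()],  # Spark-aware LightGBM wrapper
    freq="D",
    lags=[7, 14],
)
fcst.fit(series)
preds = fcst.predict(14)  # 14-step-ahead forecasts per series
```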
Marc
That was super quick, thanks!
1. Having the rest distributed is good enough; I don't expect reconciliation to be all that taxing. I'll have to check anyway.
2. Sure! https://www.sktime.net/en/latest/api_reference/auto_generated/sktime.forecasting.model_evaluation.evaluate.html#sktime.forecasting.model_evaluation.evaluate
The above is quite handy, as it allows me to build my own automl-type workflow without the library making any decisions about forecasters, metrics, splitting, etc. for me. Since the models are vectorised, it internally predicts over the folds defined by the CV strategy and returns the results per fold, and you just have to loop over your config of models (roughly like the sketch below). Is there a way to do something similar for backtesting within the Nixtla ecosystem? It seems like fundamental functionality in the development of a forecasting project.
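For reference, a minimal sketch of that sktime loop, assuming sktime's documented `evaluate`/`ExpandingWindowSplitter` APIs (the models, window sizes, and metric are illustrative):
```python
import numpy as np
from sktime.forecasting.model_evaluation import evaluate
from sktime.forecasting.model_selection import ExpandingWindowSplitter
from sktime.forecasting.naive import NaiveForecaster
from sktime.forecasting.theta import ThetaForecaster
from sktime.performance_metrics.forecasting import MeanAbsolutePercentageError

# y: a univariate pd.Series with a time index (assumed given)
cv = ExpandingWindowSplitter(initial_window=24, step_length=12, fh=np.arange(1, 13))
models = {
    "naive": NaiveForecaster(strategy="last"),
    "theta": ThetaForecaster(sp=12),
}
# One results dataframe per model, with one row of scores per CV fold
results = {
    name: evaluate(forecaster=model, y=y, cv=cv,
                   scoring=MeanAbsolutePercentageError())
    for name, model in models.items()
}
```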
k
Thanks for the pointer to the functionality; I'll check it out.
f
Hey @Marc! We have an evaluation function inspired by the `accuracy` function from the fable library. You can pass it the outputs of the `cross_validation` method (to perform backtesting) and a list of metrics, and you will get the evaluation for each series and each window. In addition, the function can receive any dataframe (pandas, Spark, Dask, Ray), so if you're working in a distributed environment, you can evaluate your cross-validation windows quickly. Here's an example:
```python
# Evaluate the backtest per series (unique_id) and per window (cutoff)
from datasetsforecast.evaluation import accuracy
from datasetsforecast.losses import mae, mape

# Y_cv_df: output of the cross_validation method
# (columns: unique_id, ds, cutoff, y, plus one column per model)
evaluation_df = accuracy(Y_cv_df, [mae, mape], agg_by=['unique_id', 'cutoff'])
```
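For completeness, a sketch of how Y_cv_df might be produced, assuming StatsForecast's documented `cross_validation` API (the model and window settings are illustrative):
```python
from statsforecast import StatsForecast
from statsforecast.models import AutoARIMA

# Y_df: long-format dataframe with columns unique_id, ds, y (assumed given)
sf = StatsForecast(models=[AutoARIMA(season_length=12)], freq="M")

# One row per (unique_id, ds, cutoff): actuals plus each model's predictions
Y_cv_df = sf.cross_validation(df=Y_df, h=12, step_size=12, n_windows=3)
```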
Feel free to let us know if you have any further questions. :)