#general

Darius Marginean

12/16/2022, 8:29 AM
Hello all! I've got two questions for you: 1. Why does the cross_validation method of the MLForecast class in mlforecast use time series (sliding window) cross-validation, while the cross_validation method of the StatsForecast class in statsforecast uses an expanding window? 2. I've walked through your cross-validation tutorial for statsforecast (https://nixtla.github.io/statsforecast/examples/crossvalidation.html), and when the results are evaluated with RMSE from datasetsforecast.losses, the metric is computed on the whole crossvalidation_df rather than on each fold separately with the results then aggregated (unlike in mlforecast: https://nixtla.github.io/mlforecast/docs/end_to_end_walkthrough.html, where there is an example of computing the losses on each fold separately using the evaluate_cf(df) function).
👀 1
Here's what I meant by time series cross-validation:
And here is what I meant by expanding window cross-validation (in statsforecast you can set the step_size, which is missing from mlforecast). Is there a way to do expanding window cross-validation in mlforecast too?
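To make the difference concrete, here is a small sketch of the training cutoffs each scheme produces for a series of length n. The helper names are mine, not from either library; the formulas mirror the offset arithmetic discussed below.

```python
# Sketch of the two cross-validation schemes being compared; function
# names are mine, not part of mlforecast or statsforecast.

def sliding_cutoffs(n, n_windows, window_size):
    # mlforecast-style: each fold's validation window starts where the
    # previous one ended, so cutoffs step by window_size.
    return [n - (n_windows - i) * window_size for i in range(n_windows)]

def expanding_cutoffs(n, n_windows, window_size, step_size=1):
    # statsforecast-style: the training set grows by step_size per fold,
    # while the validation horizon stays window_size.
    return [n - window_size - (n_windows - 1 - i) * step_size
            for i in range(n_windows)]

print(sliding_cutoffs(100, 3, 10))       # [70, 80, 90], i.e. steps of window_size
print(expanding_cutoffs(100, 3, 10, 1))  # [88, 89, 90], i.e. steps of step_size
```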
@Maria @Andrei Tulbure
As far as I can tell, modifying the offset variable in the backtest_splits() function did what I wanted (an expanding window with step_size=1). Previous:
def backtest_splits(
    data,
    n_windows: int,
    window_size: int,
    freq: Union[pd.offsets.BaseOffset, int],
    time_col: str = "ds",
):
    for i in range(n_windows):
        offset = (n_windows - i) * window_size
        ...
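As a quick sanity check (this snippet is mine, not library code), the offsets the loop above produces step by window_size, e.g. for 3 windows of size 24:

```python
# Offsets produced by the original backtest_splits loop for 3 windows
# of size 24 (illustrative values, not from the library).
n_windows, window_size = 3, 24
offsets = [(n_windows - i) * window_size for i in range(n_windows)]
# offsets == [72, 48, 24]: consecutive offsets differ by window_size,
# so each validation window starts where the previous one ended.
```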
Now:
def backtest_splits(
    data,
    n_windows: int,
    window_size: int,
    freq: Union[pd.offsets.BaseOffset, int],
    time_col: str = "ds",
    expanding_window: int = 0,
):
    if expanding_window == 1:
        for i in range(n_windows):
            offset = window_size + n_windows - 1 - i
            ...  # the rest of the code is left unchanged
    else:
        ...  # previous implementation
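The same sanity check for the modified formula (again my snippet, not library code) shows the offsets now step by 1:

```python
# Offsets produced by the modified formula for 3 windows of size 24
# (illustrative values, not from the library).
n_windows, window_size = 3, 24
offsets = [window_size + n_windows - 1 - i for i in range(n_windows)]
# offsets == [26, 25, 24]: consecutive offsets differ by 1, i.e. an
# expanding window with step_size=1.
```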
Also, I added expanding_window as an argument to the cross_validation method of the MLForecast class, like so:
def cross_validation(
        self,
        data: pd.DataFrame,
        n_windows: int,
        window_size: int,
        id_col: str,
        time_col: str,
        target_col: str,
        expanding_window: int = 0,
        static_features: Optional[List[str]] = None,
        dropna: bool = True,
        keep_last_n: Optional[int] = None,
        dynamic_dfs: Optional[List[pd.DataFrame]] = None,
        predict_fn: Optional[Callable] = None,
        **predict_fn_kwargs,
    ):
If it's not specified, the previous implementation is used; if it's set to 1, an expanding window with a step size of 1 is used. Probably not the best approach, but I needed it fast.
Should I make a PR with the changes?

fede (nixtla) (they/them)

12/16/2022, 5:29 PM
hey @Darius Marginean! Regarding the cross-validation methods, you're right: currently you can't perform expanding window cross-validation with mlforecast, since its effective step_size is window_size. We are working on matching the methods, arguments, and functionality between mlforecast and statsforecast, and we will prioritize the cross_validation method 🙂
🤩 2
On the second point, although both evaluation techniques are valid, in practice it is preferred to calculate the loss on each fold separately and then average it (as in mlforecast). We will update the statsforecast documentation to reflect this. 🙌
❤️ 3
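For anyone following along, the fold-wise evaluation can be sketched like this. The column names ('cutoff', 'y', and a model column) follow the cross_validation output convention, but the frame and numbers here are made up for illustration:

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the real crossvalidation_df; 'cutoff' marks
# the fold, 'y' is the truth, 'model' is the forecast (made-up numbers).
cv_df = pd.DataFrame({
    "cutoff": ["2022-01", "2022-01", "2022-02", "2022-02"],
    "y":      [10.0, 12.0, 11.0, 13.0],
    "model":  [11.0, 11.0, 10.0, 15.0],
})

sq_err = (cv_df["y"] - cv_df["model"]) ** 2

# Pooled RMSE over the whole frame (the statsforecast tutorial's approach):
pooled_rmse = float(np.sqrt(sq_err.mean()))  # ~1.32

# Per-fold RMSE, then averaged across folds (the mlforecast approach):
per_fold_rmse = sq_err.groupby(cv_df["cutoff"]).mean().pow(0.5)
avg_rmse = float(per_fold_rmse.mean())  # ~1.29
```

The two numbers differ because the pooled version weights every observation equally, while the fold-wise version weights every fold equally.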

Max (Nixtla)

12/16/2022, 5:53 PM
cc @Mariana Menchero