# mlforecast
I am trying to replicate what I've done using the
function using it with cross_validation. No issues with running it normally with a pandas dataframe, but when I try to run it with a spark dataframe (much like I was able to do with statsforecast), I get the following error: "RecursionError: maximum recursion depth exceeded in comparison". Haven't been able to find anything super helpful via google/stackoverflow, ChatGPT, or this slack channel. I've tried all of the algos in the code below and also limiting it to just one (several iterations). Here's the code:

    @njit
    def rolling_mean_12(x):
        return rolling_mean(x, window_size=12)

    @njit
    def rolling_mean_24(x):
        return rolling_mean(x, window_size=24)

    ML_models = [
        lgb.LGBMRegressor(n_jobs=4, random_state=0, verbosity=-1),
        # xgb.XGBRegressor(n_jobs=4, random_state=0),
        # MLPRegressor(random_state=0, max_iter=1000, early_stopping=True, n_iter_no_change=10, tol=1e-4),
        # RandomForestRegressor(random_state=0),
        # ExtraTreesRegressor(random_state=0),
        # HistGradientBoostingRegressor(random_state=0),
        # KNeighborsRegressor()
    ]

    mlf = MLForecast(
        models=ML_models,
        freq='M',  # our series have integer timestamps, so we'll just add 1 in every timestep
        lags=[1, 12],
        # lags=range(1,6, 1),
        lag_transforms={
            1: [expanding_mean],
            12: [rolling_mean_12],
            # 24: [rolling_mean_24],
        },
        # date_features=['year','month','quarter','days_in_month'],
        target_transforms=[Differences([1, 24])]
    )

    # cross validation of statsforecast models using spark
    ML_crossvalidation_SDF = mlf.cross_validation(
        data=sdf,
        window_size=3,
        n_windows=5
    ).toPandas()

Note that the below does work with a regular pandas dataframe:

    ML_crossvalidation_df = mlf.cross_validation(
        data=df2,
        window_size=3,
        n_windows=5,
    )

Any help appreciated!
Hey. For mlforecast we have a separate interface, mainly because the logic is very different. Here are the docs to perform distributed training with spark: https://nixtla.github.io/mlforecast/distributed.forecast.html#spark. You have to import the distributed mlforecast class (`from mlforecast.distributed import DistributedMLForecast`). Please let us know if you run into any issues.
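For anyone landing on this thread later, here is a rough sketch of what the switch to the distributed class might look like for the setup in the question. It is not taken from the thread: the helper name `distributed_cross_validation` is hypothetical, and the sketch assumes the Spark dataframe has the default `unique_id`, `ds` and `y` columns, that the Spark LightGBM wrapper `SparkLGBMForecast` (which needs SynapseML) is available, and that your mlforecast release names the horizon argument `h` (the version used in the question calls it `window_size`). The two `@njit` wrappers are replaced by the `(function, window)` tuple form of `lag_transforms`. Check the linked docs for your installed version before relying on any of this.

```python
def distributed_cross_validation(sdf, n_windows=5, horizon=3):
    """Cross-validate a LightGBM model on a Spark DataFrame (sketch).

    `sdf` is assumed to be a Spark DataFrame with columns unique_id, ds, y.
    """
    # Imports are kept local so the helper can be defined on a machine
    # without Spark, SynapseML or mlforecast installed.
    from mlforecast.distributed import DistributedMLForecast
    from mlforecast.distributed.models.spark.lgb import SparkLGBMForecast
    from mlforecast.target_transforms import Differences
    from window_ops.expanding import expanding_mean
    from window_ops.rolling import rolling_mean

    fcst = DistributedMLForecast(
        # Spark-aware model wrapper instead of lgb.LGBMRegressor
        models=[SparkLGBMForecast()],
        freq="M",
        lags=[1, 12],
        lag_transforms={
            1: [expanding_mean],
            12: [(rolling_mean, 12)],  # (function, window_size) tuple form
        },
        target_transforms=[Differences([1, 24])],
    )

    # Assumption: recent releases call the horizon `h`; older ones
    # (as in the question) call it `window_size`.
    return fcst.cross_validation(sdf, n_windows=n_windows, h=horizon)
```

With a Spark session and a dataframe `sdf` at hand, the call would look like `cv_df = distributed_cross_validation(sdf).toPandas()`, mirroring the `.toPandas()` at the end of the original snippet.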
Ah, thanks. I'll give it a try. Appreciate all of the prompt responses!