This message was deleted Nixtla Community #statsforecast

# statsforecast

Slackbot

09/13/2023, 8:51 PM

This message was deleted.

Kevin Kho

09/13/2023, 10:00 PM

Did you try just passing a Spark DataFrame instead? I forget if that works already.

Kevin Kho

09/13/2023, 10:01 PM

Otherwise, you can make a backend object like this, and the backend has the cross_validation method.

Brian Head

09/13/2023, 11:07 PM

@Kevin Kho thanks for your response. I tried passing the Spark DataFrame to cross_validation, but that just creates a Spark DataFrame; it doesn't run/train the models. See example here:

sf = StatsForecast(
    models=SF_models,
    freq='M',
    # n_jobs = -1,
    fallback_model = Naive()
)
SF_crossvalidation_df = sf.cross_validation(df = sdf, h = 3, step_size = 1, n_windows = 5)

I also tried the parallel=backend argument from the link you provided, in both StatsForecast and cross_validation. When I put it in the former I get the error "TypeError: __init__() got an unexpected keyword argument 'parallel'". When I put it in the latter I get the error "TypeError: cross_validation() got an unexpected keyword argument 'parallel'".

Kevin Kho

09/13/2023, 11:15 PM

Ah ok. I meant try:

backend = FugueBackend(spark, {"fugue.spark.use_pandas_udf":True})
backend.cross_validation(df = sdf,
                          h = 3,
                          step_size = 1,
                          n_windows = 5)

Kevin Kho

09/13/2023, 11:16 PM

Oh my bad, my instructions were very bad

Kevin Kho

09/13/2023, 11:19 PM

It would use this part of the code

Kevin Kho

09/13/2023, 11:23 PM

And then it will return a Spark DataFrame, so you might need to do something to trigger the action.

José Morales

09/14/2023, 1:51 AM

The cross_validation method returns a spark dataframe. As Kevin said, in order to trigger the action you have to do something with it, for example:

cv_results = sf.cross_validation(df=spark_df, h=10)
cv_results.write.parquet('cv_results')

If you're using a remote cluster, make sure to save it to shared storage such as S3.

Brian Head

09/14/2023, 5:58 PM

Thank you @Kevin Kho and @José Morales, this resolved my issues with statsforecast CV. Appreciate the help! I am now having an issue with CV in MLForecast; I can post my question on that in the mlforecast channel.

Kevin Kho

09/14/2023, 6:05 PM

Nice!

Kevin Kho

09/14/2023, 6:06 PM

Oh José is right. You were probably doing it right already and just needed to save to parquet, call show, or something similar to trigger the action.
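
For anyone landing here later, the reason the call seemed to do nothing is that Spark DataFrames are lazy: transformations only build a plan, and the work runs when an action is requested. Below is a minimal pure-Python sketch of that idea. It is only an analogy: LazyFrame and fit_and_forecast are hypothetical names made up for this illustration and are not part of Spark or statsforecast.

```python
# Pure-Python analogy of lazy evaluation. LazyFrame is a hypothetical
# stand-in for illustration, not a Spark or statsforecast class.
from typing import Callable, Iterable, List, Optional


class LazyFrame:
    """Records transformations and runs them only when an action is called."""

    def __init__(
        self,
        rows: Iterable[int],
        steps: Optional[List[Callable[[int], int]]] = None,
    ) -> None:
        self._rows = list(rows)
        self._steps = list(steps or [])

    def map(self, fn: Callable[[int], int]) -> "LazyFrame":
        # Transformation: only extends the plan, no row is processed here.
        return LazyFrame(self._rows, self._steps + [fn])

    def collect(self) -> List[int]:
        # Action: this is where the recorded steps actually execute.
        out = []
        for row in self._rows:
            for fn in self._steps:
                row = fn(row)
            out.append(row)
        return out


calls: List[int] = []  # tracks how often the "expensive" step really runs


def fit_and_forecast(x: int) -> int:
    # Hypothetical stand-in for model training on one series.
    calls.append(x)
    return x * 2


plan = LazyFrame([1, 2, 3]).map(fit_and_forecast)
calls_before_action = len(calls)  # nothing has executed yet
result = plan.collect()           # the action triggers the work
calls_after_action = len(calls)   # every row has now been processed
```

With a real Spark DataFrame, the actions playing the role of collect() here would be things like show(), count(), or write.parquet(...), as in the snippet José posted.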
