Dihong Huang
06/02/2023, 9:59 PM
Kevin Kho
06/05/2023, 8:00 PM
Dihong Huang
06/05/2023, 8:21 PM
Kevin Kho
06/05/2023, 8:22 PM
The spark variable on Databricks already exists when you load the notebook. If you click the + on the TypeError, can I see the full Traceback?
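For reference, a minimal sketch of what Kevin is describing, assuming a Databricks notebook attached to a running cluster (the spark object is injected by the runtime, not created by user code):
# No SparkSession.builder call is needed on Databricks; the runtime provides `spark`.
print(type(spark))    # a pyspark SparkSession supplied by the cluster
print(spark.version)  # the cluster's Spark version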
Dihong Huang
06/05/2023, 8:46 PM
Kevin Kho
06/05/2023, 9:13 PM
Dihong Huang
06/05/2023, 9:37 PM
Kevin Kho
06/05/2023, 9:58 PM
Dihong Huang
06/05/2023, 10:17 PM
Kevin Kho
06/05/2023, 10:19 PM
Dihong Huang
06/05/2023, 10:25 PM
Kevin Kho
06/05/2023, 10:27 PM
!pip show fugue
Dihong Huang
06/05/2023, 10:30 PM
Kevin Kho
06/05/2023, 10:30 PM
Dihong Huang
06/05/2023, 10:33 PM
Kevin Kho
06/05/2023, 10:34 PM
Han Wang
06/05/2023, 10:50 PM
!pip show antlr4-python3-runtime
Dihong Huang
06/05/2023, 10:54 PM
Han Wang
06/05/2023, 10:55 PM
antlr4-python3-runtime==4.11.1
Kevin Kho
06/05/2023, 11:01 PM
Han Wang
06/05/2023, 11:02 PM
Kevin Kho
06/05/2023, 11:03 PM
synapse library
Han Wang
06/05/2023, 11:03 PM
Kevin Kho
06/05/2023, 11:03 PM
Dihong Huang
06/05/2023, 11:04 PM
Kevin Kho
06/05/2023, 11:08 PM
antlr4-python3-runtime==4.11.1 and I think that should work
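A hedged sketch of pinning that version from the notebook (installing it as a cluster-scoped library would achieve the same; restart the Python process afterward so the new version is picked up):
!pip install antlr4-python3-runtime==4.11.1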
Dihong Huang
06/05/2023, 11:23 PM
Kevin Kho
06/05/2023, 11:38 PM
Han Wang
06/05/2023, 11:41 PM
Dihong Huang
06/05/2023, 11:42 PM
Han Wang
06/05/2023, 11:47 PM
Dihong Huang
06/06/2023, 1:00 AM
Han Wang
06/06/2023, 1:01 AM
Kevin Kho
06/06/2023, 1:02 AM
Han Wang
06/06/2023, 1:02 AM
Kevin Kho
06/06/2023, 1:02 AM
Han Wang
06/06/2023, 1:02 AM
Dihong Huang
06/06/2023, 1:05 AM
Han Wang
06/06/2023, 1:05 AM
Kevin Kho
06/06/2023, 1:06 AM
from statsforecast import StatsForecast
from statsforecast.models import AutoARIMA

sf = StatsForecast(
    models=[AutoARIMA(season_length=12)],
    freq='M'
)
sf.fit(df)  # insert Spark DataFrame here
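For completeness, a hedged sketch of the fit-then-predict flow this snippet points at (pandas_df is illustrative; assumes the usual unique_id, ds, y columns):
sdf = spark.createDataFrame(pandas_df)  # pandas_df: illustrative pandas frame with unique_id, ds, y
sf.fit(sdf)                             # per-series training is distributed over the cluster
preds = sf.predict(h=12)                # forecast 12 periods ahead for every series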
Dihong Huang
06/06/2023, 1:07 AM
Kevin Kho
06/06/2023, 1:38 AM
Dihong Huang
06/06/2023, 1:50 AM
Kevin Kho
06/06/2023, 1:51 AM
from statsforecast.distributed.utils import forecast
from statsforecast.distributed.fugue import FugueBackend
from statsforecast.models import AutoARIMA
from statsforecast.core import StatsForecast
from statsforecast.utils import generate_series
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
backend = FugueBackend(spark, {"fugue.spark.use_pandas_udf": True})

series = generate_series(n_series=3, seed=1).reset_index()
sdf = spark.createDataFrame(series)

res = forecast(sdf,
               [AutoARIMA()],
               freq="D",
               h=7,
               parallel=backend).toPandas()
res.head()
forecast is preferred over fit-predict because the fitted model can be big: it contains all of the weights and the whole timeseries it was given as input. forecast does the step in one go and is more memory efficient.
Operationally, fit-predict makes sense when you want to store the model and retrieve it later to predict new points. For timeseries, it's more common to just rerun the forecast every so often (every day or week).
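A hedged sketch of the "store the model, predict later" pattern Kevin contrasts this with (assumes the fitted StatsForecast object is picklable; train_df and the file name are illustrative):
import pickle

sf.fit(train_df)                       # fit once on the training frame
with open("sf_model.pkl", "wb") as f:  # persist the fitted object for later use
    pickle.dump(sf, f)

# later, in a separate job
with open("sf_model.pkl", "rb") as f:
    sf_loaded = pickle.load(f)
preds = sf_loaded.predict(h=7)         # predict new points from the stored model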
Dihong Huang
06/06/2023, 5:10 AM
Kevin Kho
06/06/2023, 5:27 AM
"spark.task.cpus": "1"
Dihong Huang
06/06/2023, 1:28 PM
Kevin Kho
06/06/2023, 3:02 PM
Dihong Huang
06/06/2023, 3:35 PM
Kevin Kho
06/06/2023, 3:37 PM
"spark.speculation": "true",
"spark.sql.adaptive.enabled": "false",
"spark.task.cpus": "1"
Dihong Huang
06/06/2023, 3:58 PM
Kevin Kho
06/06/2023, 3:59 PM
!pip show triad
and you want it 0.8.9
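If the installed version is older, a hedged upgrade command (assumes a notebook environment where shell commands are allowed; restart the Python process afterward):
!pip install triad==0.8.9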
Dihong Huang
06/06/2023, 4:09 PM
Kevin Kho
06/06/2023, 4:35 PM
Dihong Huang
06/06/2023, 4:38 PM
Han Wang
06/06/2023, 4:39 PM
Kevin Kho
06/06/2023, 7:21 PM
Dihong Huang
06/06/2023, 8:24 PM