# general
d
Hi I got this error on databricks trying to replicate the prediction process in this link: https://nixtla.github.io/mlforecast/distributed.forecast.html#spark Can someone help me with this? Thanks!
I also encountered the same error when I ran this command:
Copy code
from statsforecast.models import ( ADIDA, CrostonClassic, IMAPA, TSB )
k
Not sure but it sounds like you have installation issues? Might be worth trying a fresh environment
d
How should I approach this? I am very new to databricks.
k
Is it the import itself that errors?
How did you install the library? Did you go to the cluster and then to the libraries?
Btw, you don’t need to create a spark variable on Databricks. It already exists when you load the notebook.
If you click the + on the TypeError, can I see the full Traceback?
d
I directly used pip install in the notebook. Also, both errors were on the import command.
Screenshot 2023-06-05 at 4.47.24 PM.png,Screenshot 2023-06-05 at 4.47.45 PM.png
k
Don’t pip install in notebook. It only installs on the driver and not the workers. Go to the cluster settings and install in the Libraries tab
Screen Shot 2023-06-05 at 4.14.34 PM.png
Try installing it with this UI instead and maybe things will work better
d
Cool, what is the source and type for them?
k
PyPI and then just need to type in the library name
d
Thank you! I successfully installed the libraries but the same error still occurred.
Here is the complete error message, which is very long:
OK, the message is too long to copy in here, and too long to screenshot too
k
email me a log at kdykho@gmail.com and i can look
d
Sent!
k
Oh shit lol it’s a Fugue issue. Can you go to a notebook and do:
Copy code
!pip show fugue
To get me the Fugue version?
I will try to replicate
d
Screenshot 2023-06-05 at 6.30.15 PM.png
k
Ok let me spin up a cluster and try this now
Can you also give me Databricks Runtime version? It’s attached to the cluster
d
It’s 13.1
k
I suspect the latest 12 will work but I’m verifying. I think 13 had breaking changes
I can confirm 12 works for me, trying on 13
Works for 13.0, trying to reproduce on 13.1
Pinging @Han Wang
He attached this log with this:
h
can you do this @Dihong Huang
Copy code
!pip show antlr4-python3-runtime
@Kevin Kho can you reproduce the error on 13.1?
d
Screenshot 2023-06-05 at 6.54.06 PM.png
h
ah this is very weird
in your install, can you force
Copy code
antlr4-python3-runtime==4.11.1
the version you installed is incorrect, it is for py 3.7 and it is also too old
i don't know why you were able to do that, fugue 0.8.4 should bring it to 4.11.1 automatically
maybe you didn't install the packages in the correct way
ah i see this is because stix2-patterns requires a very old version of antlr https://github.com/oasis-open/cti-pattern-validator/blob/master/setup.py#L40
@Dihong Huang is stix2-patterns important to you?
k
Just installing statsforecast worked for me for DBR 12 - 13.1 👍
h
yeah, it failed on @Dihong Huang's side because he has a very special dependency, stix2-patterns, requiring a very old version of a package that conflicts with fugue
k
I think that is a dependency of his
synapse
library
h
since it doesn't complain on the DB side, you can simply enforce the version of antlr4-python3-runtime
d
@Han Wang I think this stix2 thing doesn’t really matter
Thank you guys so much! How should I enforce the version of antlr4-python3-runtime?
k
When you install the PyPI libraries, add another one:
Copy code
antlr4-python3-runtime==4.11.1
and I think that should work
d
I installed that but the error still comes up on import, what else could be wrong?
k
You may need a restart of the cluster
h
when you restart the cluster
you should check the version of the pkg again
to see if it is the new one or old one
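something like this in a notebook cell will show it (a minimal sketch using only the standard library):
Copy code
# check which antlr4 runtime the cluster actually has after the restart
from importlib.metadata import version

print(version("antlr4-python3-runtime"))  # you want 4.11.1 here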
d
Oh I forgot to mention that my databricks is the community edition so I might need to create a new cluster and install them again.
Anyway thank you so much! I will try it later and let you know if there are any other questions!
h
sure
we will see if we can make antlr a soft dependency so you won't hit it unless you need it. but this will take a few weeks
d
The import seems to be working, but a new issue comes up. My dataframe is a pyspark.pandas.frame.DataFrame converted from a Spark dataframe by df.pandas_api(), and there is an error when calling StatsForecast(): ValueError: is not allowed
Do I have to use a pure pandas dataframe?
h
you have to use a native spark dataframe
k
That is expected. PySpark Pandas is a different class and will not be compatible with libraries that support Pandas or even with Spark. Statsforecast can take either a Spark DataFrame or a Pandas DataFrame through Fugue
h
we haven't supported pandas-on-Spark dataframes yet
the conversion is trivial
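for example (a sketch, assuming df is your pandas-on-Spark dataframe):
Copy code
# pandas-on-Spark -> native Spark DataFrame
sdf = df.to_spark()

# or, if you start from plain pandas, let Spark distribute it:
# sdf = spark.createDataFrame(pandas_df)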
k
If you’re on Databricks, you should use a Spark DataFrame
h
yes
oh and please also make sure cloudpickle is installed, the new spark no longer installs cloudpickle automatically, but at least for now we need it
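a quick way to check it's there (a minimal sketch for a notebook cell; you can also just add cloudpickle in the cluster's Libraries tab):
Copy code
# verify cloudpickle is importable on the driver
import importlib.util

print(importlib.util.find_spec("cloudpickle") is not None)  # should print True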
d
OK thanks
Do I use FugueBackend as illustrated in this example?
h
no
k
That will work but the latest versions just do it for you under the hood. You just need to pass a Spark DataFrame to statsforecast when you do:
Copy code
from statsforecast import StatsForecast
from statsforecast.models import AutoARIMA

sf = StatsForecast(
    models = [AutoARIMA(season_length = 12)],
    freq = 'M'
)

sf.fit(df)  # insert Spark DataFrame here
d
Great thanks
Sorry to bother again, but spark dataframe doesn’t work here either.
At the very end of the error message there is an if-not check specifying that the dataframe must be a pd.DataFrame
k
Let me test one second
d
When I convert the dataframe to a pandas dataframe there is a memory issue.
k
I guess the new way hasn’t been released yet. You might need to use FugueBackend for now but it will become more invisible in the future
Here is a working example
Copy code
from statsforecast.distributed.utils import forecast
from statsforecast.distributed.fugue import FugueBackend
from statsforecast.models import AutoARIMA
from statsforecast.core import StatsForecast
from statsforecast.utils import generate_series  # needed for the example data below

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
backend = FugueBackend(spark, {"fugue.spark.use_pandas_udf":True})

series = generate_series(n_series=3, seed=1).reset_index()
sdf = spark.createDataFrame(series)

res = forecast(sdf, 
         [AutoARIMA()], 
         freq="D", 
         h=7, 
         parallel=backend).toPandas()
res.head()
For timeseries, forecast is preferred over fit-predict because the model can be big, as it contains all of the weights and the whole input timeseries. forecast will do the step in one go and be more memory efficient. Operationally, fit-predict makes sense when you want to store the model and then retrieve it to predict new points later. For timeseries, it’s more common to just run the forecast every so often (every day or week)
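For contrast, a fit-predict version would look roughly like this (a local sketch, reusing the series frame and imports from the example above):
Copy code
# fit-predict: keeps the fitted models around for later predictions,
# at the cost of holding the weights and input series in memory
sf = StatsForecast(models=[AutoARIMA()], freq="D")
sf.fit(series)
preds = sf.predict(h=7)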
In the future, you shouldn’t need to specify the FugueBackend, it will just be inferred for Spark, Dask, and Ray DataFrames
Actually no, I think you shouldn’t need FugueBackend. I’ll look into it
d
This seems to run, but I don’t know what’s going on: it has run for 3 hours and is stuck at the same point
k
You can try it on a sample of the data, like 10% of the timeseries. It may be that your compute resources are struggling? You can check utilization, right? Also, I don’t have a good feel for how long the models take to run. I’m on the Fugue project and we just collaborate with Nixtla, so I am more on the Spark side. It might be worth asking a new question in this Slack, but I think you can benchmark the times with the smaller dataset and you’ll get a feel for how expensive each model is.
It might be bottlenecking. Can you add this config to Spark on your cluster
Copy code
"spark.task.cpus": "1"
d
My local environment takes less than 10 mins to finish the same thing, very weird.
I just tried to run it again and found this. My new cluster has exactly the same runtime and libraries as the previous one
k
Will respond more in like 15 mins
Hey, so on the Spark hanging, were you able to set the config I mentioned a bit above and try that? On this new error, we made a release last night and it seems to have broken things. Looking into it
Can you restart the cluster? We removed the package version with the error; if you restart, I think it will install good versions
d
Do I set the config like this?
I am restarting it right now.
k
That is right, but I believe maybe not enough.
You can try these also:
Copy code
"spark.speculation": "true",
"spark.sql.adaptive.enabled": "false",
"spark.task.cpus": "1"
d
I am trying this now, but the restart before didn’t work
k
Maybe uninstall and reinstall statsforecast?
The restart of the cluster did not reinstall the libraries?
You can do
Copy code
!pip show triad
and you want it to be 0.8.9
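or check a few of them at once (a sketch using the standard library):
Copy code
# print the versions of the Fugue dependency stack
from importlib.metadata import version

for pkg in ["fugue", "triad", "qpd"]:
    print(pkg, version(pkg))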
d
I can see them being reinstalled
Screenshot 2023-06-06 at 12.09.47 PM.png
k
If qpd is 0.4.2 and triad is 0.9.0 it should also work. Did you get it working?
d
It’s now like this, should I stop it?
h
@Dihong Huang if you have time we can have a quick meeting
i think that is just a warning
k
Hi @Dihong Huang, the new versions of the Fugue dependencies have been released, so installing statsforecast and Fugue should not give any issues now. Thanks for reporting!
d
Thank you!