# general
l
Hello team, I'm facing an issue with statsforecast==1.7.3. I get this error when I call the forecast method:
ImportError: cannot import name 'ThreadpoolController' from 'threadpoolctl' (/databricks/python/lib/python3.10/site-packages/threadpoolctl.py)
I already installed threadpoolctl, but I don't know why it is asking me for that package. I'm working on Databricks, by the way.
j
you need threadpoolctl>=3. We're about to release 1.7.8 which adds this requirement
oh nvm, that issue should only be present on statsforecast 1.7.7. 1.7.3 is very old and I'm pretty sure it doesn't import that
l
Thanks Jose, you always save the day, but I moved to 1.7.7.1 and now I see this error:
ImportError: cannot import name 'restrict_to_bounds' from 'statsforecast.utils' (/local_disk0/.ephemeral_nfs/envs/pythonEnv-d136fed6-3453-45db-bc08-771692dd1ac8/lib/python3.10/site-packages/statsforecast/utils.py)
/local_disk0/.ephemeral_nfs/envs/pythonEnv-d136fed6-3453-45db-bc08-771692dd1ac8/lib/python3.10/site-packages/statsforecast/ces.py:14
     11 from numba import njit
     12 from statsmodels.tsa.seasonal import seasonal_decompose
---> 14 from .utils import CACHE, NOGIL, restrict_to_bounds, results
     16 # %% ../../nbs/src/ces.ipynb 4
     17 # Global variables
     18 NONE = 0
j
can you try uninstalling first? You probably have mixed versions now
l
Already done that, but I see the same problem:
ImportError: cannot import name 'ThreadpoolController' from 'threadpoolctl' (/databricks/python/lib/python3.10/site-packages/threadpoolctl.py)
Still with statsforecast==1.7.7.1.
j
yeah, that's the problem that requires updating to threadpoolctl>=3
1.7.8 is now on PyPI, that'll take care of the threadpoolctl bound
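A quick way to confirm which threadpoolctl the current session actually sees is to query the installed package metadata. This is a minimal sketch using only the standard library. The helper names `major_version` and `meets_minimum_major` are made up for illustration, and the version parsing is deliberately simple (it only reads the leading integer).

```python
import re
from importlib.metadata import PackageNotFoundError, version


def major_version(version_string: str) -> int:
    """Return the leading integer of a version string such as '3.1.0'."""
    match = re.match(r"\d+", version_string)
    if match is None:
        raise ValueError(f"cannot parse version: {version_string!r}")
    return int(match.group())


def meets_minimum_major(package: str, minimum_major: int) -> bool:
    """True if `package` is installed and its major version is >= minimum_major."""
    try:
        installed = version(package)
    except PackageNotFoundError:
        # Not installed in the environment this interpreter is using.
        return False
    return major_version(installed) >= minimum_major


if __name__ == "__main__":
    # Reports whether the threadpoolctl>=3 requirement is met in this session.
    print("threadpoolctl>=3:", meets_minimum_major("threadpoolctl", 3))
```

If this prints False even after installing a newer release, the interpreter is probably still resolving an older copy, such as the one under /databricks/python/lib/python3.10/site-packages shown in the error message.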
l
Hi again, I installed 1.7.8 and now I see this error:
NotImplementedError: no registered dataset conversion for <class 'int'>
My df info:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2559010 entries, 0 to 2559009
Data columns (total 3 columns):
 #   Column     Dtype         
---  ------     -----         
 0   unique_id  object        
 1   ds         datetime64[ns]
 2   y          float64       
dtypes: datetime64[ns](1), float64(1), object(1)
memory usage: 58.6+ MB
j
Can you include the full error?
l
Looks like this:
y_pred = sf.forecast(sales_prepared, horizon)

File /local_disk0/.ephemeral_nfs/envs/pythonEnv-0977aebe-09e3-4ef9-b451-ec32c560246a/lib/python3.10/site-packages/statsforecast/core.py:1658, in StatsForecast.forecast(self, h, df, X_df, level, fitted, sort_df, prediction_intervals, id_col, time_col, target_col)
   1656 engine = make_execution_engine(infer_by=[df])
   1657 self._backend = make_backend(engine)
-> 1658 return self._backend.forecast(
   1659     models=self.models,
   1660     fallback_model=self.fallback_model,
   1661     freq=self.freq,
   1662     df=df,
   1663     h=h,
   1664     X_df=X_df,
   1665     level=level,
   1666     fitted=fitted,
   1667     prediction_intervals=prediction_intervals,
   1668     id_col=id_col,
   1669     time_col=time_col,
   1670     target_col=target_col,
   1671 )

File /local_disk0/.ephemeral_nfs/envs/pythonEnv-0977aebe-09e3-4ef9-b451-ec32c560246a/lib/python3.10/site-packages/statsforecast/distributed/fugue.py:346, in FugueBackend.forecast(self, df, freq, models, fallback_model, X_df, h, level, fitted, prediction_intervals, id_col, time_col, target_col)
    296 def forecast(
    297     self,
    298     *,
   (...)
    310     target_col: str,
    311 ) -> Any:
    312     """Memory Efficient core.StatsForecast predictions with FugueBackend.
    313 
    314     This method uses Fugue's transform function, in combination with
   (...)
    344     Or the list of available [StatsForecast's models](<https://nixtla.github.io/statsforecast/src/core/models.html>).
    345     """
--> 346     self._fcst_schema = self._get_output_schema(
    347         df=df,
    348         models=models,
    349         level=level,
    350         mode="forecast",
    351         id_col=id_col,
    352         time_col=time_col,
    353         target_col=target_col,
    354     )
    355     self._fitted_schema = self._fcst_schema + fa.get_schema(df).extract(
    356         [target_col]
    357     )
    358     tfm_schema = "a:binary, b:binary" if fitted else self._fcst_schema

File /local_disk0/.ephemeral_nfs/envs/pythonEnv-0977aebe-09e3-4ef9-b451-ec32c560246a/lib/python3.10/site-packages/statsforecast/distributed/fugue.py:264, in FugueBackend._get_output_schema(self, df, models, level, mode, id_col, time_col, target_col)
    253 def _get_output_schema(
    254     self,
    255     *,
   (...)
    262     target_col,
    263 ) -> Schema:
--> 264     keep_schema = fa.get_schema(df).extract([id_col, time_col])
    265     cols: List[Any] = []
    266     if level is None:

File /local_disk0/.ephemeral_nfs/envs/pythonEnv-0977aebe-09e3-4ef9-b451-ec32c560246a/lib/python3.10/site-packages/triad/utils/dispatcher.py:256, in ConditionalDispatcher.__call__(self, *args, **kwds)
    254 if self._is_broadcast:
    255     return list(self.run(*args, **kwds))
--> 256 return self.run_top(*args, **kwds)

File /local_disk0/.ephemeral_nfs/envs/pythonEnv-0977aebe-09e3-4ef9-b451-ec32c560246a/lib/python3.10/site-packages/triad/utils/dispatcher.py:280, in ConditionalDispatcher.run_top(self, *args, **kwargs)
    275 def run_top(self, *args: Any, **kwargs: Any) -> Any:
    276     """Execute the first matching child function
    277 
    278     :return: the return of the child function
    279     """
--> 280     return list(itertools.islice(self.run(*args, **kwargs), 1))[0]

File /local_disk0/.ephemeral_nfs/envs/pythonEnv-0977aebe-09e3-4ef9-b451-ec32c560246a/lib/python3.10/site-packages/triad/utils/dispatcher.py:273, in ConditionalDispatcher.run(self, *args, **kwargs)
    271         has_return = True
    272 if not has_return:
--> 273     yield self._func(*args, **kwargs)

File /local_disk0/.ephemeral_nfs/envs/pythonEnv-0977aebe-09e3-4ef9-b451-ec32c560246a/lib/python3.10/site-packages/fugue/dataframe/api.py:51, in get_schema(df)
     31 @fugue_plugin
     32 def get_schema(df: AnyDataFrame) -> Schema:
     33     """The generic function to get the schema of any dataframe
     34 
     35     :param df: the object that can be recognized as a dataframe by Fugue
   (...)
     49         How to get schema of any dataframe using Fugue?
     50     """
---> 51     return as_fugue_df(df).schema

File /local_disk0/.ephemeral_nfs/envs/pythonEnv-0977aebe-09e3-4ef9-b451-ec32c560246a/lib/python3.10/site-packages/fugue/dataframe/dataframe.py:457, in as_fugue_df(df, **kwargs)
    452 def as_fugue_df(df: AnyDataFrame, **kwargs: Any) -> DataFrame:
    453     """Wrap the object as a Fugue DataFrame.
    454 
    455     :param df: the object to wrap
    456     """
--> 457     ds = as_fugue_dataset(df, **kwargs)
    458     if isinstance(ds, DataFrame):
    459         return ds

File /local_disk0/.ephemeral_nfs/envs/pythonEnv-0977aebe-09e3-4ef9-b451-ec32c560246a/lib/python3.10/site-packages/triad/utils/dispatcher.py:256, in ConditionalDispatcher.__call__(self, *args, **kwds)
    254 if self._is_broadcast:
    255     return list(self.run(*args, **kwds))
--> 256 return self.run_top(*args, **kwds)

File /local_disk0/.ephemeral_nfs/envs/pythonEnv-0977aebe-09e3-4ef9-b451-ec32c560246a/lib/python3.10/site-packages/triad/utils/dispatcher.py:280, in ConditionalDispatcher.run_top(self, *args, **kwargs)
    275 def run_top(self, *args: Any, **kwargs: Any) -> Any:
    276     """Execute the first matching child function
    277 
    278     :return: the return of the child function
    279     """
--> 280     return list(itertools.islice(self.run(*args, **kwargs), 1))[0]

File /local_disk0/.ephemeral_nfs/envs/pythonEnv-0977aebe-09e3-4ef9-b451-ec32c560246a/lib/python3.10/site-packages/triad/utils/dispatcher.py:273, in ConditionalDispatcher.run(self, *args, **kwargs)
    271         has_return = True
    272 if not has_return:
--> 273     yield self._func(*args, **kwargs)

File /local_disk0/.ephemeral_nfs/envs/pythonEnv-0977aebe-09e3-4ef9-b451-ec32c560246a/lib/python3.10/site-packages/fugue/dataset/api.py:15, in as_fugue_dataset(data, **kwargs)
     13 if isinstance(data, Dataset) and len(kwargs) == 0:
     14     return data
---> 15 raise NotImplementedError(f"no registered dataset conversion for {type(data)}")

NotImplementedError: no registered dataset conversion for <class 'int'>
j
The horizon is the first argument of forecast, so you're providing h=df, df=horizon
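The mix-up can be reproduced without statsforecast installed. The sketch below uses a hypothetical stand-in function that copies only the leading parameter order `(h, df)` visible in the traceback. The explicit type check is an assumption added for illustration; the real method failed later, inside Fugue, rather than validating `h` up front.

```python
def forecast(h, df):
    """Hypothetical stand-in mirroring only the leading parameter order (h, df)."""
    if not isinstance(h, int):
        raise TypeError(f"h must be an int, got {type(h).__name__}")
    return {"h": h, "rows": len(df)}


data = [1.0, 2.0, 3.0]  # placeholder for a real data frame
horizon = 12

# Keyword arguments make the call independent of parameter order.
result = forecast(h=horizon, df=data)

# Passing the data first binds it to h, which is the swap described above.
try:
    forecast(data, horizon)
except TypeError as exc:
    swapped_error = str(exc)
```

Calling with `h=` and `df=` spelled out also keeps code portable between libraries whose positional order differs.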
l
Hahaha thanks that was a rookie mistake
j
I think that's the order of mlforecast, we're working on unifying the interfaces
l
Makes sense to me, and yeah, I use MLForecast a lot, so maybe that's it. I was training some models, but it takes so long that I switched to a polars df, and now I see this:
NotImplementedError: no registered dataset conversion for <class 'polars.dataframe.frame.DataFrame'>
---------------------------------------------------------------------------
NotImplementedError                       Traceback (most recent call last)
File <command-1872998573053024>, line 1
----> 1 y_pred = sf.forecast(h=horizon, df=df, level=[90])

File /local_disk0/.ephemeral_nfs/envs/pythonEnv-0977aebe-09e3-4ef9-b451-ec32c560246a/lib/python3.10/site-packages/statsforecast/core.py:1658, in StatsForecast.forecast(self, h, df, X_df, level, fitted, sort_df, prediction_intervals, id_col, time_col, target_col)
   1656 engine = make_execution_engine(infer_by=[df])
   1657 self._backend = make_backend(engine)
-> 1658 return self._backend.forecast(
   1659     models=self.models,
   1660     fallback_model=self.fallback_model,
   1661     freq=self.freq,
   1662     df=df,
   1663     h=h,
   1664     X_df=X_df,
   1665     level=level,
   1666     fitted=fitted,
   1667     prediction_intervals=prediction_intervals,
   1668     id_col=id_col,
   1669     time_col=time_col,
   1670     target_col=target_col,
   1671 )

File /local_disk0/.ephemeral_nfs/envs/pythonEnv-0977aebe-09e3-4ef9-b451-ec32c560246a/lib/python3.10/site-packages/statsforecast/distributed/fugue.py:346, in FugueBackend.forecast(self, df, freq, models, fallback_model, X_df, h, level, fitted, prediction_intervals, id_col, time_col, target_col)
    296 def forecast(
    297     self,
    298     *,
   (...)
    310     target_col: str,
    311 ) -> Any:
    312     """Memory Efficient core.StatsForecast predictions with FugueBackend.
    313 
    314     This method uses Fugue's transform function, in combination with
   (...)
    344     Or the list of available [StatsForecast's models](<https://nixtla.github.io/statsforecast/src/core/models.html>).
    345     """
--> 346     self._fcst_schema = self._get_output_schema(
    347         df=df,
    348         models=models,
    349         level=level,
    350         mode="forecast",
    351         id_col=id_col,
    352         time_col=time_col,
    353         target_col=target_col,
    354     )
    355     self._fitted_schema = self._fcst_schema + fa.get_schema(df).extract(
    356         [target_col]
    357     )
    358     tfm_schema = "a:binary, b:binary" if fitted else self._fcst_schema

File /local_disk0/.ephemeral_nfs/envs/pythonEnv-0977aebe-09e3-4ef9-b451-ec32c560246a/lib/python3.10/site-packages/statsforecast/distributed/fugue.py:264, in FugueBackend._get_output_schema(self, df, models, level, mode, id_col, time_col, target_col)
    253 def _get_output_schema(
    254     self,
    255     *,
   (...)
    262     target_col,
    263 ) -> Schema:
--> 264     keep_schema = fa.get_schema(df).extract([id_col, time_col])
    265     cols: List[Any] = []
    266     if level is None:

File /local_disk0/.ephemeral_nfs/envs/pythonEnv-0977aebe-09e3-4ef9-b451-ec32c560246a/lib/python3.10/site-packages/triad/utils/dispatcher.py:256, in ConditionalDispatcher.__call__(self, *args, **kwds)
    254 if self._is_broadcast:
    255     return list(self.run(*args, **kwds))
--> 256 return self.run_top(*args, **kwds)

File /local_disk0/.ephemeral_nfs/envs/pythonEnv-0977aebe-09e3-4ef9-b451-ec32c560246a/lib/python3.10/site-packages/triad/utils/dispatcher.py:280, in ConditionalDispatcher.run_top(self, *args, **kwargs)
    275 def run_top(self, *args: Any, **kwargs: Any) -> Any:
    276     """Execute the first matching child function
    277 
    278     :return: the return of the child function
    279     """
--> 280     return list(itertools.islice(self.run(*args, **kwargs), 1))[0]

File /local_disk0/.ephemeral_nfs/envs/pythonEnv-0977aebe-09e3-4ef9-b451-ec32c560246a/lib/python3.10/site-packages/triad/utils/dispatcher.py:273, in ConditionalDispatcher.run(self, *args, **kwargs)
    271         has_return = True
    272 if not has_return:
--> 273     yield self._func(*args, **kwargs)

File /local_disk0/.ephemeral_nfs/envs/pythonEnv-0977aebe-09e3-4ef9-b451-ec32c560246a/lib/python3.10/site-packages/fugue/dataframe/api.py:51, in get_schema(df)
     31 @fugue_plugin
     32 def get_schema(df: AnyDataFrame) -> Schema:
     33     """The generic function to get the schema of any dataframe
     34 
     35     :param df: the object that can be recognized as a dataframe by Fugue
   (...)
     49         How to get schema of any dataframe using Fugue?
     50     """
---> 51     return as_fugue_df(df).schema

File /local_disk0/.ephemeral_nfs/envs/pythonEnv-0977aebe-09e3-4ef9-b451-ec32c560246a/lib/python3.10/site-packages/fugue/dataframe/dataframe.py:457, in as_fugue_df(df, **kwargs)
    452 def as_fugue_df(df: AnyDataFrame, **kwargs: Any) -> DataFrame:
    453     """Wrap the object as a Fugue DataFrame.
    454 
    455     :param df: the object to wrap
    456     """
--> 457     ds = as_fugue_dataset(df, **kwargs)
    458     if isinstance(ds, DataFrame):
    459         return ds

File /local_disk0/.ephemeral_nfs/envs/pythonEnv-0977aebe-09e3-4ef9-b451-ec32c560246a/lib/python3.10/site-packages/triad/utils/dispatcher.py:256, in ConditionalDispatcher.__call__(self, *args, **kwds)
    254 if self._is_broadcast:
    255     return list(self.run(*args, **kwds))
--> 256 return self.run_top(*args, **kwds)

File /local_disk0/.ephemeral_nfs/envs/pythonEnv-0977aebe-09e3-4ef9-b451-ec32c560246a/lib/python3.10/site-packages/triad/utils/dispatcher.py:280, in ConditionalDispatcher.run_top(self, *args, **kwargs)
    275 def run_top(self, *args: Any, **kwargs: Any) -> Any:
    276     """Execute the first matching child function
    277 
    278     :return: the return of the child function
    279     """
--> 280     return list(itertools.islice(self.run(*args, **kwargs), 1))[0]

File /local_disk0/.ephemeral_nfs/envs/pythonEnv-0977aebe-09e3-4ef9-b451-ec32c560246a/lib/python3.10/site-packages/triad/utils/dispatcher.py:273, in ConditionalDispatcher.run(self, *args, **kwargs)
    271         has_return = True
    272 if not has_return:
--> 273     yield self._func(*args, **kwargs)

File /local_disk0/.ephemeral_nfs/envs/pythonEnv-0977aebe-09e3-4ef9-b451-ec32c560246a/lib/python3.10/site-packages/fugue/dataset/api.py:15, in as_fugue_dataset(data, **kwargs)
     13 if isinstance(data, Dataset) and len(kwargs) == 0:
     14     return data
---> 15 raise NotImplementedError(f"no registered dataset conversion for {type(data)}")

NotImplementedError: no registered dataset conversion for <class 'polars.dataframe.frame.DataFrame'>
j
Did you install polars in your same session? You may need to restart your kernel