Nixtla Community #mlforecast

jan rathfelder

06/19/2024, 1:17 PM

Hi, I would like to build a custom objective function for xgboost, but all the standard solutions fail because I can't really alter the fit method like I could in xgboost itself (where I can just specify a custom objective function). I tried multiple ways but all have failed so far. Do you guys have any idea how to do this? It seems one issue is that the training data I can access is not in DMatrix format. I am happy for any suggestions here.
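
A minimal sketch of one possible route, assuming the model handed to MLForecast is the scikit-learn wrapper: `xgboost.XGBRegressor` accepts a callable `objective(y_true, y_pred)` that returns the gradient and Hessian as arrays, so no DMatrix access is needed. The asymmetric squared error and its `under_weight` parameter are invented for illustration and do not come from this thread.

```python
import numpy as np


def asymmetric_squared_error(y_true, y_pred, under_weight=2.0):
    """Gradient and Hessian of a weighted squared error (hypothetical example).

    loss = w * (y_pred - y_true) ** 2 / 2, with w = under_weight when the
    prediction is below the target and w = 1 otherwise.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residual = y_pred - y_true
    weight = np.where(residual < 0, under_weight, 1.0)
    grad = weight * residual  # first derivative w.r.t. y_pred
    hess = weight             # second derivative w.r.t. y_pred
    return grad, hess


grad, hess = asymmetric_squared_error([10.0, 10.0], [8.0, 13.0])
print(grad, hess)

# Assumed usage, not executed here:
# import xgboost as xgb
# from mlforecast import MLForecast
# model = xgb.XGBRegressor(objective=asymmetric_squared_error)
# fcst = MLForecast(models=[model], freq='D', lags=[1, 7])
```

The function only needs NumPy arrays, which is why it can be checked on its own before plugging it into a model.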

Olgahan Cat

06/19/2024, 3:55 PM

Hi guys! First, I would like to thank you for this awesome package. I have a question: when I use a model in MLForecast with multiple time series, is the model fit to each series separately, or does it use all series at the same time to estimate the parameters?

Biagio Principe

06/25/2024, 4:00 PM

Hi everyone, thank you for the work, without which my PhD would be a lot harder. I was wondering if it is possible to use AutoML together with the one-model-per-step approach described here: https://nixtlaverse.nixtla.io/mlforecast/docs/how-to-guides/one_model_per_horizon.html Thanks a lot!

Olgahan Cat

06/25/2024, 6:37 PM

Hi guys, I am fitting some ML models (linear regression and xgboost) to predict my multiple time series. I wonder if there is documentation or other resources on feature engineering and hyperparameter tuning, especially with respect to lag variables?

Sarim Zafar

06/26/2024, 8:48 AM

Is there any way to combine hyperparameter tuning with Spark LGBM?

Vítor Barbosa

06/26/2024, 10:38 PM

Hi guys, when trying to use fill_gaps here:

from utilsforecast.preprocessing import fill_gaps
stocks_basic_pd = fill_gaps(stocks_basic_pd, freq='B', start='per_serie', end='per_serie', id_col='Ticker', time_col='Date')

I am getting the error below. Any ideas?

{
	"name": "ValueError",
	"message": "cannot handle a non-unique multi-index!",
	"stack": "---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[54], line 2
      1 from utilsforecast.preprocessing import fill_gaps
----> 2 stocks_basic_pd = fill_gaps(stocks_basic_pd, freq='B', start='per_serie', end='per_serie', id_col='Ticker', time_col='Date')

File c:\\Python\\miniconda3\\envs\\openbb\\Lib\\site-packages\\utilsforecast\\preprocessing.py:166, in fill_gaps(df, freq, start, end, id_col, time_col)
    164         times += offset.base
    165 idx = pd.MultiIndex.from_arrays([uids, times], names=[id_col, time_col])
--> 166 res = df.set_index([id_col, time_col]).reindex(idx).reset_index()
    167 extra_cols = df.columns.drop([id_col, time_col]).tolist()
    168 if extra_cols:

File c:\\Python\\miniconda3\\envs\\openbb\\Lib\\site-packages\\pandas\\core\\frame.py:5365, in DataFrame.reindex(self, labels, index, columns, axis, method, copy, level, fill_value, limit, tolerance)
   5346 @doc(
   5347     NDFrame.reindex,
   5348     klass=_shared_doc_kwargs[\"klass\"],
   (...)
   5363     tolerance=None,
   5364 ) -> DataFrame:
-> 5365     return super().reindex(
   5366         labels=labels,
   5367         index=index,
   5368         columns=columns,
   5369         axis=axis,
   5370         method=method,
   5371         copy=copy,
   5372         level=level,
   5373         fill_value=fill_value,
   5374         limit=limit,
   5375         tolerance=tolerance,
   5376     )

File c:\\Python\\miniconda3\\envs\\openbb\\Lib\\site-packages\\pandas\\core\\generic.py:5607, in NDFrame.reindex(self, labels, index, columns, axis, method, copy, level, fill_value, limit, tolerance)
   5604     return self._reindex_multi(axes, copy, fill_value)
   5606 # perform the reindex on the axes
-> 5607 return self._reindex_axes(
   5608     axes, level, limit, tolerance, method, fill_value, copy
   5609 ).__finalize__(self, method=\"reindex\")

File c:\\Python\\miniconda3\\envs\\openbb\\Lib\\site-packages\\pandas\\core\\generic.py:5630, in NDFrame._reindex_axes(self, axes, level, limit, tolerance, method, fill_value, copy)
   5627     continue
   5629 ax = self._get_axis(a)
-> 5630 new_index, indexer = ax.reindex(
   5631     labels, level=level, limit=limit, tolerance=tolerance, method=method
   5632 )
   5634 axis = self._get_axis_number(a)
   5635 obj = obj._reindex_with_indexers(
   5636     {axis: [new_index, indexer]},
   5637     fill_value=fill_value,
   5638     copy=copy,
   5639     allow_dups=False,
   5640 )

File c:\\Python\\miniconda3\\envs\\openbb\\Lib\\site-packages\\pandas\\core\\indexes\\base.py:4426, in Index.reindex(self, target, method, level, limit, tolerance)
   4422     indexer = self.get_indexer(
   4423         target, method=method, limit=limit, tolerance=tolerance
   4424     )
   4425 elif self._is_multi:
-> 4426     raise ValueError(\"cannot handle a non-unique multi-index!\")
   4427 elif not self.is_unique:
   4428     # GH#42568
   4429     raise ValueError(\"cannot reindex on an axis with duplicate labels\")

ValueError: cannot handle a non-unique multi-index!"
}
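
pandas raises this error when the index being reindexed contains duplicates, so one thing worth checking is whether any (Ticker, Date) pair appears more than once in the frame. A minimal sketch on made-up data; the toy values and the choice to keep the last row per key are assumptions, not something taken from the traceback.

```python
import pandas as pd

# Toy frame with one duplicated (Ticker, Date) pair; values are invented.
stocks = pd.DataFrame({
    'Ticker': ['AAA', 'AAA', 'AAA', 'BBB'],
    'Date': pd.to_datetime(['2024-06-03', '2024-06-04', '2024-06-04', '2024-06-03']),
    'Close': [1.0, 2.0, 2.5, 9.0],
})

# Rows whose (Ticker, Date) key occurs more than once
dup_mask = stocks.duplicated(subset=['Ticker', 'Date'], keep=False)
print(stocks[dup_mask])

# One possible fix: keep a single row per key before calling fill_gaps
deduped = stocks.drop_duplicates(subset=['Ticker', 'Date'], keep='last')
```

Whether keeping the last row is appropriate depends on why the duplicates exist, so it is worth inspecting them first.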

Johannes Emme

06/29/2024, 2:46 PM

Hi, I have a question about how conformal prediction intervals are obtained. It's a bit tricky to explain in writing, but I'll do my best to convey the message. In short, the issue I'm experiencing is that the prediction intervals are currently built per horizon step and do not account for where during the hour/day/week the error occurs.

What do I mean by that? Let me show you what I did to encounter this issue. First, I should mention that I am working with hourly time series data. I trained a model using conformal_distribution with 10 windows and a conformal horizon (h) of 24*4 (96). In plot 1, you can see the resulting cs_df and the true target plotted against each other. From this plot, it can be seen that my model is okay at predicting the weekends but has clear difficulties predicting the Mondays. However, when I used the model for predictions (see plot 2), the uncertainty for the weekends was very large, and the Mondays had small uncertainty. (In plot 2 I forgot the legend: black = true, blue = mean prediction, purple = 10th and 90th percentiles.)

What I have come to realize is that the problem arises from a misalignment between the conformal horizon and the horizon I am predicting. With a conformal horizon of 96, the errors collected for a specific timestep do not "belong to the same timeslot." For instance, the first error in the first window corresponds to Monday 00:00, while for the next window the first hour is Friday 00:00, then Tuesday 00:00, and so on. Hence, when I predict the consumption during Saturday, the quantiles are based on several different days and hours and not on "Saturday hour errors."

To overcome this issue, I set the conformal horizon to 24*7 (168) so that my conformal windows start on the same day as my predictions. Then I get the following result (see plots 3 and 4), where the uncertainty is low for the weekends and high for the Mondays. However, I do not believe this is a sustainable solution. Unfortunately, I don't have a great alternative either. Currently, for my case, I have simply rewritten the _add_conformal_distribution_intervals function by:
1. Requiring that n_windows*h >= 168 so that all hours in the week are represented.
2. Joining cs_df and fcst_df on day_of_week and hour.
3. Subtracting and adding the mean to get a distribution around each hour, and then calculating the quantiles.

I am very curious to hear your thoughts on this.
Best regards, Johannes
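
The three steps in this message could be sketched roughly as follows in pandas. This is not the mlforecast implementation: the column names (`ds`, `y`, `pred`), the helper name and the reading of step 3 (absolute errors per weekly slot, mirrored around the point forecast before taking quantiles) are assumptions made for illustration, and the data is invented.

```python
import numpy as np
import pandas as pd


def slot_quantile_intervals(cs_df, fcst_df, quantiles=(0.1, 0.9)):
    """Quantiles of pred +/- |error|, grouped by (day_of_week, hour) slot."""
    slots = pd.DataFrame({
        'day_of_week': cs_df['ds'].dt.dayofweek,
        'hour': cs_df['ds'].dt.hour,
        'score': (cs_df['y'] - cs_df['pred']).abs(),
    })
    # mirror the scores so the distribution is centred on the forecast
    mirrored = pd.concat([slots, slots.assign(score=-slots['score'])], ignore_index=True)
    q = (
        mirrored.groupby(['day_of_week', 'hour'])['score']
        .quantile(list(quantiles))
        .unstack()
    )
    q.columns = [f'q{int(round(c * 100))}' for c in q.columns]
    out = fcst_df.assign(
        day_of_week=fcst_df['ds'].dt.dayofweek,
        hour=fcst_df['ds'].dt.hour,
    ).merge(q.reset_index(), on=['day_of_week', 'hour'], how='left')
    for col in q.columns:
        out[col] = out['pred'] + out[col]
    return out.drop(columns=['day_of_week', 'hour'])


# Two weeks of hourly calibration data starting on a Monday (invented values):
# errors of 20 on Mondays and 2 on every other day.
hours = pd.date_range('2024-01-01', periods=24 * 14, freq=pd.Timedelta(hours=1))
cs_df = pd.DataFrame({'ds': hours, 'pred': 100.0})
cs_df['y'] = cs_df['pred'] + np.where(hours.dayofweek == 0, 20.0, 2.0)
fcst_df = pd.DataFrame({
    'ds': pd.date_range('2024-01-15', periods=48, freq=pd.Timedelta(hours=1)),
    'pred': 100.0,
})
intervals = slot_quantile_intervals(cs_df, fcst_df)
print(intervals.iloc[[0, 24]])
```

With this grouping the Monday rows get wide intervals and the Tuesday rows narrow ones, regardless of where the forecast window starts.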

jan rathfelder

07/01/2024, 10:10 AM

Was anybody able to use reg:tweedie with xgboost? For me it always throws an error. I am applying optuna + xgboost and I assume some forecasts became NaN or so, which breaks creating the lag_transform features:

UserWarning: Found null values in expanding_std_lag1, rolling_std_lag1_window_size7_min_samples1, rolling_std_lag1_window_size70_min_samples1, rolling_std_lag1_window_size105_min_samples1, seasonal_rolling_std_lag1_season_length7_window_size3_min_samples1.

warnings.warn(f'Found null values in {", ".join(cols_with_nulls)}.')
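
One thing that may be worth ruling out before blaming NaN forecasts: every column in that warning is a standard deviation, and the sample standard deviation of a single observation is undefined. The plain pandas illustration below is only an analogy on invented numbers, not the mlforecast code path.

```python
import pandas as pd

y = pd.Series([3.0, 5.0, 4.0, 6.0])
lag1 = y.shift(1)

# With a single value in the window, std (ddof=1) is NaN while mean is not
rolling_std = lag1.rolling(window=7, min_periods=1).std()
rolling_mean = lag1.rolling(window=7, min_periods=1).mean()
print(rolling_std.isna().sum(), rolling_mean.isna().sum())
```

If the nulls in your case sit on the first rows of each series, they may come from the feature definition rather than from the objective.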

Krystian W.

07/02/2024, 10:13 PM

Is it somehow possible to use the max_horizon arg in DistributedMLForecast? Or only through some workaround?

Biagio Principe

07/03/2024, 7:52 PM

Hi, good evening. I was wondering (thank you for the support) if it is possible to use this scaler in MLForecast:

scaler = TemporalNorm(scaler_type='standard', dim=1)

Dinis Timoteo

07/04/2024, 2:51 PM

Hi, I'm using optuna to grid search on an LGBM instance. Since the workaround was taking too much time, I decided to try the mlforecast_objective option. Now I'm intrigued... In mlforecast_objective we can clearly see that it fits and refits (if we choose to). But how can a study optimization run of 200 trials be faster than fitting the model afterwards with the recommended params? On another note, I'm using TPESampler... Regardless of the params I apply to it, and even if I change certain MLForecast params (such as lags or lag_transform), the output 'best' params stay the same all the way. I mean, each float param goes all the way to 1e-17. "Something wrong is not right!" 🤣🤣🤣

Krystian W.

07/07/2024, 7:32 PM

Hi, I have a problem running cross validation on Spark. I have several additional features, all of which are dynamic.

cv = fcst.cross_validation(
    spark_train_df,
    n_windows=n_windows,
    h=h,
    static_features=[],
)

I tried to run cv.show() but I keep getting a KeyError saying that these features aren't found in the index. Locally it works just fine.

Ml Club

07/09/2024, 8:43 AM

888 elif pandas_requires_conversion and any(d == object for d in dtypes_orig):
889     # Force object if any of the dtypes is an object
890     dtype_orig = object

ValueError: at least one array or dtype is required

Ml Club

07/09/2024, 8:46 AM

#mlforecast When I run this code, I get the following error. If I use the option lag=[1], then it works great. What is the issue? Please help me resolve it.

Also, I want to apply an np.log target transformation. How can I do that?

Ml Club

07/11/2024, 7:56 AM

Can I get some help with this query, please? It would be very helpful. Thanks in advance for your help.

Ml Club

07/12/2024, 6:17 AM

#mlforecast @José Morales Hi Jose, I have posted a query above. It would be very kind of you if you could take a look at it. Thanks

Krystian W.

07/14/2024, 2:06 PM

FYI - the current naming pattern in lag_transformation breaks when using Spark - [UNRESOLVED_COLUMN.WITH_SUGGESTION]. Spark can't recognize a column name like rolling_quantile_lag_1_p=0.5_window_size_7 because of the dot in the parameter.

Biagio Principe

07/15/2024, 8:35 AM

Good morning, thank you for the support! This is a great community 😊 I would love to integrate Diebold-Mariano and Giacomini-White tests (https://epftoolbox.readthedocs.io/en/latest/modules/statest/gw.html) into Nixtla to evaluate model performance. Additionally, I'd find hourly comparison plots like the attached image very useful. I have a quick question: does using max_horizon with lag 1 introduce data leakage? (see second image) Thanks a lot!

Ml Club

07/16/2024, 4:19 PM

@José Morales Hello, I want to fit power, logarithmic and simple moving average models using Nixtla. Can you please help me with how I can do this for multiple IDs? My data has ID, Timestamp and Values columns. I have done the power model the following way:

Ml Club

07/16/2024, 4:24 PM

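from datetime import datetime

import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# assumes df (with a Values column), t (frame holding Timestamp) and forecast_horizon are already defined
# power model: regress log(Values + 1) on log(step) for step = 1..len(df)
model = LinearRegression()
model.fit(np.log(np.arange(1, len(df) + 1).reshape(-1, 1)), np.log(df['Values'].values + 1))

# the first generated date is the last observed one, so it is dropped
timestamps = pd.date_range(datetime.strptime(t['Timestamp'].values[-1], '%m-%d-%Y'), periods=forecast_horizon + 1, freq='MS')
timestamps = timestamps[1:]

temp = pd.DataFrame()
temp['Timestamp'] = timestamps
# future steps continue at len(df) + 1; starting at len(df) would repeat the last training step
forecasts = model.predict(np.log(np.arange(len(df) + 1, len(df) + forecast_horizon + 1).reshape(-1, 1)))
forecast_values = np.exp(forecasts) - 1
df['Power'] = np.exp(model.predict(np.log(np.arange(1, len(df) + 1).reshape(-1, 1)))) - 1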

Guillaume GALIE

07/17/2024, 6:29 AM

Hello, could you please explain a bit what num_seas_diffs and num_diffs from coreforecast are? Is it a unit root test to determine how many differences are necessary to make a given time series stationary? https://nixtlaverse.nixtla.io/coreforecast/differences Thank you in advance

Ml Club

07/18/2024, 6:28 AM

@José Morales How can I do this using Nixtla?

Ml Club

07/18/2024, 6:28 AM

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# assumes df (with a Values column) and forecast_horizon are already defined
best_poly_features = PolynomialFeatures(degree=3)
X_poly = best_poly_features.fit_transform(np.arange(len(df)).reshape(-1, 1))

best_poly_model = LinearRegression()
best_poly_model.fit(X_poly, df['Values'])

# reuse the fitted transformer on the future steps (transform, not fit_transform)
X_pred = best_poly_features.transform(np.arange(len(df), len(df) + forecast_horizon).reshape(-1, 1))

forecast_values = best_poly_model.predict(X_pred)
df['Polynomial'] = best_poly_model.predict(X_poly)

Ml Club

07/18/2024, 6:32 AM

Here the PolynomialFeatures has degree 3.

Ml Club

07/19/2024, 4:40 PM

@José Morales Hello Jose, thank you so much for all your help, it really helped me do many things easily. I am stuck on one more issue. My code is below. I am trying to run a linear model, and when I do cross validation I get the same value in every row of the Linear column of the cross validation table.

Ml Club

07/19/2024, 4:40 PM

import pandas as pd
import numpy as np
from utilsforecast.feature_engineering import trend
from mlforecast import MLForecast
from sklearn.linear_model import LinearRegression

# sample data
data = pd.read_csv('https://datasets-nixtla.s3.amazonaws.com/air-passengers.csv', parse_dates=['ds'])
h = 60

# generate features
train, future = trend(data, freq='MS', h=h)

models ={
    'Linear': LinearRegression()
}
# training
fcst = MLForecast(
    models=models,
    freq='MS',
)
fcst.fit(train, static_features=[], fitted=True)

crossvalidation_df = fcst.cross_validation(
    df=train,
    h=60,
    n_windows=1,
    refit=False,
)
crossvalidation_df.head()

Ml Club

07/19/2024, 4:41 PM

unique_id	ds	cutoff	y	Linear
0	AirPassengers	1956-01-01	1955-12-01	284	286.276733
1	AirPassengers	1956-02-01	1955-12-01	277	286.276733
2	AirPassengers	1956-03-01	1955-12-01	317	286.276733
3	AirPassengers	1956-04-01	1955-12-01	313	286.276733
4	AirPassengers	1956-05-01	1955-12-01	318	286.276733
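
A constant forecast like this is what one would expect if the trend column were held at its last training value, which is how a feature treated as static behaves. Note that the snippet passes static_features=[] to fit but not to cross_validation. The NumPy sketch below only illustrates that effect with invented numbers; it does not call mlforecast, and whether this is the actual cause here is an assumption.

```python
import numpy as np

# Linear trend fitted on a toy series (values are invented)
t_train = np.arange(1, 11, dtype=float)
y_train = 100.0 + 3.0 * t_train
slope, intercept = np.polyfit(t_train, y_train, deg=1)

h = 5
# Dynamic feature: the trend keeps advancing over the horizon
t_future = np.arange(11, 11 + h, dtype=float)
dynamic_pred = intercept + slope * t_future

# Static feature: the last training value is repeated for every step
t_static = np.full(h, t_train[-1])
static_pred = intercept + slope * t_static
print(dynamic_pred, static_pred)
```

If that is what is happening, passing static_features=[] to cross_validation as well may let the trend advance across the window.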

Krystian W.

07/22/2024, 10:42 AM

Is it normal that when I try to move a DistributedMLForecast to local by calling the .to_local() method I get this error?

mlforecast/distributed/forecast.py", line 795, in combine_target_tfms
    [part[i] for part in by_partition] for i in range(len(by_partition[0]))
TypeError: object of type 'NoneType' has no len()

Yaarit Even

07/22/2024, 7:39 PM

Hi, is there a way to work with the transform_exog function (i.e., from mlforecast.feature_engineering import transform_exog) in a Python 3.8 environment? I'm getting an error:

cannot import name '_parse_transforms' from 'mlforecast.core' (/usr/local/lib/python3.8/site-packages/mlforecast/core.py

Prakash Pandey

07/24/2024, 2:50 PM

Hi team, I have a dataframe with 4 columns: "unique_id", "ds", "feat1", "y". The feature "feat1" varies over time and has a defined value based on the item and date (for each current or future date and for a given item/unique_id, feat1 takes some value between 1 and n). I tried fitting and predicting but I am getting an error. Can someone please help me with this? For this test setup, the dataframe has a single item or time series (all unique_ids are the same).

# train has columns [unique_id, ds, feat1, y]
fcst.fit(
    train,
    dropna=True,
    static_features=['feat1'],
)
predictions = fcst.predict(h=12, X_df=test[['unique_id', 'ds', 'feat1']])

Error -

```
ValueError: The following features were provided through X_df but were considered as static during fit: ['feat1'].
Please re-run the fit step using the static_features argument to indicate which features are static. If all your features are dynamic please pass an empty list (static_features=[]).
```