jan rathfelder
06/19/2024, 1:17 PM
Olgahan Cat
06/19/2024, 3:55 PM
Biagio Principe
06/25/2024, 4:00 PM
Olgahan Cat
06/25/2024, 6:37 PM
Sarim Zafar
06/26/2024, 8:48 AM
Vítor Barbosa
06/26/2024, 10:38 PM
fill_gaps here:
from utilsforecast.preprocessing import fill_gaps
stocks_basic_pd = fill_gaps(stocks_basic_pd, freq='B', start='per_serie', end='per_serie', id_col='Ticker', time_col='Date')
I am getting the error below. Any ideas?
{
"name": "ValueError",
"message": "cannot handle a non-unique multi-index!",
"stack": "---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In[54], line 2
1 from utilsforecast.preprocessing import fill_gaps
----> 2 stocks_basic_pd = fill_gaps(stocks_basic_pd, freq='B', start='per_serie', end='per_serie', id_col='Ticker', time_col='Date')
File c:\\Python\\miniconda3\\envs\\openbb\\Lib\\site-packages\\utilsforecast\\preprocessing.py:166, in fill_gaps(df, freq, start, end, id_col, time_col)
164 times += offset.base
165 idx = pd.MultiIndex.from_arrays([uids, times], names=[id_col, time_col])
--> 166 res = df.set_index([id_col, time_col]).reindex(idx).reset_index()
167 extra_cols = df.columns.drop([id_col, time_col]).tolist()
168 if extra_cols:
File c:\\Python\\miniconda3\\envs\\openbb\\Lib\\site-packages\\pandas\\core\\frame.py:5365, in DataFrame.reindex(self, labels, index, columns, axis, method, copy, level, fill_value, limit, tolerance)
5346 @doc(
5347 NDFrame.reindex,
5348 klass=_shared_doc_kwargs[\"klass\"],
(...)
5363 tolerance=None,
5364 ) -> DataFrame:
-> 5365 return super().reindex(
5366 labels=labels,
5367 index=index,
5368 columns=columns,
5369 axis=axis,
5370 method=method,
5371 copy=copy,
5372 level=level,
5373 fill_value=fill_value,
5374 limit=limit,
5375 tolerance=tolerance,
5376 )
File c:\\Python\\miniconda3\\envs\\openbb\\Lib\\site-packages\\pandas\\core\\generic.py:5607, in NDFrame.reindex(self, labels, index, columns, axis, method, copy, level, fill_value, limit, tolerance)
5604 return self._reindex_multi(axes, copy, fill_value)
5606 # perform the reindex on the axes
-> 5607 return self._reindex_axes(
5608 axes, level, limit, tolerance, method, fill_value, copy
5609 ).__finalize__(self, method=\"reindex\")
File c:\\Python\\miniconda3\\envs\\openbb\\Lib\\site-packages\\pandas\\core\\generic.py:5630, in NDFrame._reindex_axes(self, axes, level, limit, tolerance, method, fill_value, copy)
5627 continue
5629 ax = self._get_axis(a)
-> 5630 new_index, indexer = ax.reindex(
5631 labels, level=level, limit=limit, tolerance=tolerance, method=method
5632 )
5634 axis = self._get_axis_number(a)
5635 obj = obj._reindex_with_indexers(
5636 {axis: [new_index, indexer]},
5637 fill_value=fill_value,
5638 copy=copy,
5639 allow_dups=False,
5640 )
File c:\\Python\\miniconda3\\envs\\openbb\\Lib\\site-packages\\pandas\\core\\indexes\\base.py:4426, in Index.reindex(self, target, method, level, limit, tolerance)
4422 indexer = self.get_indexer(
4423 target, method=method, limit=limit, tolerance=tolerance
4424 )
4425 elif self._is_multi:
-> 4426 raise ValueError(\"cannot handle a non-unique multi-index!\")
4427 elif not self.is_unique:
4428 # GH#42568
4429 raise ValueError(\"cannot reindex on an axis with duplicate labels\")
ValueError: cannot handle a non-unique multi-index!"
}
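The traceback shows `reindex` failing on the (`Ticker`, `Date`) MultiIndex, which pandas raises when that index contains repeated entries. A minimal sketch, assuming the cause is duplicated rows for the same ticker and date; the toy frame and the `keep="last"` choice are illustrative assumptions, not part of the original data:

```python
import pandas as pd

# Toy stand-in for stocks_basic_pd: AAA has two rows for 2024-06-03.
stocks = pd.DataFrame({
    "Ticker": ["AAA", "AAA", "AAA", "BBB"],
    "Date": pd.to_datetime(["2024-06-03", "2024-06-03", "2024-06-05", "2024-06-03"]),
    "Close": [1.0, 1.1, 1.2, 5.0],
})

# Show every row that shares its (Ticker, Date) pair with another row.
duplicates = stocks[stocks.duplicated(subset=["Ticker", "Date"], keep=False)]
print(duplicates)

# Keep one row per pair so the (Ticker, Date) index becomes unique.
deduped = stocks.drop_duplicates(subset=["Ticker", "Date"], keep="last")

# With a unique index, the original call can be retried on the cleaned frame:
# fill_gaps(deduped, freq='B', start='per_serie', end='per_serie',
#           id_col='Ticker', time_col='Date')
```

Whether to keep the first row, the last row, or an aggregate of the duplicates depends on how they ended up in the data.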
Johannes Emme
06/29/2024, 2:46 PM
cs_df and the true target plotted against each other. From this plot, it can be seen that my model is okay at predicting the weekends but has clear difficulties in predicting the Mondays.
However, when I used the model for predictions (see plot 2), the uncertainty for the weekends was very large, while the Mondays had small uncertainty. (I forgot the legend in plot 2: black = true values, blue = mean prediction, purple = 10th and 90th percentiles.)
What I have come to realize is that the problem arises from a misalignment between the conformal horizon and the time at which I am predicting. With a conformal horizon of 96, the errors collected for a specific timestep do not belong to the same time slot. For instance, the first error in the first window corresponds to Monday 00:00, while in the next window the first hour is Friday 00:00, then Tuesday 00:00, and so on. Hence, when I predict the consumption on a Saturday, the quantiles are based on errors from several different days and hours rather than on “Saturday hour errors.”
To overcome this issue, I set the conformal horizon to 24*7 (168) so that my conformal windows start on the same day of the week as my predictions. Then I get the following result (see plots 3 and 4), where the uncertainty is low for the weekends and high for the Mondays. However, I do not believe this is a sustainable solution. Unfortunately, I don't have a great alternative either. For now, I have simply rewritten the _add_conformal_distribution_intervals function for my case by:
1. Requiring that n_windows*h >= 168 to have all hours in the week represented.
2. Joining cs_df and fcst_df on day_of_week and hour.
3. Subtracting and adding the mean to get a distribution around each hour, and then calculating the quantiles.
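The three steps above can be sketched in plain pandas. This is only an illustration of the idea, not the library's `_add_conformal_distribution_intervals`: the function name `weekly_slot_intervals`, the output column names, and the use of absolute errors as scores are assumptions.

```python
import pandas as pd


def weekly_slot_intervals(cs_df, fcst_df, model, quantiles=(0.1, 0.9),
                          id_col="unique_id", time_col="ds", target_col="y"):
    """Quantiles built only from calibration errors of the same weekday and hour."""
    keys = [id_col, "day_of_week", "hour"]

    # Absolute calibration errors, tagged with their weekly time slot.
    cs = cs_df[[id_col, time_col]].copy()
    cs["score"] = (cs_df[target_col] - cs_df[model]).abs()
    cs["day_of_week"] = cs[time_col].dt.dayofweek
    cs["hour"] = cs[time_col].dt.hour

    out = fcst_df[[id_col, time_col, model]].copy()
    out["day_of_week"] = out[time_col].dt.dayofweek
    out["hour"] = out[time_col].dt.hour

    # Step 2: pair every forecast with all calibration errors from the same slot.
    merged = out.merge(cs[keys + ["score"]], on=keys, how="left")
    if merged["score"].isna().any():
        # Step 1: every weekday/hour slot needs at least one calibration error.
        raise ValueError("some weekly slots have no calibration errors")

    # Step 3: distribution around the point forecast (mean - error, mean + error).
    samples = pd.concat([
        merged.assign(sample=merged[model] - merged["score"]),
        merged.assign(sample=merged[model] + merged["score"]),
    ])
    q = samples.groupby([id_col, time_col])["sample"].quantile(list(quantiles)).unstack()
    q.columns = [f"{model}-q{round(level * 100)}" for level in q.columns]
    return fcst_df.merge(q.reset_index(), on=[id_col, time_col], how="left")
```

Because the join key is the weekly slot rather than the position inside the window, the result no longer depends on which weekday each conformal window happens to start on.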
I am very curious to hear your thoughts on this.
Best regards,
Johannes
jan rathfelder
07/01/2024, 10:10 AM
UserWarning: Found null values in expanding_std_lag1, rolling_std_lag1_window_size7_min_samples1, rolling_std_lag1_window_size70_min_samples1, rolling_std_lag1_window_size105_min_samples1, seasonal_rolling_std_lag1_season_length7_window_size3_min_samples1.
warnings.warn(f'Found null values in {", ".join(cols_with_nulls)}.')
Krystian W.
07/02/2024, 10:13 PM
max_horizon argument in DistributedMLForecast? Or only through some workaround?
Biagio Principe
07/03/2024, 7:52 PM
scaler = TemporalNorm(scaler_type='standard', dim=1)
Dinis Timoteo
07/04/2024, 2:51 PM
Krystian W.
07/07/2024, 7:32 PM
cv = fcst.cross_validation(
spark_train_df,
n_windows=n_windows,
h=h,
static_features=[],
)
I tried to run cv.show() but I keep getting a KeyError saying that these features aren't found in the index. Locally it works just fine.
Ml Club
07/09/2024, 8:43 AM
888 elif pandas_requires_conversion and any(d == object for d in dtypes_orig):
889 # Force object if any of the dtypes is an object
890 dtype_orig = object
ValueError: at least one array or dtype is required
Ml Club
07/09/2024, 8:46 AM
lag=[1] then it works great. What is the issue? Please help me resolve it. Also, I want to apply an np.log transformation to the target. How can I do that?
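For the log transformation of the target, one option is a scikit-learn `FunctionTransformer` that carries both the transform and its inverse. A minimal sketch; using `np.log1p`/`np.expm1` instead of plain `np.log`/`np.exp` is an assumption made here so that zeros in the target do not break the transform:

```python
import numpy as np
from sklearn.preprocessing import FunctionTransformer

# log1p / expm1 tolerate zeros in the target; plain np.log / np.exp need y > 0.
log_target = FunctionTransformer(func=np.log1p, inverse_func=np.expm1)

y = np.array([[0.0], [9.0], [99.0]])
y_log = log_target.fit_transform(y)           # values the model would be trained on
y_back = log_target.inverse_transform(y_log)  # back on the original scale

# With mlforecast installed, the same transformer can be wrapped and passed in:
# from mlforecast.target_transforms import GlobalSklearnTransformer
# fcst = MLForecast(models=models, freq='MS',
#                   target_transforms=[GlobalSklearnTransformer(log_target)])
```

Passing it through `target_transforms` means the forecasts are mapped back to the original scale automatically, so no manual `np.exp` step is needed after predicting.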
Ml Club
07/11/2024, 7:56 AM
Ml Club
07/12/2024, 6:17 AM
Krystian W.
07/14/2024, 2:06 PM
rolling_quantile_lag_1_p=0.5_window_size_7 because of the dot in the parameter.
Biagio Principe
07/15/2024, 8:35 AM
max_horizon with lag 1 introduce data leakage? (see second image)
Thanks a lot!
Ml Club
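07/16/2024, 4:19 PM
Ml Club
07/16/2024, 4:24 PM
import numpy as np
import pandas as pd
from datetime import datetime
from sklearn.linear_model import LinearRegression
# Power-law trend: regress log(Values + 1) on log(time index), with the index starting at 1
n = len(df)
model = LinearRegression()
model.fit(np.log(np.arange(1, n + 1)).reshape(-1, 1), np.log(df['Values'].values + 1))
# Future timestamps: start at the last observed month, then drop it
timestamps = pd.date_range(datetime.strptime(t['Timestamp'].values[-1], '%m-%d-%Y'), periods=forecast_horizon + 1, freq='MS')
timestamps = timestamps[1:]
temp = pd.DataFrame()
temp['Timestamp'] = timestamps
# Future time indices continue at n + 1, right after the last training index
forecasts = model.predict(np.log(np.arange(n + 1, n + forecast_horizon + 1)).reshape(-1, 1))
forecast_values = np.exp(forecasts) - 1
df['Power'] = np.exp(model.predict(np.log(np.arange(1, n + 1)).reshape(-1, 1))) - 1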
Guillaume GALIE
07/17/2024, 6:29 AM
Ml Club
07/18/2024, 6:28 AM
Ml Club
07/18/2024, 6:28 AM
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
# Cubic trend: expand the time index 0..n-1 into polynomial terms and fit on them
best_poly_features = PolynomialFeatures(degree=3)
X_poly = best_poly_features.fit_transform(np.arange(len(df)).reshape(-1, 1))
best_poly_model = LinearRegression()
best_poly_model.fit(X_poly, df['Values'])
# Reuse the already fitted transformer for the future indices
X_pred = best_poly_features.transform(np.arange(len(df), len(df) + forecast_horizon).reshape(-1, 1))
forecast_values = best_poly_model.predict(X_pred)
df['Polynomial'] = best_poly_model.predict(X_poly)
Ml Club
07/18/2024, 6:32 AM
Ml Club
07/19/2024, 4:40 PM
Ml Club
07/19/2024, 4:40 PM
import pandas as pd
import numpy as np
from utilsforecast.feature_engineering import trend
from mlforecast import MLForecast
from sklearn.linear_model import LinearRegression
# sample data
data = pd.read_csv('https://datasets-nixtla.s3.amazonaws.com/air-passengers.csv', parse_dates=['ds'])
h = 60
# generate features
train, future = trend(data, freq='MS', h=h)
models = {
'Linear': LinearRegression()
}
# training
fcst = MLForecast(
models=models,
freq='MS',
)
fcst.fit(train, static_features=[], fitted=True)
crossvalidation_df = fcst.cross_validation(
df=train,
h=60,
n_windows=1,
refit=False,
)
crossvalidation_df.head()
Ml Club
07/19/2024, 4:41 PM
       unique_id         ds     cutoff    y      Linear
0  AirPassengers 1956-01-01 1955-12-01  284  286.276733
1  AirPassengers 1956-02-01 1955-12-01  277  286.276733
2  AirPassengers 1956-03-01 1955-12-01  317  286.276733
3  AirPassengers 1956-04-01 1955-12-01  313  286.276733
4  AirPassengers 1956-05-01 1955-12-01  318  286.276733
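The identical value in every row is what a linear model produces when its only feature stops advancing over the horizon. A minimal sketch with plain scikit-learn and synthetic data (not the AirPassengers series), assuming that this is what happens here because `cross_validation` was called without `static_features=[]`, so the `trend` column is treated as static inside the validation window:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

n, h = 120, 12
trend = np.arange(1, n + 1, dtype=float)
y = 100.0 + 2.0 * trend  # synthetic series with a perfectly linear trend

model = LinearRegression().fit(trend.reshape(-1, 1), y)

# Feature treated as static: the last observed value is repeated for every step.
static_X = np.full((h, 1), trend[-1])
static_preds = model.predict(static_X)

# Feature treated as dynamic: the trend keeps counting over the horizon.
dynamic_X = np.arange(n + 1, n + h + 1, dtype=float).reshape(-1, 1)
dynamic_preds = model.predict(dynamic_X)

print(static_preds[:3])   # the same number repeated
print(dynamic_preds[:3])  # values that follow the trend
```

If that assumption holds, passing `static_features=[]` to `cross_validation` as well, as was done in `fit`, should let the trend feature advance within each window.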
Krystian W.
07/22/2024, 10:42 AM
mlforecast/distributed/forecast.py", line 795, in combine_target_tfms
[part[i] for part in by_partition] for i in range(len(by_partition[0]))
TypeError: object of type 'NoneType' has no len()
Yaarit Even
07/22/2024, 7:39 PM
cannot import name '_parse_transforms' from 'mlforecast.core' (/usr/local/lib/python3.8/site-packages/mlforecast/core.py
Prakash Pandey
07/24/2024, 2:50 PM
# train has columns as - [unique_id, ds, feat1, y]
fcst.fit(train,
dropna=True,
static_features=['feat1'],
)
predictions = fcst.predict(h=12, X_df=test[['unique_id', 'ds', 'feat1']])
Error -
```
ValueError: The following features were provided through X_df but were considered as static during fit: ['feat1'].
Please re-run the fit step using the static_features argument to indicate which features are static. If all your features are dynamic please pass an empty list (static_features=[]).
```
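The error message says a feature cannot be both static at fit time and supplied through X_df at predict time. One way to decide which list a column belongs in is to check whether it is constant within every series. A minimal sketch in plain pandas; the helper name `split_static_dynamic` and the `store_size` column are hypothetical and only serve the illustration:

```python
import pandas as pd


def split_static_dynamic(df, id_col="unique_id", time_col="ds", target_col="y"):
    """Return (static, dynamic) feature names; static means constant within every series."""
    features = [c for c in df.columns if c not in (id_col, time_col, target_col)]
    n_unique = df.groupby(id_col)[features].nunique(dropna=False)
    static = [c for c in features if (n_unique[c] <= 1).all()]
    dynamic = [c for c in features if c not in static]
    return static, dynamic


# Toy frame: feat1 changes over time, store_size is fixed per series.
train = pd.DataFrame({
    "unique_id": ["a"] * 3 + ["b"] * 3,
    "ds": list(pd.date_range("2024-01-01", periods=3, freq="MS")) * 2,
    "feat1": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "store_size": [10, 10, 10, 20, 20, 20],
    "y": [5.0, 6.0, 7.0, 8.0, 9.0, 10.0],
})
static, dynamic = split_static_dynamic(train)
print(static, dynamic)

# The static list would go to fit, the dynamic columns to X_df at predict time:
# fcst.fit(train, dropna=True, static_features=static)
# predictions = fcst.predict(h=12, X_df=test[['unique_id', 'ds'] + dynamic])
```

In the snippet from the message, feat1 is passed through X_df, so under this reading it would be fitted with static_features=[] rather than static_features=['feat1'].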