Quang Bui
04/14/2024, 5:33 AM
ds column)?
I've used the function below, which works fine to create the five-minutely time index based on the UTC time (ds column):
def five_min_index(dates):
    """Calculate 5-minutely index for each datetime (0 to 287)"""
    return (dates.hour * 60 + dates.minute) // 5
It is used in LightGBMCV() as follows:
cv = LightGBMCV(
    freq='5min',
    target_transforms=[Differences([288])],
    lags=[1, 2, 3, 4, 5, 6, 12, 288],
    lag_transforms={
        1: [
            ExponentiallyWeightedMean(alpha=0.5),
            RollingMean(window_size=12),
        ],
        12: [RollingMean(window_size=288)],
    },
    date_features=[five_min_index, 'hour', 'dayofweek'],
    num_threads=4,
)
My y depends on the local clock time: it's human-driven, so it is influenced by when we start the day and when we end the day.
Any thoughts? I'd like to be able to have the index in date_features instead of having to create a dynamic exogenous feature...
José Morales
04/15/2024, 4:39 PM
Quang Bui
04/16/2024, 3:20 AM
date_features. Here is the function:
def localise_five_min_index(dates):
    if isinstance(dates, pd.DatetimeIndex):
        localised_dates = dates.tz_localize('Australia/Adelaide', ambiguous='NaT', nonexistent='shift_forward')
    elif isinstance(dates, pd.Series) and pd.api.types.is_datetime64_any_dtype(dates):
        localised_dates = dates.dt.tz_localize('Australia/Adelaide', ambiguous='NaT', nonexistent='shift_forward')
    else:
        raise ValueError("Input must be a pandas DatetimeIndex or datetime64 Series.")
    local_five_min_index = (localised_dates.hour * 60 + localised_dates.minute) // 5
    return local_five_min_index
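A note on the function above, offered as a sketch rather than a fix to the author's code: in pandas, tz_localize attaches a time zone to naive timestamps without shifting their wall-clock values, while tz_convert is what moves timestamps from one zone to another. If ds holds naive UTC timestamps, as described earlier in the thread, the hour and minute read after tz_localize('Australia/Adelaide') are still the UTC ones. The variant below assumes naive input is UTC; the name utc_to_local_five_min_index and the tz parameter are hypothetical and not part of the thread.

```python
import pandas as pd


def utc_to_local_five_min_index(dates, tz="Australia/Adelaide"):
    """5-minute slot of the local day (0 to 287) for naive-UTC or tz-aware timestamps."""
    dates = pd.DatetimeIndex(dates)
    if dates.tz is None:
        # Assumption: naive timestamps are in UTC, as stated for ds in the thread.
        dates = dates.tz_localize("UTC")
    # tz_convert shifts the wall-clock time into the target zone (DST-aware).
    local = dates.tz_convert(tz)
    return (local.hour * 60 + local.minute) // 5


# Midnight UTC in January (UTC+10:30 in Adelaide) and in mid-April (UTC+9:30).
utc_times = pd.DatetimeIndex(["2024-01-14 00:00", "2024-04-14 00:00"])
slots = utc_to_local_five_min_index(utc_times)  # → [126, 114]

# For comparison: tz_localize alone leaves the hour at its original value.
localized_hours = utc_times.tz_localize("Australia/Adelaide").hour  # → [0, 0]
```

Because the conversion starts from UTC, there are no ambiguous or nonexistent local times to handle, so the ambiguous and nonexistent arguments are not needed in this variant.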
And then I include it in LightGBMCV() like so:
cv = LightGBMCV(
    freq='5min',
    target_transforms=[Differences([288])],
    lags=[1, 2, 3, 4, 5, 6, 12, 288],
    lag_transforms={
        1: [
            ExponentiallyWeightedMean(alpha=0.5),
            RollingMean(window_size=12),
        ],
        12: [RollingMean(window_size=288)],
    },
    date_features=[localise_five_min_index, 'hour', 'dayofweek'],
    num_threads=4,
)
This resulted in reduced errors during cross-validation (as expected) :)
Quang Bui
04/16/2024, 3:46 AM
unique_id, ds, y, temp, relative_humidity
y is each household's (unique_id) energy consumption, and the temperature and humidity come from one weather station, so they are the same for every household. Temperature and humidity change with ds: they vary depending on the time of day.
I've estimated the correlation coefficients between temp and y, as well as between relative_humidity and y, and also ran tests of statistical significance; all correlations are statistically significantly different from 0. I do this as one of the steps to ensure that all the data are merged correctly and that there isn't anything wrong with temp and relative_humidity.
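For readers who want to reproduce a check like the one just described, here is a minimal sketch. It is not the code used in the thread: the helper name per_series_correlation is hypothetical, and the demo DataFrame is synthetic data generated only to make the example runnable. It computes the Pearson correlation per unique_id and the matching t-statistic for the null hypothesis of zero correlation.

```python
import numpy as np
import pandas as pd


def per_series_correlation(df, feature, target="y", id_col="unique_id"):
    """Pearson r between feature and target for each series, with its t-statistic."""
    rows = []
    for uid, group in df.groupby(id_col):
        pair = group[[feature, target]].dropna()
        n = len(pair)
        r = pair[feature].corr(pair[target])
        # t-statistic for H0: r = 0, with n - 2 degrees of freedom.
        t_stat = r * np.sqrt((n - 2) / (1 - r**2))
        rows.append({id_col: uid, "n": n, "r": r, "t_stat": t_stat})
    return pd.DataFrame(rows)


# Synthetic demo data (not from the thread): two households sharing one temp series.
rng = np.random.default_rng(0)
ds = pd.date_range("2024-04-01", periods=288, freq="5min")
temp = 20 + 5 * np.sin(2 * np.pi * np.arange(288) / 288)
demo = pd.concat(
    [
        pd.DataFrame(
            {"unique_id": uid, "ds": ds, "temp": temp,
             "y": 0.5 * temp + rng.normal(size=288)}
        )
        for uid in ["house_a", "house_b"]
    ],
    ignore_index=True,
)
result = per_series_correlation(demo, "temp")
```

One caveat on interpretation: consecutive 5-minute observations are strongly autocorrelated, so a t-test that treats them as independent tends to overstate significance.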
When I train the model with temp and relative_humidity, it performs worse than without them. This should not be happening, but I cannot figure out why...
cv = LightGBMCV(
    freq='5min',
    target_transforms=[Differences([288]), LocalStandardScaler()],
    lags=[1, 2, 3, 4, 5, 6, 12, 288],
    lag_transforms={
        1: [
            ExponentiallyWeightedMean(alpha=0.5),
            RollingMean(window_size=12),
            RollingMean(window_size=288),
            RollingMean(window_size=864),  # 3 days
            RollingQuantile(window_size=12, p=0.5),
            RollingQuantile(window_size=288, p=0.5),
            RollingQuantile(window_size=864, p=0.5),
            RollingStd(window_size=12),
            RollingStd(window_size=288),
        ],
        12: [RollingMean(window_size=288)],
        24: [RollingMean(window_size=288)],
    },
    date_features=[localize_five_min_index, localize_hour, localize_and_identify_weekend, localize_dayofweek],
    num_threads=4,
)
cv_hist = cv.fit(
    df_filtered_subsample_with_weather,
    n_windows=4,
    h=288,
    params=lgb_params,
    eval_every=5,
    early_stopping_evals=5,
    compute_cv_preds=True,
    metric='rmse',
    static_features=[],
)
Here, df_filtered_subsample_with_weather is the DataFrame used for cross-validation; it has the columns unique_id, ds, y, temp, relative_humidity.
Any help would be much appreciated!
José Morales
04/16/2024, 4:36 PM
cv.cv_models_, can you inspect the feature importance with and without using those exog features?
Joaquin FERNANDEZ
06/04/2024, 1:53 PM
Quang Bui
06/07/2024, 1:54 AM
Quang Bui
06/07/2024, 1:56 AM
Quang Bui
06/07/2024, 1:58 AM
Joaquin FERNANDEZ
06/17/2024, 7:17 AM