# general
**Quang Bui**
**Daylight savings & time index**

Hi everyone, I was just wondering how one might go about creating a five-minutely time index that is based on the clock datetime (so it accounts for daylight savings) instead of the UTC time in my `ds` column. I've used the function below, which works fine for creating the five-minutely time index based on the UTC time (`ds` column):
```python
def five_min_index(dates):
    """Calculate the 5-minutely index for each datetime (0 to 287)."""
    return (dates.hour * 60 + dates.minute) // 5
```
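As a sanity check of the mapping (my addition, not from the thread): the index runs from 0 at midnight to 287 at 23:55.

```python
import pandas as pd

def five_min_index(dates):
    """Calculate the 5-minutely index for each datetime (0 to 287)."""
    return (dates.hour * 60 + dates.minute) // 5

idx = pd.DatetimeIndex(['2023-01-01 00:00', '2023-01-01 00:05', '2023-01-01 23:55'])
print(five_min_index(idx).tolist())  # [0, 1, 287]
```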
It is used in `LightGBMCV()` as follows:

```python
cv = LightGBMCV(
    freq='5min',
    target_transforms=[Differences([288])],
    lags=[1, 2, 3, 4, 5, 6, 12, 288],
    lag_transforms={
        1: [
            ExponentiallyWeightedMean(alpha=0.5),
            RollingMean(window_size=12),
        ],
        12: [RollingMean(window_size=288)],
    },
    date_features=[five_min_index, 'hour', 'dayofweek'],
    num_threads=4,
)
```
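For context (my reading, not from the thread): with 5-minute data, 288 steps is one day, so `Differences([288])` amounts to daily seasonal differencing, i.e. subtracting the value at the same time on the previous day. A plain-pandas sketch on a toy series:

```python
import pandas as pd

# Hypothetical series at 5-minute frequency; 288 steps = one day.
y = pd.Series(range(600), index=pd.date_range('2023-01-01', periods=600, freq='5min'))
diffed = y.diff(288)     # y_t - y_{t-288}: removes the daily seasonal level
print(diffed.iloc[288])  # 288.0 for this linear toy series
```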
My `y` depends on the clock datetime; it's human driven, so it is influenced by when we start the day and when we end the day. Any thoughts? I'd like to be able to have the index in `date_features` instead of having to create a dynamic exogenous feature...
**José Morales**
Hey, the `ds` column can also have integers. Would that help you here?
**Quang Bui**
Thanks @José Morales! What I ended up doing instead was writing a custom function to put in `date_features`. Here is the function:
```python
def localise_five_min_index(dates):
    """5-minutely index (0 to 287) based on the Adelaide clock time.

    Since `ds` holds UTC timestamps, they must be localised to UTC first
    and then converted (labelling them directly as Adelaide time would
    leave the clock values unchanged). A Series is normalised to a
    DatetimeIndex so the `.hour`/`.minute` attributes are available.
    """
    if isinstance(dates, pd.Series) and pd.api.types.is_datetime64_any_dtype(dates):
        dates = pd.DatetimeIndex(dates)
    if not isinstance(dates, pd.DatetimeIndex):
        raise ValueError("Input must be a pandas DatetimeIndex or datetime64 Series.")
    localised = dates.tz_localize('UTC').tz_convert('Australia/Adelaide')
    return (localised.hour * 60 + localised.minute) // 5
```
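A quick check of the conversion on either side of the DST boundary (Adelaide is UTC+10:30 in the austral summer and UTC+9:30 in winter), using a self-contained copy of the helper; the function name here is mine, for illustration:

```python
import pandas as pd

def utc_to_local_five_min_index(dates, tz='Australia/Adelaide'):
    # Treat naive timestamps as UTC, then convert to the local clock time.
    local = pd.DatetimeIndex(dates).tz_localize('UTC').tz_convert(tz)
    return (local.hour * 60 + local.minute) // 5

idx = pd.DatetimeIndex(['2023-01-01 00:00', '2023-06-01 00:00'])  # UTC
print(utc_to_local_five_min_index(idx).tolist())  # [126, 114]: 10:30 ACDT, 09:30 ACST
```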
And then I include it in `LightGBMCV()` like so:

```python
cv = LightGBMCV(
    freq='5min',
    target_transforms=[Differences([288])],
    lags=[1, 2, 3, 4, 5, 6, 12, 288],
    lag_transforms={
        1: [
            ExponentiallyWeightedMean(alpha=0.5),
            RollingMean(window_size=12),
        ],
        12: [RollingMean(window_size=288)],
    },
    date_features=[localise_five_min_index, 'hour', 'dayofweek'],
    num_threads=4,
)
```
This resulted in reduced errors during cross-validation (as expected) :)
I do have another question, about including exogenous features. I'm trying to forecast household energy consumption, and I have exogenous features like air temperature and humidity, which have been merged into the data. The data has columns `unique_id`, `ds`, `y`, `temp`, and `relative_humidity`. `y` is each household's (`unique_id`'s) energy consumption, and the temperature and humidity come from one weather station, so they are the same for every household. Temperature and humidity change with `ds`; they vary with the time of day.

I've estimated the correlation coefficients between `temp` and `y` as well as between `relative_humidity` and `y`, and also ran tests of statistical significance; all correlations are statistically significantly different from 0. I do this as one of the steps to ensure that the data are merged correctly and that there isn't anything wrong with `temp` and `relative_humidity`. Yet when I train the model with `temp` and `relative_humidity`, it performs worse than without them. This should not be happening, but I cannot figure out why...
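The correlation-and-significance check described above can be sketched without extra dependencies (synthetic data with made-up coefficients; for large n, |t| > 2 roughly corresponds to the 5% significance threshold):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 1000
temp = rng.normal(25, 5, n)           # synthetic temperature readings
y = 0.3 * temp + rng.normal(0, 1, n)  # load constructed to correlate with temp
df = pd.DataFrame({'temp': temp, 'y': y})

r = df['temp'].corr(df['y'])                 # Pearson correlation coefficient
t_stat = r * np.sqrt((n - 2) / (1 - r**2))   # t statistic for H0: rho = 0
print(round(r, 2), t_stat > 2.0)             # strong, significant correlation
```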
```python
cv = LightGBMCV(
    freq='5min',
    target_transforms=[Differences([288]), LocalStandardScaler()],
    lags=[1, 2, 3, 4, 5, 6, 12, 288],
    lag_transforms={
        1: [
            ExponentiallyWeightedMean(alpha=0.5),
            RollingMean(window_size=12),
            RollingMean(window_size=288),
            RollingMean(window_size=864),  # 3 days
            RollingQuantile(window_size=12, p=0.5),
            RollingQuantile(window_size=288, p=0.5),
            RollingQuantile(window_size=864, p=0.5),
            RollingStd(window_size=12),
            RollingStd(window_size=288),
        ],
        12: [RollingMean(window_size=288)],
        24: [RollingMean(window_size=288)],
    },
    date_features=[localise_five_min_index, localize_hour, localize_and_identify_weekend, localize_dayofweek],
    num_threads=4,
)
cv_hist = cv.fit(
    df_filtered_subsample_with_weather,
    n_windows=4,
    h=288,
    params=lgb_params,
    eval_every=5,
    early_stopping_evals=5,
    compute_cv_preds=True,
    metric='rmse',
    static_features=[],
)
```
Here, `df_filtered_subsample_with_weather` is the data used for cross-validation, with columns `unique_id`, `ds`, `y`, `temp`, and `relative_humidity`. Any help would be much appreciated!
**José Morales**
Hmm, it's kind of hard to tell without looking at the data. It's possible that if the features are too good it starts to overfit; are you seeing that it stops earlier? Also, the models are saved in `cv.cv_models_`; can you inspect the feature importance with and without those exog features?
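A sketch of that inspection (`cv.cv_models_` is mentioned above; `feature_name()` and `feature_importance(importance_type='gain')` are LightGBM Booster methods, and the numbers below are placeholders, not real output):

```python
import pandas as pd

# With a fitted cv one would pull importances from each saved model, e.g.:
#     booster = cv.cv_models_[0]
#     names, gains = booster.feature_name(), booster.feature_importance(importance_type='gain')
# Placeholder values stand in for a real model's output here:
names = ['lag1', 'lag288', 'five_min_index', 'temp', 'relative_humidity']
gains = [950.0, 700.0, 310.0, 12.0, 8.0]

importance = pd.Series(gains, index=names).sort_values(ascending=False)
print(importance.head(3))  # the top features by total gain
```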
**Joaquin FERNANDEZ**
@Quang Bui have you figured out why? I am seeing similar performance, also in household forecasting. Best
**Quang Bui**
Hi @Joaquin FERNANDEZ, I did eventually see some improvements, but they don't increase accuracy as much as features that are just transformations of load. A lot of the accuracy gains came when I revisited the data to perform further cleaning on each household's load power. Why adding weather features worsens the model to begin with still doesn't make sense to me.
**Joaquin FERNANDEZ**
Thanks for the answers @Quang Bui. Can you give me some hints about the data cleaning you did?