#general

Arsa Nikzad

03/17/2023, 1:12 AM
Hi guys, a quick question/suggestion. In `mlforecast`, the default value of `dropna` in `fcst.preprocess()` is True. With intermittent series and features like `rolling_std`, NaNs can occur anywhere in the data (not just in the initial rows), and all of those observations get dropped. The consequence may be an inaccurate CV estimate when we evaluate the model. Maybe add a warning in the function to notify the user about the default value of `dropna`?
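A minimal sketch of how one might check for this before relying on `dropna=True`: it flags rows whose features are NaN after the series has already produced at least one complete row, i.e. NaNs outside the initial warm-up period. The helper name `interior_nan_rows` and the column names in the demo frame are hypothetical, and the frame is hand-made rather than real `preprocess` output.

```python
import numpy as np
import pandas as pd

def interior_nan_rows(df, id_col, feature_cols):
    """Return rows with a NaN feature that occur after the series' first complete row."""
    has_nan = df[feature_cols].isna().any(axis=1)
    # Count complete rows seen so far within each series; > 0 means warm-up is over.
    complete_so_far = (~has_nan).astype(int).groupby(df[id_col]).cumsum()
    return df[has_nan & (complete_so_far > 0)]

# Hand-made example (hypothetical column names), not actual mlforecast output.
demo = pd.DataFrame({
    "sprid": [1, 1, 1, 1, 1, 1],
    "lag_feature": [np.nan, 1.0, 2.0, 0.0, 4.0, 0.0],
    "rolling_std_feature": [np.nan, np.nan, np.nan, 1.0, np.nan, 2.0],
})
interior = interior_nan_rows(demo, "sprid", ["lag_feature", "rolling_std_feature"])
print(f"{len(interior)} of {len(demo)} rows have NaNs outside the warm-up period")
```

If this count is non-zero, dropping NaNs removes observations from the middle of the series, which is the situation described above.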

Max (Nixtla)

03/17/2023, 1:08 PM
Thanks for the comment. Cc @José Morales

José Morales

03/17/2023, 5:10 PM
Hey @Arsa Nikzad. I think that if you have any null values in your target, you get an error when running preprocess, because all transformations propagate the null values. Were you able to run it with missing values in the middle?

Arsa Nikzad

03/17/2023, 6:09 PM
Hi @José Morales, thanks for the reply. The target does not have null values. The issue is with `rolling_std` when the target contains a run of consecutive zeros longer than the window size. The root cause seems to be `_rolling_std` in `window_ops.rolling`, which generates large negative numbers in that situation; these negative numbers are then converted to NaN. Attached is an example.
import lightgbm as lgb
import numpy as np
import pandas as pd
from mlforecast import MLForecast
from window_ops.rolling import rolling_std, _rolling_std

# Monthly series with a run of zeros longer than the rolling window (3)
data = pd.DataFrame({
    'date': pd.date_range(start='2019-01-01', end='2020-12-31', freq='MS'),
    'sprid': 1.,
    'target': [1., 2., 0., 4., 0., 0., 0., 0., 9., 10., 11., 12.] * 2
})

models = [lgb.LGBMRegressor()]
fcst = MLForecast(
    models=models,
    freq='MS',
    lags=[1],
    lag_transforms={
        1: [(rolling_std, 3)]
    }
)

# dropna=False keeps the rows where the feature came out as NaN
preprocessed_df = fcst.preprocess(data, id_col='sprid', time_col='date', target_col='target', dropna=False)
print(preprocessed_df)

## check _rolling_std directly on the same values
a = np.array([1, 2, 0, 4, 0, 0, 0, 0, 9, 10, 11, 12] * 2)
print(_rolling_std(a, 3))
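For comparison, here is a sketch of a non-incremental reference computation using only NumPy, which recomputes the standard deviation from scratch for every window. It assumes the sample standard deviation (`ddof=1`) is the intended definition; the helper name `reference_rolling_std` is hypothetical and is not part of window-ops.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def reference_rolling_std(values, window):
    """Rolling sample std computed independently per window (no running sums)."""
    values = np.asarray(values, dtype=float)
    out = np.full(values.shape, np.nan)  # first window - 1 entries stay NaN
    windows = sliding_window_view(values, window)
    out[window - 1:] = windows.std(axis=1, ddof=1)
    return out

a = np.array([1, 2, 0, 4, 0, 0, 0, 0, 9, 10, 11, 12] * 2)
expected = reference_rolling_std(a, 3)
print(expected)
```

With this approach the all-zero windows produce exactly 0.0, so NaNs appear only in the first `window - 1` positions; any additional NaNs in an incremental implementation's output point to a numerical issue rather than to the data.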

José Morales

03/17/2023, 6:33 PM
Thanks for the example! I think a warning for these cases is definitely useful. Will add it soon
👍 1

Arsa Nikzad

03/17/2023, 6:36 PM
Additionally, `rolling_std` should generate zeros instead of NaN for these cases.
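One common way to get that behavior in an incremental implementation is to clamp the variance at zero before taking the square root. The sketch below is a plain-Python illustration of that idea using running sums; it is not the window-ops implementation, and the function name `rolling_std_clamped` is hypothetical.

```python
import math

def rolling_std_clamped(values, window):
    """Incremental rolling sample std; negative variances are clamped to zero."""
    if window < 2:
        raise ValueError("window must be at least 2 for a sample standard deviation")
    out = [math.nan] * len(values)
    total = 0.0     # running sum of the current window
    total_sq = 0.0  # running sum of squares of the current window
    for i, x in enumerate(values):
        total += x
        total_sq += x * x
        if i >= window:
            # Drop the value that just left the window.
            old = values[i - window]
            total -= old
            total_sq -= old * old
        if i >= window - 1:
            var = (total_sq - total * total / window) / (window - 1)
            # Rounding error can push var slightly below zero; clamp before sqrt.
            out[i] = math.sqrt(max(var, 0.0))
    return out

result = rolling_std_clamped([1, 2, 0, 4, 0, 0, 0, 0, 9, 10, 11, 12] * 2, 3)
print(result)
```

The clamp only changes results where the computed variance is negative, which can only come from floating-point error, so valid outputs are unaffected.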

Max (Nixtla)

03/17/2023, 6:37 PM
Hi @Arsa Nikzad! Happy to work on that. Would you mind opening an issue for that on GH? Here is the link.

Arsa Nikzad

03/17/2023, 6:38 PM
Sure, will do!

José Morales

03/22/2023, 3:36 AM
Hey @Arsa Nikzad, we just pushed window-ops 0.0.14, which should fix this issue with `rolling_std`.

Arsa Nikzad

03/22/2023, 1:08 PM
Thank you.