# mlforecast
m
Hi, I am forecasting daily for 90 days and am wondering whether it's possible to create a custom lag transform that takes last week's sales instead of the last 7 days' sales. For every day of the current week, I need the previous week's Monday-Sunday sales. Is it possible to do this with njit? (E.g., 24-30 March would have the total sales of 17-23 March as a feature.)
j
Yes, you can easily do this. I don't think a rolling sum is implemented already, but I wonder how useful it would be. I guess the rolling mean, which is already implemented, is a better signal for future y values, although the two are closely related (the sum is just the mean multiplied by the window size). So yes, you can do it easily with numba, but it might not be much different from using the rolling mean. You can also add a rolling percentile alongside the rolling mean to give the model more information about the distribution within your window. Maybe that is an option too?
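To illustrate how closely the two are related, here is a minimal NumPy sketch. The helper names `rolling_sum` and `rolling_mean` and the sample data are made up for illustration and are not part of mlforecast.

```python
import numpy as np

def rolling_sum(y, window):
    """Trailing rolling sum; the first window - 1 entries are NaN."""
    y = np.asarray(y, dtype=float)
    out = np.full(len(y), np.nan)
    # Prefix sums with a leading zero, so any window sum is a difference.
    csum = np.cumsum(np.insert(y, 0, 0.0))
    out[window - 1:] = csum[window:] - csum[:-window]
    return out

def rolling_mean(y, window):
    """Trailing rolling mean, defined as the rolling sum over the window size."""
    return rolling_sum(y, window) / window

y = np.arange(1, 15, dtype=float)  # 14 days of made-up sales
s = rolling_sum(y, 7)
m = rolling_mean(y, 7)
# Wherever both are defined, the sum is exactly 7 times the mean.
print(np.allclose(s[6:], 7 * m[6:]))
```

For a fixed window the two features differ only by a constant factor, so a tree-based model sees essentially the same signal from either one.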
m
Yes, RollingSum isn't implemented, so I need to write a custom one. Thanks for the suggestion to add RollingMean and RollingPercentile. But those still take the last `window_size` units of the frequency, right? I'm thinking of using a statistic aligned to calendar weeks instead of a trailing daily window. Is that not best practice?
j
But you need one value per row per unique id for that weekly sum, so the same value would repeat: Monday: last week's sum 70, Tuesday: last week's sum 70, etc. You would have to figure out the first weekday of your series and let your sliding window move in steps of 7 (if you start on Monday), right? Does this make sense? Something like this.
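A minimal NumPy sketch of that repeating pattern, assuming the series starts on a Monday and has no missing days. The function name `last_week_total` and the sample numbers are hypothetical.

```python
import numpy as np

def last_week_total(y):
    """For each day, the total of the previous complete Monday-Sunday week.

    Assumes y[0] is a Monday and the series has no gaps.
    """
    y = np.asarray(y, dtype=float)
    n = len(y)
    # One total per complete week (a trailing partial week is dropped).
    weekly = y[: n // 7 * 7].reshape(-1, 7).sum(axis=1)
    # Repeat each total 7 times, then shift it forward by one week.
    repeated = np.repeat(weekly, 7)
    out = np.full(n, np.nan)  # the first week has no previous week
    out[7:] = repeated[: max(n - 7, 0)]
    return out

# Three weeks of made-up sales: 10 per day, then 20 per day, then 30 per day.
y = np.repeat([10.0, 20.0, 30.0], 7)
feat = last_week_total(y)
print(feat)
```

Every day of the second week gets 70 and every day of the third week gets 140, which is the "same value repeating" behaviour described above.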
m
I guess that is hard to do in a custom lag transform, since the only data passed in is the target column, right?
j
Yes, you only get the array, not the df or the dates. But if you know your data always starts on a Monday, you can build it from the position in the time index.
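One way to make sure the array starts on a Monday is to trim the series before training. A minimal sketch using only the standard library; the helper name `days_until_monday` is hypothetical, and the year 2025 is an assumption picked because 17 March falls on a Monday in that year.

```python
from datetime import date

def days_until_monday(first_date: date) -> int:
    """Number of leading days to drop so the series starts on a Monday."""
    # date.weekday() returns 0 for Monday through 6 for Sunday.
    return (7 - first_date.weekday()) % 7

# A series that already starts on a Monday needs no trimming.
print(days_until_monday(date(2025, 3, 17)))

# A series that starts on Wednesday, 19 March 2025, drops 5 days
# so that it begins on Monday, 24 March 2025.
sales = list(range(20))              # made-up daily values
offset = days_until_monday(date(2025, 3, 19))
aligned = sales[offset:]             # position 0 is now a Monday
```

After trimming, position `i` in the array falls on weekday `i % 7`, so week boundaries can be derived from the index alone.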
m
I think if I forecast daily, it is not possible
Thanks for the answer and suggestion!
j
try this one:
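```python
import numpy as np
from numba import njit

@njit
def weekly_rolling_sum(y: np.ndarray) -> np.ndarray:
    n = len(y)
    # NaN everywhere except the last day of each complete 7-day block
    result = np.full(n, np.nan)
    # step through non-overlapping 7-day blocks, starting at index 0
    for i in range(0, n, 7):
        if i + 7 <= n:  # skip a trailing incomplete block
            block_sum = np.sum(y[i:i+7])
            result[i + 6] = block_sum  # store at end of each 7-day block
    return result
```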
This assumes your data starts on a Monday.
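The function above writes each total only on the last day of its own block. A possible variant that instead gives every day of the following week the previous week's total could look like the sketch below. The name `prev_week_total` is hypothetical, the series is again assumed to start on a Monday with no gaps, and the `try`/`except` is only there so the sketch also runs without numba installed. How the array lines up with calendar weeks also depends on the lag the function is registered under, so the alignment is worth checking on your own data.

```python
import numpy as np

try:
    from numba import njit
except ImportError:  # fallback so the sketch still runs without numba
    def njit(func):
        return func

@njit
def prev_week_total(y):
    n = len(y)
    result = np.full(n, np.nan)  # the first week has no previous week
    # Visit complete 7-day blocks only (start + 7 <= n).
    for start in range(0, n - 6, 7):
        total = np.sum(y[start:start + 7])
        # Write the total onto every day of the following week.
        end = min(start + 14, n)
        for j in range(start + 7, end):
            result[j] = total
    return result

y = np.arange(1.0, 22.0)  # 21 days of made-up sales: 1, 2, ..., 21
feat = prev_week_total(y)
print(feat)
```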
m
That's great, the code almost does what I wanted, and it really opened my eyes to how to design a custom lag transform. Thank you for the example!
j
You're welcome.