# mlforecast
i
I think the new `forecast_fitted_values` function in MLForecast is broken. It's producing the transformed predictions rather than the inverse-transformed ones. I've written a GitHub issue.
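A minimal sketch of the call pattern under discussion, assuming an already-configured `MLForecast` instance `fcst`:
```python
# Fit storing the in-sample predictions, then retrieve them; the expectation
# is that forecast_fitted_values returns them on the original scale of y.
fcst.fit(df, fitted=True)
insample_df = fcst.forecast_fitted_values()
```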
j
Thanks. We'll work on a fix soon
Hey. Looking at this, I think it may be best not to apply the inverse transformations at all, because some of the transformations can't recover the original values of the target for the training set. For example, if you use Differences([24]), the last 24 values of each series are stored and then added back when predicting, so for the training set this produces wrong values. WDYT? The fix would consist in having the target in the same scale as the model predictions (right now it's in the original scale)
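A toy illustration (numbers invented, not from the thread) of why inverting a difference is only well-defined going forward:
```python
import pandas as pd

y = pd.Series([10.0, 12.0, 15.0, 19.0])
diffed = y.diff()  # the model trains on [NaN, 2.0, 3.0, 4.0]
# A forecast is inverted by adding back the stored last value of y (19.0):
next_pred_diffed = 3.0               # hypothetical model output
next_pred = 19.0 + next_pred_diffed  # valid only for the forecast horizon
# Fitted values for the training rows would each need a *different* earlier
# value of y, which the transform does not keep around.
```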
i
I noticed the issues with Differences as well. I'm not sure what a good fix would be, but it seems pretty important to have. Otherwise, I'm not sure how you'd use mlforecast with hierarchicalforecast.
j
Hey. We just released 0.9.3 with a couple of bug fixes related to restoring the fitted values with target_transformations. Please let us know if you run into any more issues; it turned out to be more complex than I expected
i
I'll take a look. Thanks for the update!
Do you know when you'll be pushing the new version of hierarchicalforecast? It was last updated in March, so I've been using the git version since then.
j
I have to fix an issue with the aggregation; after that I think we can make a release. I think next week
i
Sweet - it's great how fast and responsive your team is.
@José Morales I'm taking a look at the updated `forecast_fitted_values` and still having some trouble. If I fit MLForecast with `dropna=True`, I get an error (`Found different number of groups in fitted differences.`) because the fitted values will have groups that were dropped in training but should be used. But if I use `dropna=False`, then I'm unable to fit, since some of the models can't handle NA values. What do you recommend I do?
j
Damn haha. Can you provide an example? I'm guessing this happens for series shorter than the max difference, right?
i
Yep. I think that's the case
Let me come up with some example data
j
Can you try removing those series? They'd be removed from the training anyway, and I think the fitted values take this into account, so it wouldn't compute predictions for them
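A hedged sketch of that filtering, assuming `Differences([24])` is the largest difference as in the earlier example:
```python
# keep only series with more observations than the largest difference
max_diff = 24
sizes = df.groupby('unique_id')['ds'].count()
long_enough = sizes[sizes > max_diff].index
df = df[df['unique_id'].isin(long_enough)].copy()
```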
i
Should I put the example in github or post it here?
j
Either one is fine, whichever you prefer
i
```python
import pandas as pd
from sklearn.linear_model import LinearRegression

from mlforecast import MLForecast
from mlforecast.utils import generate_daily_series
from mlforecast.target_transforms import Differences
from window_ops.rolling import rolling_mean


horizon = 10

# Create a basic series
df = generate_daily_series(
    n_series=10,
    min_length=10,
    max_length=25,
    equal_ends=True,
)
# inspect the date span of each series
df.groupby(['unique_id'])['ds'].agg(['min', 'max'])

test_end = df['ds'].max()
train_end = test_end - pd.Timedelta(days=horizon)

# train/test split (note: not used below; the failing calls fit on the full df)
train_df = df.query("ds <= @train_end")


# Differences([1]) plus the lag/rolling features drops early rows of each series
fcst = MLForecast(
    models={'lr': LinearRegression()},
    freq='D',
    lags=range(1, 7),
    lag_transforms={
        i: [(rolling_mean, 3), (rolling_mean, 4)]
        for i in range(1, 7)
    },
    target_transforms=[Differences([1])],
    date_features=['day', 'dayofweek', 'week', 'month', 'quarter', 'year'],
)

# THIS FAILS WHEN COMPUTING THE FITTED VALUES (dropna=True drops short series)
fcst.fit(
    df,
    fitted=True,
    dropna=True,
)

# THIS FAILS WHEN TRAINING (NaN rows are kept and the model can't handle them)
fcst.fit(
    df,
    fitted=True,
    dropna=False,
)
```
If I remove the series with NAs, how would you go about reconciling them after the fact? I need `forecast_fitted_values` for `aggregate` in hierarchicalforecast.
j
In order to unblock you at least partially, would it be OK for you to impute the null values?
If you use, as the model, a pipeline that imputes first, you could keep the linear regression and use dropna=False
```python
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline

lr = make_pipeline(SimpleImputer(strategy='constant', fill_value=0), LinearRegression())
fcst = MLForecast(
    models={'lr': lr},
    ...  # remaining arguments as in the earlier snippet
)
```
i
I don't mind trying that at all, but it looks like it'll fail because it doesn't impute `y`.
```python
lr = make_pipeline(SimpleImputer(strategy='constant', fill_value=0), LinearRegression())
fcst = MLForecast(
    models={'lr': lr},
    freq='D',
    lags=range(1, 7),
    lag_transforms={
        i: [(rolling_mean, 3), (rolling_mean, 4)]
        for i in range(1, 7)
    },
    target_transforms=[Differences([1])],
    date_features=['day', 'dayofweek', 'week', 'month', 'quarter', 'year'],
)
fcst.fit(
    df,
    fitted=True,
    dropna=False,
)
```
j
Ah, right. My bad. Let me take a look at a possible fix
In the meantime you can use an approach like the one here and manually impute X and y
In that example I call `MLForecast.ts.predict`, but you can also override the `MLForecast.models_` attribute with a dict from str to model.
Nevermind that, I think that won't work either for the fitted values haha. Sorry
i
How about removing the offending series from MLForecast entirely? Then, when it comes to `aggregate`, filling in the missing series with the fitted and predicted values from `statsforecast.HistoricAverage`? It may mess with reconciliation, but since we have so little data about the missing series, it may help nevertheless
j
Yeah sure, if you can model those series in some other way that'd work
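A hedged sketch of that workaround, assuming the statsforecast `forecast(..., fitted=True)` / `forecast_fitted_values()` API; `dropped_ids` and `horizon` are placeholders:
```python
from statsforecast import StatsForecast
from statsforecast.models import HistoricAverage

# model the series MLForecast dropped, so every unique_id has both fitted
# and predicted values before reconciliation
short_df = df[df['unique_id'].isin(dropped_ids)]
sf = StatsForecast(models=[HistoricAverage()], freq='D')
preds = sf.forecast(df=short_df, h=horizon, fitted=True)
insample = sf.forecast_fitted_values()
```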
i
It'll mess with the reconciliation, but code-wise it should work. Should I write a GitHub issue to track this going forward?
j
Yes, please.
j
So you need to have the fitted values even if the data wasn't actually seen by the model, right?
i
Let me take a step back and describe my problem. I have N time series I need to predict. They follow a hierarchical setup: A -> B -> C -> D. (Separately, is there a good way to include A, B, C, and D as features in MLForecast rather than just as the unique id?) They all share an end date but have different start dates. I'd like to predict all of them using statsforecast and mlforecast, reconcile them with hierarchicalforecast, and ensemble the best predictions per time series. Therefore, I need the fitted values for all the time series. Is there anything I could be doing to avoid the issue?
j
You can add them as static features. Another thing you could try for the short-series problem is adding leading zeros or backfilling to get the minimum number of samples
I'm going to work on raising an error sooner for those cases, but I don't think there's much that can be done if we aren't able to get a single row with all the features to predict
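A minimal sketch of the leading-zeros idea, assuming daily frequency and an invented `min_len` threshold:
```python
import pandas as pd

def pad_with_zeros(g: pd.DataFrame, min_len: int) -> pd.DataFrame:
    # prepend zero-valued rows so the series reaches min_len observations
    n_missing = min_len - len(g)
    if n_missing <= 0:
        return g
    pad = pd.DataFrame({
        'unique_id': g['unique_id'].iloc[0],
        'ds': pd.date_range(end=g['ds'].min() - pd.Timedelta(days=1),
                            periods=n_missing, freq='D'),
        'y': 0.0,
    })
    return pd.concat([pad, g], ignore_index=True)

df = df.groupby('unique_id', group_keys=False).apply(pad_with_zeros, min_len=30)
```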
i
Addressing "You can add them as static features": how do you go about adding static features with a hierarchy? Given the unique id is A/B/C, we have rows that are A, A/B, and A/B/C; what values do B and C get for the A rows?
Addressing "trying to raise an error": is there a default behavior you can put in? For instance, if a unique_id would be totally dropped, default to historic average so at least we'll have results for all unique_ids?
j
I think you could try giving the rows that only have A the same value for B; it could be something like "B" or "Aggregate"
i
That's a good idea. It'd be great to have that handled automatically in `aggregate`. Would it help if I added a GitHub issue proposing that?
Just to be clear on what you're suggesting, would it look something like this (assuming a strict hierarchy of Country -> State -> Purpose):
Made with this function:
```python
import warnings
from typing import List

import numpy as np
import pandas as pd
from sklearn.preprocessing import OneHotEncoder
# private helper used by hierarchicalforecast's own aggregate
from hierarchicalforecast.utils import _to_upper_hierarchy

def aggregate2(
    df: pd.DataFrame,
    spec: List[List[str]],
    is_balanced: bool = False,
    sparse_s: bool = False,
):
    """Utils Aggregation Function.
    Aggregates bottom level series contained in the pandas DataFrame `df` according
    to levels defined in the `spec` list.

    Parameters
    ----------
    df : pandas DataFrame
        Dataframe with columns `['ds', 'y']` and columns to aggregate.
    spec : list of list of str
        List of levels. Each element of the list should contain a list of columns of `df` to aggregate.
    is_balanced : bool (default=False)
        Deprecated.
    sparse_s : bool (default=False)
        Return `S_df` as a sparse dataframe.

    Returns
    -------
    Y_df : pandas DataFrame
        Hierarchically structured series.
    S_df : pandas DataFrame
        Summing dataframe.
    tags : dict
        Aggregation indices.
    """
    # Checks
    if df.isnull().values.any():
        raise ValueError('`df` contains null values')
    if is_balanced:
        warnings.warn(
            "`is_balanced` is deprecated and will be removed in a future version. "
            "Don't set this argument to suppress this warning.",
            category=DeprecationWarning,
        )
    spec = sorted(spec, key=len)
    # ordered de-duplication of the columns that appear across all levels
    seen = set()
    all_spec = [x for x in [item for sublist in spec for item in sublist] if not (x in seen or seen.add(x))]
    all_dummies = [f'{i}_{j}' for i in all_spec for j in df[i].unique()]
    all_agg = [f'agg_{i}' for i in all_spec]
    bottom = spec[-1]
    aggs = []
    tags = {}
    for i, levels in enumerate(spec):
        agg = df.groupby(levels + ['ds'])['y'].sum().reset_index('ds')
        group = agg.index.get_level_values(0)
        for j, level in enumerate(levels):
            if j > 0:
                group = group + '/' + agg.index.get_level_values(level).str.replace('/', '_')
            # keep the raw level column and add one-hot dummies for it
            agg[level] = agg.index.get_level_values(level).values
            agg = pd.concat([agg, pd.get_dummies(agg[level], prefix=level, dtype=int)], axis=1)
        agg.index = group
        agg.index.name = 'unique_id'
        tags['/'.join(levels)] = group.unique().values

        # levels absent at this aggregation get NaN plus an indicator column
        for j in all_spec:
            if j not in agg:
                agg[j] = np.nan
            agg[f'agg_{j}'] = agg[j].isna().astype(int)

        for j in all_dummies:
            if j not in agg:
                agg[j] = 0
        aggs.append(agg)

    Y_df = (
        pd.concat(aggs)
        [['ds', 'y'] + all_spec + all_dummies + all_agg]
    )

    # construct S
    bottom_key = '/'.join(bottom)
    bottom_levels = tags[bottom_key]
    S = np.empty((len(bottom_levels), len(spec)), dtype=object)
    for j, levels in enumerate(spec[:-1]):
        S[:, j] = _to_upper_hierarchy(bottom, bottom_levels, '/'.join(levels))
    S[:, -1] = tags[bottom_key]
    categories = list(tags.values())
    try:
        encoder = OneHotEncoder(categories=categories, sparse_output=sparse_s, dtype=np.float32)
    except TypeError:  # sklearn < 1.2
        encoder = OneHotEncoder(categories=categories, sparse=sparse_s, dtype=np.float32)    
    S = encoder.fit_transform(S).T
    if sparse_s:
        df_constructor = pd.DataFrame.sparse.from_spmatrix
    else:
        df_constructor = pd.DataFrame
    S_df = df_constructor(S, index=np.hstack(categories), columns=bottom_levels)
    return Y_df, S_df, tags
```
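A hypothetical usage, with column names assuming the Country -> State -> Purpose hierarchy from above:
```python
spec = [['Country'], ['Country', 'State'], ['Country', 'State', 'Purpose']]
Y_df, S_df, tags = aggregate2(df, spec)
```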
j
Is that for the dummies?
i
Yep yep
j
I was thinking of using str.split and then maybe a fillna. For example:
```python
Y_df, S_df, tags = aggregate(df, spec)
Y_df = Y_df.reset_index()
# n=2 caps the split at the three expected columns
Y_df[['country', 'cat1', 'cat2']] = Y_df['unique_id'].str.split('/', n=2, expand=True)
for col in ('cat1', 'cat2'):
    Y_df[col] = Y_df[col].fillna(col)
```
You could then use these directly in LightGBM, or use pd.get_dummies or a one-hot encoder
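A hedged sketch of that last step, one-hot encoding the recovered columns and passing them to `MLForecast.fit` via its `static_features` argument:
```python
import pandas as pd

Y_df = pd.get_dummies(Y_df, columns=['country', 'cat1', 'cat2'], dtype=int)
static_cols = [c for c in Y_df.columns if c not in ('unique_id', 'ds', 'y')]
fcst.fit(Y_df, fitted=True, static_features=static_cols)
```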
Oh btw you were interested in a new release of hierarchicalforecast, right? We released 0.4.0 yesterday
i
I added an enhancement idea on GitHub. The code I wrote is slightly more complex because, with non-strictly-hierarchical datasets, you're not guaranteed to get the same variable in 'cat1' every time. I included the Tourism example in my issue.
Let me know what you think