Good morning, I'm probably missing one small step, but I'm wor... | Nixtla Community #timegpt

Tony Dunsworth

08/14/2024, 4:06 PM

Good morning, I'm probably missing one small step, but I'm working on adding US holidays to a TimeGPT forecast (actually, by the time I'm done, it will be 12) for my dissertation. I'll include all my setup code below. It all functions properly in the notebook, but when I get to the forecast, I get this error: "You must include the exogenous variables in the `df` object, exogenous variables y,US_New Year's Day,US_New Year's Day (observed),US_Memorial Day,US_Juneteenth National Independence Day,US_Independence Day,US_Labor Day,US_Veterans Day,US_Veterans Day (observed),US_Thanksgiving,US_Christmas Day,US_Martin Luther King Jr. Day,US_Washington's Birthday,US_Columbus Day"

Copy code

import pandas as pd
from nixtla.date_features import CountryHolidays

us_holidays = CountryHolidays(countries=['US'])
dates = pd.date_range(start=tst_alxmsf.iloc[0]['ds'], end=tst_alxmsf.iloc[-1]['ds'], freq='D')
holidays_df = us_holidays(dates)
monthly_holidays = holidays_df.resample('MS').max()

monthly_holidays = monthly_holidays.reset_index(names='ds')
monthly_holidays['ds'] = pd.to_datetime(monthly_holidays['ds']) 
tst_alxmsf['ds'] = pd.to_datetime(tst_alxmsf['ds'])

tst_alxmsf = tst_alxmsf.merge(monthly_holidays)

tst_alxmsf.head()

Copy code

dates = pd.date_range(start=tr_alxmsf.iloc[0]['ds'], end=tr_alxmsf.iloc[-1]['ds'], freq='D')
holidays_df = us_holidays(dates)
monthly_holidays = holidays_df.resample('MS').max()

monthly_holidays = monthly_holidays.reset_index(names='ds')
monthly_holidays['ds'] = pd.to_datetime(monthly_holidays['ds'])
tr_alxmsf['ds'] = pd.to_datetime(tr_alxmsf['ds'])

tr_alxmsf = tr_alxmsf.merge(monthly_holidays)

Copy code

alxm_fcst4 = nixtla_client.forecast(
    df = tr_alxmsf,
    id_col = 'unique_id',
    time_col = 'ds',
    target_col='y',
    h = 12,
    freq = 'MS',
    level = [90,95],
    X_df = tst_alxmsf,
)

alxm_fcst4_pred = alxm_fcst4['TimeGPT']

The first two code blocks run with no problems. When I checked the datasets, both had the merged columns where they should be, though I was surprised that in the tail of tr_alxmsf, the month of December did not have Christmas marked as 1 as I expected. However, that is not as important at this point. When I ran the forecast code, I received the error. I passed the training set as df (I'm assuming, and this could be my problem, that this is where the evaluation set goes), since X_df is the future dataset (tst_alxmsf). As a note, tr indicates a training set, tst the test set, and sf simply means I adjusted the dataset for statsforecast, which I used earlier in the notebook. If anyone could, after reading all my posts, point me in the right direction to fix what I have done wrong, I'd appreciate it. Thanks in advance for any help offered.
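
A possible explanation for the missing December flag, assuming the last ds value in tr_alxmsf is the first day of December (the thread does not confirm this): the daily range built from the first to the last ds stops on that date, so December 25 is never generated and the monthly maximum stays 0. The sketch below uses hypothetical dates and only pandas to illustrate the effect and one way to extend the range to the end of the final month.

```python
import pandas as pd

# Hypothetical month-start timestamps standing in for tr_alxmsf['ds'].
ds = pd.date_range("2023-10-01", "2023-12-01", freq="MS")

# Same construction as in the post: daily dates from the first to the last ds.
dates = pd.date_range(start=ds[0], end=ds[-1], freq="D")
covers_christmas = pd.Timestamp("2023-12-25") in dates

# Extending the end to the last day of the final month covers all of December.
full_dates = pd.date_range(
    start=ds[0], end=ds[-1] + pd.offsets.MonthEnd(0), freq="D"
)
covers_christmas_fixed = pd.Timestamp("2023-12-25") in full_dates

print(dates[-1].date(), covers_christmas)
print(full_dates[-1].date(), covers_christmas_fixed)
```

If this is what happened, the same truncation would apply to any holiday that falls after the first day of the last month in either dataset.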

Marco

08/14/2024, 4:10 PM

To use exogenous variables, both your input dataframe and future dataframe must have them. You can refer to this tutorial, which adds US holidays to an existing dataset, creates a future dataframe, and uses TimeGPT to forecast the target and measure the importance of each holiday.

Tony Dunsworth

08/14/2024, 4:33 PM

Thank you. You just told me what my problem is because my testing set, which becomes the future dataset, already has values in it; if I'm right, the 'y' variable there is what the function doesn't like. So I will drop the values from a copy of the set and then I can use it with no problems.

Tony Dunsworth

08/14/2024, 4:44 PM

Well, that didn't work as well as I would like. The error changed. It removed the y, but it still failed, stating that I need to add the exogenous variables. I did notice that when I made a list of the holidays in the future dataset and the training dataset, they are different, although they are supposed to be using the same code:

Copy code

us_holidays = CountryHolidays(countries=['US'])
dates = pd.date_range(start=tr_alxmsf.iloc[0]['ds'], end=tr_alxmsf.iloc[-1]['ds'], freq='D')
holidays_df = us_holidays(dates)
monthly_holidays = holidays_df.resample('MS').max()

Tony Dunsworth

08/14/2024, 4:48 PM

The first part is the training set code block. The following is the future dataset code block:

Copy code

us_holidays = CountryHolidays(countries=['US'])
dates = pd.date_range(start=tst_alxmsf.iloc[0]['ds'], end=tst_alxmsf.iloc[-1]['ds'], freq='D')
holidays_df = us_holidays(dates)
monthly_holidays = holidays_df.resample('MS').max()
tst_alxmsf2 = tst_alxmsf.drop(['y'], axis=1)

The lists for the two sets are below:

Copy code

Training: 
['unique_id',
 'ds',
 'y',
 "US_New Year's Day",
 'US_Memorial Day',
 'US_Independence Day',
 'US_Labor Day',
 'US_Veterans Day',
 'US_Thanksgiving',
 'US_Christmas Day',
 'US_Martin Luther King Jr. Day',
 "US_Washington's Birthday",
 'US_Columbus Day',
 'US_Independence Day (observed)',
 "US_New Year's Day (observed)",
 'US_Juneteenth National Independence Day',
 'US_Juneteenth National Independence Day (observed)',
 'US_Christmas Day (observed)']
Future set:
['unique_id',
 'ds',
 "US_New Year's Day",
 "US_New Year's Day (observed)",
 'US_Memorial Day',
 'US_Juneteenth National Independence Day',
 'US_Independence Day',
 'US_Labor Day',
 'US_Veterans Day',
 'US_Veterans Day (observed)',
 'US_Thanksgiving',
 'US_Christmas Day',
 'US_Martin Luther King Jr. Day',
 "US_Washington's Birthday",
 'US_Columbus Day']

So it looks like there are inconsistencies that I might have to add by hand?
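
The mismatch between the two lists can be made explicit with a set difference. This is a minimal sketch in plain Python; the column names are copied from the two printouts above, and the variable names are only illustrative.

```python
# Column lists copied from the two printouts in this thread.
training_cols = [
    "unique_id", "ds", "y",
    "US_New Year's Day", "US_Memorial Day", "US_Independence Day",
    "US_Labor Day", "US_Veterans Day", "US_Thanksgiving", "US_Christmas Day",
    "US_Martin Luther King Jr. Day", "US_Washington's Birthday",
    "US_Columbus Day", "US_Independence Day (observed)",
    "US_New Year's Day (observed)", "US_Juneteenth National Independence Day",
    "US_Juneteenth National Independence Day (observed)",
    "US_Christmas Day (observed)",
]
future_cols = [
    "unique_id", "ds",
    "US_New Year's Day", "US_New Year's Day (observed)", "US_Memorial Day",
    "US_Juneteenth National Independence Day", "US_Independence Day",
    "US_Labor Day", "US_Veterans Day", "US_Veterans Day (observed)",
    "US_Thanksgiving", "US_Christmas Day", "US_Martin Luther King Jr. Day",
    "US_Washington's Birthday", "US_Columbus Day",
]

# Everything except the id, time, and target columns is an exogenous feature.
non_exog = {"unique_id", "ds", "y"}
train_exog = set(training_cols) - non_exog
future_exog = set(future_cols) - non_exog

only_in_training = sorted(train_exog - future_exog)
only_in_future = sorted(future_exog - train_exog)

print("Only in training:", only_in_training)
print("Only in future:  ", only_in_future)
```

Because sets ignore order, this shows that the two frames differ in which "(observed)" columns exist, independent of how the columns are ordered.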

Marco

08/14/2024, 5:45 PM

Are there other exogenous variables, apart from holidays?

Tony Dunsworth

08/14/2024, 5:46 PM

No sir. The holidays are my only exogenous variable for this exercise.

Tony Dunsworth

08/14/2024, 5:52 PM

Here's an off-the-wall question: Could the list order be the problem?

Marco

08/14/2024, 5:55 PM

Are you using nixtla version 0.5.2?

Tony Dunsworth

08/14/2024, 6:04 PM

Yes. I'm using version 0.5.2 with Python 3.11.9, and I'm working in Posit's Positron IDE 2024.08.0 build 24.

Marco

08/14/2024, 6:43 PM

Can you copy and paste the error, exactly as it is returned, please?

Tony Dunsworth

08/14/2024, 6:46 PM

Copy code

---------------------------------------------------------------------------
Exception                                 Traceback (most recent call last)
Cell In[173], line 1
----> 1 alxm_fcst4 = nixtla_client.forecast(
      2     tr_alxmsf,
      3     id_col = 'unique_id',
      4     time_col = 'ds',
      5     target_col='y',
      6     h = 12,
      7     freq = 'MS',
      8     level = [90,95],
      9     X_df = tst_alxmsf2,
     10 )
     12 alxm_fcst4_pred = alxm_fcst4['TimeGPT']
     14 alxm_fcst4.tail()

File ~\anaconda3\envs\nixtla2\Lib\site-packages\nixtla\nixtla_client.py:60, in deprecated_argument.<locals>.decorator.<locals>.wrapper(*args, **kwargs)
     58         raise TypeError(f"{new_name} argument duplicated")
     59     kwargs[new_name] = kwargs.pop(old_name)
---> 60 return func(*args, **kwargs)

File ~\anaconda3\envs\nixtla2\Lib\site-packages\nixtla\nixtla_client.py:60, in deprecated_argument.<locals>.decorator.<locals>.wrapper(*args, **kwargs)
     58         raise TypeError(f"{new_name} argument duplicated")
     59     kwargs[new_name] = kwargs.pop(old_name)
---> 60 return func(*args, **kwargs)
...
    483         f"You have to pass the {self.h} future values of your "
    484         "exogenous variables for each time series"
    485     )

Exception: You must include the exogenous variables in the `df` object, exogenous variables US_New Year's Day,US_New Year's Day (observed),US_Memorial Day,US_Juneteenth National Independence Day,US_Independence Day,US_Labor Day,US_Veterans Day,US_Veterans Day (observed),US_Thanksgiving,US_Christmas Day,US_Martin Luther King Jr. Day,US_Washington's Birthday,US_Columbus Day

Marco

08/14/2024, 7:14 PM

Ok, I think I see the problem. Because of the dates, some holidays are present in only one of the dataframes, but they must be present in both. Instead of adding holidays to each dataframe separately, you should add them to the whole dataframe and then make your split. That way, we ensure that all features are present everywhere.
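
A minimal sketch of that "add features first, split second" order, using only pandas. The `toy_holidays` function is a hypothetical stand-in for CountryHolidays that mimics the behavior seen in this thread (a column appears only when the holiday occurs in the date range); the series values, the id "alx", and the dates are invented for illustration.

```python
import pandas as pd

def toy_holidays(dates: pd.DatetimeIndex) -> pd.DataFrame:
    """Hypothetical stand-in for CountryHolidays: one 0/1 column per
    holiday that actually occurs in `dates`."""
    known = {"US_Independence Day": "07-04", "US_Christmas Day": "12-25"}
    out = pd.DataFrame(index=dates)
    for name, month_day in known.items():
        flags = (dates.strftime("%m-%d") == month_day).astype(int)
        if flags.any():
            out[name] = flags
    return out

# Hypothetical monthly series: 24 months, the last 3 held out as the future.
full = pd.DataFrame({
    "unique_id": "alx",
    "ds": pd.date_range("2022-01-01", periods=24, freq="MS"),
    "y": range(24),
})
h = 3

# Built separately on the future window only, a holiday column goes missing.
separate = toy_holidays(pd.date_range("2023-10-01", "2023-12-31", freq="D"))
separate_cols = list(separate.columns)

# Build the flags once over the whole span, through the end of the last month.
dates = pd.date_range(
    full["ds"].min(), full["ds"].max() + pd.offsets.MonthEnd(0), freq="D"
)
monthly = toy_holidays(dates).resample("MS").max().rename_axis("ds").reset_index()
full = full.merge(monthly, on="ds", how="left")

# Split afterwards: history keeps y, the future frame drops it.
train = full.iloc[:-h]
future = full.iloc[-h:].drop(columns="y")

train_exog = set(train.columns) - {"unique_id", "ds", "y"}
future_exog = set(future.columns) - {"unique_id", "ds"}
print(separate_cols, sorted(train_exog), sorted(future_exog))
```

With the split done last, `train` would go to `df` and `future` to `X_df` in the forecast call shown earlier in the thread, and both frames carry the same holiday columns by construction.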

🙏 1

Tony Dunsworth

08/14/2024, 7:15 PM

Ok. I'll try that.

Tony Dunsworth

08/14/2024, 7:31 PM

Thank you, that worked like I expected. I appreciate the assistance!!

✅ 1
