# timegpt
t
Good morning, I'm probably missing one small step, but I'm working on adding US holidays to a TimeGPT forecast (actually, by the time I'm done, it will be 12) for my dissertation. I'll include all my setup code below. It all functions properly in the notebook, but when I get to the forecast, I get this error: "You must include the exogenous variables in the `df` object, exogenous variables y,US_New Year's Day,US_New Year's Day (observed),US_Memorial Day,US_Juneteenth National Independence Day,US_Independence Day,US_Labor Day,US_Veterans Day,US_Veterans Day (observed),US_Thanksgiving,US_Christmas Day,US_Martin Luther King Jr. Day,US_Washington's Birthday,US_Columbus Day"
import pandas as pd
from nixtla.date_features import CountryHolidays

us_holidays = CountryHolidays(countries=['US'])
dates = pd.date_range(start=tst_alxmsf.iloc[0]['ds'], end=tst_alxmsf.iloc[-1]['ds'], freq='D')
holidays_df = us_holidays(dates)
monthly_holidays = holidays_df.resample('MS').max()

monthly_holidays = monthly_holidays.reset_index(names='ds')
monthly_holidays['ds'] = pd.to_datetime(monthly_holidays['ds']) 
tst_alxmsf['ds'] = pd.to_datetime(tst_alxmsf['ds'])

tst_alxmsf = tst_alxmsf.merge(monthly_holidays)

tst_alxmsf.head()
dates = pd.date_range(start=tr_alxmsf.iloc[0]['ds'], end=tr_alxmsf.iloc[-1]['ds'], freq='D')
holidays_df = us_holidays(dates)
monthly_holidays = holidays_df.resample('MS').max()

monthly_holidays = monthly_holidays.reset_index(names='ds')
monthly_holidays['ds'] = pd.to_datetime(monthly_holidays['ds'])
tr_alxmsf['ds'] = pd.to_datetime(tr_alxmsf['ds'])

tr_alxmsf = tr_alxmsf.merge(monthly_holidays)
alxm_fcst4 = nixtla_client.forecast(
    df = tr_alxmsf,
    id_col = 'unique_id',
    time_col = 'ds',
    target_col='y',
    h = 12,
    freq = 'MS',
    level = [90,95],
    X_df = tst_alxmsf,
)

alxm_fcst4_pred = alxm_fcst4['TimeGPT']
The first two code blocks run with no problems. When I checked the datasets, both had the merged columns where they should be, though I was surprised that in the tail of tr_alxmsf, December did not have Christmas marked as 1 as I expected. That is not as important at this point, however. When I ran the forecast code, I received the error. I am assuming (and this could be my problem) that df is where the evaluation set goes, since X_df is the future dataset (tst_alxmsf). As a note, tr indicates a training set, tst is the test set, and sf simply means I adjusted the dataset for statsforecast, which I used earlier in the notebook. If anyone could, after reading all my posts, point me in the right direction to fix what I have done wrong, I'd appreciate it. Thanks in advance for any help offered.
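One possible explanation for the missing December flag, sketched with plain pandas: if the last ds value is the first day of a month, a daily range that ends at that timestamp never reaches the 25th. The dates and the christmas_flag helper below are hypothetical stand-ins for the real series and for CountryHolidays; only the date-range behavior is the point.

```python
import pandas as pd

# Hypothetical monthly series whose last timestamp is the first of December.
ds = pd.date_range("2023-01-01", "2023-12-01", freq="MS")

# Daily range built as in the post: it stops on 2023-12-01.
daily_short = pd.date_range(start=ds[0], end=ds[-1], freq="D")

# Extending the end to the last day of the final month covers all of December.
daily_full = pd.date_range(
    start=ds[0], end=ds[-1] + pd.offsets.MonthEnd(0), freq="D"
)


def christmas_flag(days):
    # Stand-in for a holiday feature: 1 on December 25, 0 otherwise,
    # then aggregated to month-start frequency like resample('MS').max().
    is_christmas = (days.month == 12) & (days.day == 25)
    flag = pd.Series(is_christmas.astype(int), index=days)
    return flag.resample("MS").max()


short_flag = christmas_flag(daily_short)
full_flag = christmas_flag(daily_full)

# Compare the December entry of each monthly series.
print(short_flag.iloc[-1], full_flag.iloc[-1])
```

If this is what is happening, the same shortfall would apply to any holiday that falls after the first day of the final month in either set.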
m
To use exogenous variables, both your input dataframe and your future dataframe must have them. You can refer to this tutorial, which adds US holidays to an existing dataset, creates a future dataframe, and uses TimeGPT to forecast the target and measure the importance of each holiday.
t
Thank you. You just told me what my problem is because my testing set, which becomes the future dataset, already has values in it; if I'm right, the 'y' variable there is what the function doesn't like. So I will drop the values from a copy of the set and then I can use it with no problems.
Well, that didn't work as well as I would have liked. The error changed: the y is gone from the message, but it still fails, stating that I need to add the exogenous variables. I did notice that when I listed the holidays in the future dataset and the training dataset, they are different, although they are supposed to be using the same
us_holidays = CountryHolidays(countries=['US'])
dates = pd.date_range(start=tr_alxmsf.iloc[0]['ds'], end=tr_alxmsf.iloc[-1]['ds'], freq='D')
holidays_df = us_holidays(dates)
monthly_holidays = holidays_df.resample('MS').max()
The first part is the training set code block. The following is the future dataset code block:
us_holidays = CountryHolidays(countries=['US'])
dates = pd.date_range(start=tst_alxmsf.iloc[0]['ds'], end=tst_alxmsf.iloc[-1]['ds'], freq='D')
holidays_df = us_holidays(dates)
monthly_holidays = holidays_df.resample('MS').max()
tst_alxmsf2 = tst_alxmsf.drop(['y'], axis=1)
The lists for the two sets are below:
Training: 
['unique_id',
 'ds',
 'y',
 "US_New Year's Day",
 'US_Memorial Day',
 'US_Independence Day',
 'US_Labor Day',
 'US_Veterans Day',
 'US_Thanksgiving',
 'US_Christmas Day',
 'US_Martin Luther King Jr. Day',
 "US_Washington's Birthday",
 'US_Columbus Day',
 'US_Independence Day (observed)',
 "US_New Year's Day (observed)",
 'US_Juneteenth National Independence Day',
 'US_Juneteenth National Independence Day (observed)',
 'US_Christmas Day (observed)']
Future set:
['unique_id',
 'ds',
 "US_New Year's Day",
 "US_New Year's Day (observed)",
 'US_Memorial Day',
 'US_Juneteenth National Independence Day',
 'US_Independence Day',
 'US_Labor Day',
 'US_Veterans Day',
 'US_Veterans Day (observed)',
 'US_Thanksgiving',
 'US_Christmas Day',
 'US_Martin Luther King Jr. Day',
 "US_Washington's Birthday",
 'US_Columbus Day']
So it looks like there are inconsistencies that I might have to add by hand?
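Rather than comparing the lists by eye, the difference can be computed. The sketch below uses two tiny hypothetical frames in place of tr_alxmsf and tst_alxmsf2, and fills any column missing from one side with 0, on the assumption that a missing holiday column means the holiday did not occur in that frame's date range.

```python
import pandas as pd

# Hypothetical miniature frames standing in for the training and future sets.
train = pd.DataFrame({
    "unique_id": ["a", "a"],
    "ds": pd.to_datetime(["2023-11-01", "2023-12-01"]),
    "y": [1.0, 2.0],
    "US_Thanksgiving": [1, 0],
    "US_Christmas Day": [0, 1],
})
future = pd.DataFrame({
    "unique_id": ["a", "a"],
    "ds": pd.to_datetime(["2024-01-01", "2024-02-01"]),
    "US_New Year's Day": [1, 0],
    "US_Christmas Day": [0, 0],
})

key_cols = ["unique_id", "ds"]
train_exog = [c for c in train.columns if c not in key_cols + ["y"]]
future_exog = [c for c in future.columns if c not in key_cols]

# Columns present on one side only.
only_in_train = sorted(set(train_exog) - set(future_exog))
only_in_future = sorted(set(future_exog) - set(train_exog))
print("only in train:", only_in_train)
print("only in future:", only_in_future)

# Give both frames the union of holiday columns, in the same order.
# Assumption: a column that is absent means the holiday never occurred there.
all_exog = sorted(set(train_exog) | set(future_exog))
train_aligned = train.reindex(columns=key_cols + ["y"] + all_exog, fill_value=0)
future_aligned = future.reindex(columns=key_cols + all_exog, fill_value=0)
```

This patches the frames after the fact; it does not address why the two holiday lists diverged in the first place.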
m
Are there other exogenous variables, apart from holidays?
t
No sir. The holidays are my only exogenous variable for this exercise.
Here's an off-the-wall question: Could the list order be the problem?
m
Are you using nixtla version 0.5.2?
t
Yes. I'm using version 0.5.2 with Python 3.11.9, and I'm working in Posit's Positron IDE 2024.08.0 build 24.
m
Can you copy paste the error please? As it is returned
t
---------------------------------------------------------------------------
Exception                                 Traceback (most recent call last)
Cell In[173], line 1
----> 1 alxm_fcst4 = nixtla_client.forecast(
      2     tr_alxmsf,
      3     id_col = 'unique_id',
      4     time_col = 'ds',
      5     target_col='y',
      6     h = 12,
      7     freq = 'MS',
      8     level = [90,95],
      9     X_df = tst_alxmsf2,
     10 )
     12 alxm_fcst4_pred = alxm_fcst4['TimeGPT']
     14 alxm_fcst4.tail()

File ~\anaconda3\envs\nixtla2\Lib\site-packages\nixtla\nixtla_client.py:60, in deprecated_argument.<locals>.decorator.<locals>.wrapper(*args, **kwargs)
     58         raise TypeError(f"{new_name} argument duplicated")
     59     kwargs[new_name] = kwargs.pop(old_name)
---> 60 return func(*args, **kwargs)

File ~\anaconda3\envs\nixtla2\Lib\site-packages\nixtla\nixtla_client.py:60, in deprecated_argument.<locals>.decorator.<locals>.wrapper(*args, **kwargs)
     58         raise TypeError(f"{new_name} argument duplicated")
     59     kwargs[new_name] = kwargs.pop(old_name)
---> 60 return func(*args, **kwargs)
...
    483         f"You have to pass the {self.h} future values of your "
    484         "exogenous variables for each time series"
    485     )

Exception: You must include the exogenous variables in the `df` object, exogenous variables US_New Year's Day,US_New Year's Day (observed),US_Memorial Day,US_Juneteenth National Independence Day,US_Independence Day,US_Labor Day,US_Veterans Day,US_Veterans Day (observed),US_Thanksgiving,US_Christmas Day,US_Martin Luther King Jr. Day,US_Washington's Birthday,US_Columbus Day
m
Ok, I think I see the problem. Because of the dates, some holidays are present in only one of the dataframes, but they must be present in both. Instead of adding holidays to each dataframe separately, you should add them to the whole dataframe and then make your split. That way, we ensure that all features are present everywhere.
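A minimal sketch of that order of operations, using plain pandas. The data and the two holiday columns are hypothetical stand-ins (in the thread the indicators come from CountryHolidays followed by resample('MS').max()), and the positional split assumes a single series.

```python
import pandas as pd

# Hypothetical full monthly series: training and test periods together.
full = pd.DataFrame({
    "unique_id": "a",
    "ds": pd.date_range("2022-01-01", periods=24, freq="MS"),
    "y": range(24),
})

# Stand-in monthly holiday indicators covering the whole date span.
features = pd.DataFrame({"ds": full["ds"]})
features["US_Christmas Day"] = (features["ds"].dt.month == 12).astype(int)
features["US_Independence Day"] = (features["ds"].dt.month == 7).astype(int)

# Merge once on the whole frame so every holiday column exists everywhere.
full = full.merge(features, on="ds", how="left")

h = 12  # forecast horizon, as in the post

# Split afterwards. With several series, split per unique_id instead.
train = full.iloc[:-h].copy()
future = full.iloc[-h:].drop(columns="y").copy()  # future frame has no target

print(train.columns.tolist())
print(future.columns.tolist())
```

Here train and future would take the place of tr_alxmsf and tst_alxmsf2 in the forecast call (df and X_df respectively), and they share the same holiday columns by construction.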
t
Ok. I'll try that.
Thank you, that worked like I expected. I appreciate the assistance!!