Tony Dunsworth
08/14/2024, 4:06 PM
`df` object, exogenous variables y,US_New Year's Day,US_New Year's Day (observed),US_Memorial Day,US_Juneteenth National Independence Day,US_Independence Day,US_Labor Day,US_Veterans Day,US_Veterans Day (observed),US_Thanksgiving,US_Christmas Day,US_Martin Luther King Jr. Day,US_Washington's Birthday,US_Columbus Day
import pandas as pd
from nixtla.date_features import CountryHolidays

# Holiday featurizer: one indicator column per US holiday
us_holidays = CountryHolidays(countries=['US'])

# Future (test) set: daily holiday flags over its date range, collapsed to month starts
dates = pd.date_range(start=tst_alxmsf.iloc[0]['ds'], end=tst_alxmsf.iloc[-1]['ds'], freq='D')
holidays_df = us_holidays(dates)
monthly_holidays = holidays_df.resample('MS').max()  # 1 if any covered day in that month is flagged
monthly_holidays = monthly_holidays.reset_index(names='ds')
monthly_holidays['ds'] = pd.to_datetime(monthly_holidays['ds'])
tst_alxmsf['ds'] = pd.to_datetime(tst_alxmsf['ds'])
tst_alxmsf = tst_alxmsf.merge(monthly_holidays, on='ds')
tst_alxmsf.head()
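One likely reason a late-month holiday goes unflagged: `pd.date_range(..., end=...iloc[-1]['ds'], freq='D')` stops at the last monthly `ds`. With `freq='MS'` that is the first day of the final month, so only that single day of the last month is ever checked. The sketch below uses plain pandas with hypothetical dates, and does not call `CountryHolidays`, to show the effect. It also shows one way around it, rolling the end date forward with `pd.offsets.MonthEnd(0)`.

```python
import pandas as pd

# Hypothetical monthly series stamped at month start, as with freq='MS'.
monthly_ds = pd.date_range("2023-01-01", "2023-12-01", freq="MS")

# Daily range ending at the last ds: it stops on 2023-12-01.
short_range = pd.date_range(monthly_ds[0], monthly_ds[-1], freq="D")

# Rolling the end forward to the last day of that month covers all of December.
full_range = pd.date_range(
    monthly_ds[0], monthly_ds[-1] + pd.offsets.MonthEnd(0), freq="D"
)

christmas = pd.Timestamp("2023-12-25")
print(christmas in short_range)
print(christmas in full_range)
```

In the notebook code, the equivalent change would be something like `end=pd.Timestamp(tr_alxmsf.iloc[-1]['ds']) + pd.offsets.MonthEnd(0)`. That is an untested suggestion for the thread's data.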
# Training set: same steps over the training date range
dates = pd.date_range(start=tr_alxmsf.iloc[0]['ds'], end=tr_alxmsf.iloc[-1]['ds'], freq='D')
holidays_df = us_holidays(dates)
monthly_holidays = holidays_df.resample('MS').max()
monthly_holidays = monthly_holidays.reset_index(names='ds')
monthly_holidays['ds'] = pd.to_datetime(monthly_holidays['ds'])
tr_alxmsf['ds'] = pd.to_datetime(tr_alxmsf['ds'])
tr_alxmsf = tr_alxmsf.merge(monthly_holidays, on='ds')
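Each block above runs the holiday featurizer over its own date range, so the two frames can end up with different holiday columns. Judging by the column lists posted later in this thread, only holidays that fall inside a range seem to get a column. A minimal sketch of an alternative is to build the monthly flags once over the combined train-plus-future span and merge that same table into both frames. `fake_us_holidays` and `monthly_holiday_features` are hypothetical names. The stand-in uses three fixed-date holidays so the sketch runs without the `nixtla` package. In the notebook, `CountryHolidays(countries=['US'])` would be passed in its place; that it is a drop-in replacement is an assumption.

```python
import pandas as pd


def fake_us_holidays(dates: pd.DatetimeIndex) -> pd.DataFrame:
    """Hypothetical stand-in for CountryHolidays(countries=['US'])(dates).

    Returns one 0/1 column per holiday that actually falls inside `dates`,
    using only three fixed-date holidays for illustration.
    """
    fixed_dates = {
        "US_New Year's Day": (1, 1),
        "US_Independence Day": (7, 4),
        "US_Christmas Day": (12, 25),
    }
    flags = {}
    for name, (month, day) in fixed_dates.items():
        hit = (dates.month == month) & (dates.day == day)
        if hit.any():  # holidays outside the range get no column at all
            flags[name] = hit.astype(int)
    return pd.DataFrame(flags, index=dates)


def monthly_holiday_features(train, future, holiday_fn):
    """Build monthly flags once over train + future, then merge into both."""
    train = train.assign(ds=pd.to_datetime(train["ds"]))
    future = future.assign(ds=pd.to_datetime(future["ds"]))
    start = train["ds"].min()
    # Roll forward to the last day of the final month so late-month holidays count.
    end = future["ds"].max() + pd.offsets.MonthEnd(0)
    daily = holiday_fn(pd.date_range(start, end, freq="D"))
    monthly = daily.resample("MS").max().reset_index(names="ds")
    return (
        train.merge(monthly, on="ds", how="left"),
        future.merge(monthly, on="ds", how="left"),
    )


# Hypothetical monthly series: 18 training months, 6 future months.
train = pd.DataFrame({
    "unique_id": "a",
    "ds": pd.date_range("2022-01-01", "2023-06-01", freq="MS"),
    "y": range(18),
})
future = pd.DataFrame({
    "unique_id": "a",
    "ds": pd.date_range("2023-07-01", "2023-12-01", freq="MS"),
})
tr, fut = monthly_holiday_features(train, future, fake_us_holidays)
print(fut)
```

Both frames receive the same holiday columns in the same order, because they are merged from one table. The future frame never contains `y`, since it is not added here.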
alxm_fcst4 = nixtla_client.forecast(
    df=tr_alxmsf,
    id_col='unique_id',
    time_col='ds',
    target_col='y',
    h=12,
    freq='MS',
    level=[90, 95],
    X_df=tst_alxmsf,
)
alxm_fcst4_pred = alxm_fcst4['TimeGPT']
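Before calling `forecast`, it can help to check the two frames against the conditions the errors in this thread point at. Every non-key column of `X_df` appears to be treated as exogenous; the first error even lists `y`. Each such column has to be present in `df`, and `X_df` needs `h` future rows per series. `check_exogenous` is a hypothetical helper written for this sketch. It only approximates the client's validation and is not part of the Nixtla API.

```python
import pandas as pd


def check_exogenous(df, X_df, h, id_col="unique_id", time_col="ds", target_col="y"):
    """Hypothetical pre-flight check; returns a list of problems (empty if none)."""
    problems = []
    # Every X_df column other than the id/time keys is treated as exogenous.
    x_cols = [c for c in X_df.columns if c not in (id_col, time_col)]
    if target_col in x_cols:
        problems.append(f"X_df contains the target column {target_col!r}; drop it")
    missing = [c for c in x_cols if c not in df.columns]
    if missing:
        problems.append(f"exogenous columns missing from df: {missing}")
    # The future frame should carry h rows per series (checked as exactly h here).
    rows_per_id = X_df.groupby(id_col).size()
    wrong = rows_per_id[rows_per_id != h]
    if not wrong.empty:
        problems.append(f"series without {h} future rows: {list(wrong.index)}")
    return problems


# Hypothetical one-series example with h = 2.
df = pd.DataFrame({
    "unique_id": "a",
    "ds": pd.date_range("2023-01-01", periods=3, freq="MS"),
    "y": [1.0, 2.0, 3.0],
    "US_Labor Day": [0, 0, 0],
})
X_df = pd.DataFrame({
    "unique_id": "a",
    "ds": pd.date_range("2023-04-01", periods=2, freq="MS"),
    "y": [0.0, 0.0],
    "US_Labor Day": [0, 0],
    "US_Veterans Day (observed)": [0, 0],
})
for problem in check_exogenous(df, X_df, h=2):
    print(problem)
```

With these frames the check reports two problems: the stray `y` in `X_df`, and a holiday column that exists only in the future frame. The second one is the same kind of mismatch as in the thread.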
The first two code blocks run without problems. When I checked the datasets, both had the merged columns where they should be. I was surprised, though, that in the tail of tr_alxmsf the December row did not have Christmas marked as 1, as I expected; that is not the priority at this point. When I ran the forecast code, I received the error above. For df I passed tr_alxmsf, assuming (and this could be my problem) that df is where the evaluation set goes, since X_df is the future dataset (tst_alxmsf). As a note: tr indicates the training set, tst the test set, and sf simply means I adjusted the dataset for statsforecast, which I used earlier in the notebook. If anyone could, after reading all my posts, point me in the right direction to fix what I have done wrong, I'd appreciate it. Thanks in advance for any help offered.
Marco
08/14/2024, 4:10 PM
Tony Dunsworth
08/14/2024, 4:33 PM
Tony Dunsworth
08/14/2024, 4:44 PM
us_holidays = CountryHolidays(countries=['US'])
dates = pd.date_range(start=tr_alxmsf.iloc[0]['ds'], end=tr_alxmsf.iloc[-1]['ds'], freq='D')
holidays_df = us_holidays(dates)
monthly_holidays = holidays_df.resample('MS').max()
Tony Dunsworth
08/14/2024, 4:48 PM
us_holidays = CountryHolidays(countries=['US'])
dates = pd.date_range(start=tst_alxmsf.iloc[0]['ds'], end=tst_alxmsf.iloc[-1]['ds'], freq='D')
holidays_df = us_holidays(dates)
monthly_holidays = holidays_df.resample('MS').max()
tst_alxmsf2 = tst_alxmsf.drop(columns=['y'])  # future frame without the target column
The lists for the two sets are below:
Training:
['unique_id',
'ds',
'y',
"US_New Year's Day",
'US_Memorial Day',
'US_Independence Day',
'US_Labor Day',
'US_Veterans Day',
'US_Thanksgiving',
'US_Christmas Day',
'US_Martin Luther King Jr. Day',
"US_Washington's Birthday",
'US_Columbus Day',
'US_Independence Day (observed)',
"US_New Year's Day (observed)",
'US_Juneteenth National Independence Day',
'US_Juneteenth National Independence Day (observed)',
'US_Christmas Day (observed)']
Future set:
['unique_id',
'ds',
"US_New Year's Day",
"US_New Year's Day (observed)",
'US_Memorial Day',
'US_Juneteenth National Independence Day',
'US_Independence Day',
'US_Labor Day',
'US_Veterans Day',
'US_Veterans Day (observed)',
'US_Thanksgiving',
'US_Christmas Day',
'US_Martin Luther King Jr. Day',
"US_Washington's Birthday",
'US_Columbus Day']
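Comparing the two lists directly shows which columns differ. The column names below are copied from the lists above, and only plain Python set operations are used.

```python
# Column lists copied from the thread.
training_cols = [
    "unique_id", "ds", "y",
    "US_New Year's Day", "US_Memorial Day", "US_Independence Day",
    "US_Labor Day", "US_Veterans Day", "US_Thanksgiving", "US_Christmas Day",
    "US_Martin Luther King Jr. Day", "US_Washington's Birthday", "US_Columbus Day",
    "US_Independence Day (observed)", "US_New Year's Day (observed)",
    "US_Juneteenth National Independence Day",
    "US_Juneteenth National Independence Day (observed)",
    "US_Christmas Day (observed)",
]
future_cols = [
    "unique_id", "ds",
    "US_New Year's Day", "US_New Year's Day (observed)", "US_Memorial Day",
    "US_Juneteenth National Independence Day", "US_Independence Day",
    "US_Labor Day", "US_Veterans Day", "US_Veterans Day (observed)",
    "US_Thanksgiving", "US_Christmas Day", "US_Martin Luther King Jr. Day",
    "US_Washington's Birthday", "US_Columbus Day",
]

# Columns present on one side only.
only_future = sorted(set(future_cols) - set(training_cols))
only_training = sorted(set(training_cols) - set(future_cols))
print(only_future)
print(only_training)
```

`US_Veterans Day (observed)` is the only column that appears in the future set but not in the training set. That makes it the likely trigger for the `df` error, which asks for every exogenous column of `X_df` to exist in `df` as well. The training-only `(observed)` columns, by contrast, have no future values in `X_df`.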
So it looks like there are inconsistencies between the two column lists, and I might have to add the missing columns by hand?
Marco
08/14/2024, 5:45 PM
Tony Dunsworth
08/14/2024, 5:46 PM
Tony Dunsworth
08/14/2024, 5:52 PM
Marco
08/14/2024, 5:55 PM
Tony Dunsworth
08/14/2024, 6:04 PM
Marco
08/14/2024, 6:43 PM
Tony Dunsworth
08/14/2024, 6:46 PM
---------------------------------------------------------------------------
Exception Traceback (most recent call last)
Cell In[173], line 1
----> 1 alxm_fcst4 = nixtla_client.forecast(
2 tr_alxmsf,
3 id_col = 'unique_id',
4 time_col = 'ds',
5 target_col='y',
6 h = 12,
7 freq = 'MS',
8 level = [90,95],
9 X_df = tst_alxmsf2,
10 )
12 alxm_fcst4_pred = alxm_fcst4['TimeGPT']
14 alxm_fcst4.tail()
File ~\anaconda3\envs\nixtla2\Lib\site-packages\nixtla\nixtla_client.py:60, in deprecated_argument.<locals>.decorator.<locals>.wrapper(*args, **kwargs)
58 raise TypeError(f"{new_name} argument duplicated")
59 kwargs[new_name] = kwargs.pop(old_name)
---> 60 return func(*args, **kwargs)
File ~\anaconda3\envs\nixtla2\Lib\site-packages\nixtla\nixtla_client.py:60, in deprecated_argument.<locals>.decorator.<locals>.wrapper(*args, **kwargs)
58 raise TypeError(f"{new_name} argument duplicated")
59 kwargs[new_name] = kwargs.pop(old_name)
---> 60 return func(*args, **kwargs)
...
483 f"You have to pass the {self.h} future values of your "
484 "exogenous variables for each time series"
485 )
Exception: You must include the exogenous variables in the `df` object, exogenous variables US_New Year's Day,US_New Year's Day (observed),US_Memorial Day,US_Juneteenth National Independence Day,US_Independence Day,US_Labor Day,US_Veterans Day,US_Veterans Day (observed),US_Thanksgiving,US_Christmas Day,US_Martin Luther King Jr. Day,US_Washington's Birthday,US_Columbus Day
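If rebuilding the holiday features is not convenient, one workaround is to give both frames the union of the holiday columns and fill the gaps with 0. This rests on two assumptions. First, a missing column means that holiday never fell inside that frame's date range, so 0 is the correct flag. Second, the flags were computed over complete months. `align_exogenous` is a hypothetical helper for illustration; it also drops `y` from the future frame.

```python
import pandas as pd


def align_exogenous(df, X_df, id_col="unique_id", time_col="ds", target_col="y"):
    """Hypothetical helper: give df and X_df identical exogenous columns.

    A holiday column absent from one frame is added there filled with 0,
    on the assumption that the holiday never fell inside that frame's range.
    """
    keys = [id_col, time_col]
    exog = sorted((set(df.columns) | set(X_df.columns)) - set(keys) - {target_col})
    df = df.reindex(columns=keys + [target_col] + exog, fill_value=0)
    # The future frame carries no target column.
    X_df = X_df.reindex(columns=keys + exog, fill_value=0)
    return df, X_df


# Hypothetical one-row frames using column names from this thread.
tr = pd.DataFrame({
    "unique_id": ["a"],
    "ds": [pd.Timestamp("2023-11-01")],
    "y": [5.0],
    "US_Veterans Day": [1],
    "US_Christmas Day (observed)": [0],
})
fut = pd.DataFrame({
    "unique_id": ["a"],
    "ds": [pd.Timestamp("2023-12-01")],
    "y": [0.0],
    "US_Veterans Day": [0],
    "US_Veterans Day (observed)": [0],
})
tr2, fut2 = align_exogenous(tr, fut)
print(list(tr2.columns))
print(list(fut2.columns))
```

Sorting the exogenous names keeps the column order identical on both sides. That is not necessarily required by the client, but it makes the frames easy to compare.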
Marco
08/14/2024, 7:14 PM
Tony Dunsworth
08/14/2024, 7:15 PM
Tony Dunsworth
08/14/2024, 7:31 PM