# general
j
original data is monthly volumes data (counts), with numeric unique_id:

ds          y
2021-10-01  49941
2021-11-01  49731
2021-12-01  61980
f
Hey @J T! Regarding the dates generated, you could use freq='MS', since your original ds encodes monthly frequency at the starting date. Regarding the forecast, would it be possible for you to share your data? Even a single time series (unique_id==0) would be helpful for us to see if there's a bug somewhere.
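For reference, a minimal sketch of what the 'MS' alias means in pandas (this is standard pandas behavior, independent of statsforecast):

```python
import pandas as pd

# 'MS' is pandas' month-start offset alias: generated timestamps fall on
# the first day of each month, matching a ds column like 2021-10-01.
idx = pd.date_range('2021-10-01', periods=3, freq='MS')
print(list(idx.strftime('%Y-%m-%d')))  # ['2021-10-01', '2021-11-01', '2021-12-01']
```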
j
cool... thank you very much Fede for your quick response. will try it out tonight and share the results back with you 🙂
and i wonder if i can get documentation about those parameters... tried to find it yesterday and could not find it.
f
We have the official docs of the StatsForecast class here: https://nixtla.github.io/statsforecast/core.html#statsforecast. In the freq parameter, there is a link to the available frequencies (from pandas: https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#offset-aliases).
We are trying to improve our documentation, and we’d love to receive your feedback on how we can improve it 🙂
👍 1
j
thanks fede... i updated my code based on the documentation and the latest installation packages. the results are within 0.3% of pmdarima. thanks
after it runs, i try to save it as csv but it errors out. the directory does exist, as that's where the import csv comes from. thoughts? help?

code: Y_hat_df.to_csv('dbfs:/FileStore/tables/bronze/Core/NIXTLA.csv')

error: OSError: Cannot save file into a non-existent directory: 'dbfs:/FileStore/tables/bronze/Core'
will provide more feedback later on when it's completed. thank you very much
f
hey @J T! glad it worked out
Perhaps the error is related to the path: the path should have the format /dbfs/FileStore/.... So maybe changing 'dbfs:/FileStore/tables/bronze/Core/NIXTLA.csv' to '/dbfs/FileStore/tables/bronze/Core/NIXTLA.csv' could work
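A sketch of why the path format matters, assuming the standard Databricks setup where DBFS is mounted locally at /dbfs: 'dbfs:/...' is a Spark-style URI, not a local filesystem path, so pandas' to_csv cannot write to it; the local-mount equivalent just swaps the prefix:

```python
# 'dbfs:/...' is a Spark-style URI; pandas needs a local filesystem path.
# On Databricks, DBFS is mounted under /dbfs, so the local equivalent
# replaces the 'dbfs:/' prefix with '/dbfs/'.
spark_uri = 'dbfs:/FileStore/tables/bronze/Core/NIXTLA.csv'
local_path = '/dbfs/' + spark_uri[len('dbfs:/'):]
print(local_path)  # /dbfs/FileStore/tables/bronze/Core/NIXTLA.csv
# Inside Databricks: Y_hat_df.to_csv(local_path)
```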
j
thanks! got a solution: converting it to a spark df and then saving it to dbfs. wrote down notes on the specific solution to share later on
questions... i ran a few examples and they were successful... but when i ran another batch, it errored out: ZeroDivisionError: division by zero. I assume for certain unique_id values, y is so close to 0 (or something else) that it fails mathematically. How can I track down which ID was running, and how can I use something like "try... except... else" to move on to the next ID/forecast and document which ones fail?
f
Hey @J T! For the moment, you can't track the specific unique_id that is failing. A solution that usually works is to add a constant to the y column to prevent zero values. After that, just remove the constant from the forecast.
Y_df['y'] += constant

Y_hat_df['AutoARIMA'] -= constant
j
thanks Fede... forgot to mention that the error happens during the forecast step: Y_hat_df = model.forecast(horizon) → ZeroDivisionError: division by zero. since i am running, say, 500 unique IDs... how would i know which id errors out, how would i skip it and move on to the next id, and how could i understand why in the end?
by the way, happy Friday and thanks!!!
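One generic way to answer the "which id fails, and how do i skip it" question is to forecast one series at a time inside try/except (slower than a single batched call, but it isolates failures). A sketch, where forecast_one is a hypothetical stand-in for whatever model call raises:

```python
import pandas as pd

# Hypothetical stand-in for the per-series forecast call that raises.
def forecast_one(grp):
    if (grp['y'] == 0).all():  # toy failure condition, for illustration only
        raise ZeroDivisionError('division by zero')
    return grp['y'].mean()

df = pd.DataFrame({'unique_id': [0, 0, 1, 1],
                   'y': [10, 12, 0, 0]})

results, failed = {}, []
for uid, grp in df.groupby('unique_id'):
    try:
        results[uid] = forecast_one(grp)
    except ZeroDivisionError as err:
        failed.append((uid, str(err)))  # record the failing id and keep going

print(results)  # {0: 11.0}
print(failed)   # [(1, 'division by zero')]
```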
ok... i am changing it to this and running it currently. i feel adding += 1 is small enough to not impact anything. then i will remove it afterwards.
#adding to remove error
constant = 1
products2['y'] += constant

#Select SARIMA with seasonality 12
autoARIMA = AutoARIMA(season_length=season_length)

# Select ETS with seasonality 12 and multiplicative trend
#autoETS = ETS(season_length=12, model='ZMZ')

model = StatsForecast(df=products2.set_index('unique_id'),
                      models=[autoARIMA],  ##, autoETS
                      freq='MS', n_jobs=-1)

Y_hat_df = model.forecast(horizon).reset_index()
Y_hat_df['AutoARIMA'] -= constant
update: still not working, with the same error msg: division by zero. here is the updated code:
#adding to remove error
constant = 1
products2['y'] = products2['y0'] + constant

init1 = time.time()

#Select SARIMA with seasonality 12
autoARIMA = AutoARIMA(season_length=season_length)

# Select ETS with seasonality 12 and multiplicative trend

model = StatsForecast(df=products2.set_index('unique_id'), 
                      models=[autoARIMA],
                      freq='MS', n_jobs=-1)
init2 = time.time()
Y_hat_df = model.forecast(horizon).reset_index()
end = time.time()

time_model = init2 - init1
time_fcst = end - init2
time_tot = end - init1

#taking constant out
Y_hat_df['AutoARIMA0'] = Y_hat_df['AutoARIMA'] - constant

in the end, the line Y_hat_df = model.forecast(horizon).reset_index() was highlighted and the error alert was a 'division by zero' error
Follow-up: after i removed all the timers (time.time) and the constant +/-, it ran through with no error for the same sets of unique ids... very strange 😞
will rerun it later on and see if things happen again.
update: the division-by-zero error does not show up any more. i wonder if it's related to the pandas version. for some reason, AZ Databricks shows pandas at version 1.2.4, so i force-updated it to 1.5.0. with that, all errors are gone.
m
@J T Can you provide an example of working code? I’m getting the same error and haven’t been able to solve for it.
j
@mike here is what i can repeat and that has proven to work on my side for now:
1. %pip install --upgrade pandas  ### to update az databricks pandas to 1.5.0, as it was currently 1.2.4
2. detach and re-attach the notebook in DB
3. import numpy as np; import pandas as pd
the rest would be similar to the documentation from Nixtla. hope it works for you. i have tried many ways and this is the one that keeps working for me so far.
👀 1
new update: ran it without updating pandas to 1.5.0, and it works. i probably need to run it with a different dataset to see how it plays out
f
hey @J T! I was out-of-office last week. Is everything running correctly? Let me know if you need any help
m
Hey @fede! Just wanted to start off by saying thank you for all your work on the package — huge fan! Yes, I am still having the issue. I did my best to document every aspect of the run. Attached is a sample of 100K time series in which I get a divide-by-zero error somewhere. File, code, screenshot of the error, and pip list are all attached. Really hoping it's something simple that I am missing here.
@fede (nixtla) (they/them) @Max (Nixtla) - any thoughts? Should I attempt to use the fallback_model as a workaround?
m
Hi @mike, sorry for not answering. I just remembered about this thread. I’m going to take a look at your data tomorrow and provide a more complete answer. Meanwhile, you can try setting the fallback_model to naive as a workaround.
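Conceptually, a fallback model works like this. The snippet below is a plain-Python sketch of the idea (not the statsforecast internals; primary_model and its failure condition are made up for illustration), and the same pattern also shows how one could record which series fell back:

```python
# Sketch of the fallback idea: if the primary model raises on a series,
# forecast it with a naive last-value model instead, and record the id.
def naive_forecast(y, h):
    return [y[-1]] * h  # naive model: repeat the last observed value

def primary_model(y, h):
    if y[-1] == 0:  # toy failure condition, for illustration only
        raise ZeroDivisionError('division by zero')
    return [sum(y) / len(y)] * h

series = {0: [10, 12, 14], 1: [3, 1, 0]}
forecasts, fell_back = {}, []
for uid, y in series.items():
    try:
        forecasts[uid] = primary_model(y, h=2)
    except ZeroDivisionError:
        forecasts[uid] = naive_forecast(y, h=2)
        fell_back.append(uid)  # the ids that used the fallback

print(forecasts)  # {0: [12.0, 12.0], 1: [0, 0]}
print(fell_back)  # [1]
```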
m
No worries, appreciate you taking a look! I’ll also take a look at the fallback_model tomorrow as well.
@Max (Nixtla) - the
fallback_model
worked for this example. Will be using that moving forward. Is there a way to identify which
unique_id
used the fallback model?
m
Not yet! But that would be a great feature. Could you open a GitHub issue?
By the way, this is the correct account for @fede (nixtla) (they/them). Let me tag him here.
m
Perfect, I have created an issue: https://github.com/Nixtla/statsforecast/issues/290
👍 1
m
Thanks 🙂