# general
j
original data is monthly volumes data (counts), with numeric unique_id:

ds          y
2021-10-01  49941
2021-11-01  49731
2021-12-01  61980
f
Hey @J T! Regarding the dates generated, you could use freq='MS', since your original ds encodes monthly frequency at the starting date. Regarding the forecast, would it be possible for you to share your data? Even a single time series (unique_id==0) would be helpful for us to see if there's a bug somewhere.
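For reference, a minimal sketch of what the 'MS' alias means in pandas (this is standard pandas behavior, independent of statsforecast):

```python
import pandas as pd

# 'MS' is pandas' month-start offset alias: generated timestamps fall on
# the first day of each month, matching a ds column like 2021-10-01.
idx = pd.date_range('2021-10-01', periods=3, freq='MS')
print(list(idx.strftime('%Y-%m-%d')))  # ['2021-10-01', '2021-11-01', '2021-12-01']
```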
j
cool... thank you very much Fede for your quick response. will try it out tonight and share the results back with you 🙂
and i wonder if i can get documentation about those parameters... tried to find it yesterday and could not find it.
f
We have the official docs of the StatsForecast class here: https://nixtla.github.io/statsforecast/core.html#statsforecast. In the freq parameter, there is a link to the available frequencies (from pandas: https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#offset-aliases).
We are trying to improve our documentation, and we’d love to receive your feedback on how we can improve it 🙂
👍 1
j
thanks fede... i updated my code based on the documentation and the latest installation packages. the results are within 0.3% of pmdarima. thanks
after it runs, i try to save it as csv but it errors out. the directory does exist, as that's where the import csv comes from. thoughts? help?

code: Y_hat_df.to_csv('dbfs:/FileStore/tables/bronze/Core/NIXTLA.csv')

error: OSError: Cannot save file into a non-existent directory: 'dbfs:/FileStore/tables/bronze/Core'
will provide more feedback later on when it's completed. thank you very much
f
hey @J T! glad it worked out
Perhaps the error is related to the path: the path should have the format /dbfs/FileStore/.... So maybe changing 'dbfs:/FileStore/tables/bronze/Core/NIXTLA.csv' to '/dbfs/FileStore/tables/bronze/Core/NIXTLA.csv' could work
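A sketch of why the path format matters, assuming the standard Databricks setup where DBFS is mounted locally at /dbfs: 'dbfs:/...' is a Spark-style URI, not a local filesystem path, so pandas' to_csv cannot write to it; the local-mount equivalent just swaps the prefix:

```python
# 'dbfs:/...' is a Spark-style URI; pandas needs a local filesystem path.
# On Databricks, DBFS is mounted under /dbfs, so the local equivalent
# replaces the 'dbfs:/' prefix with '/dbfs/'.
spark_uri = 'dbfs:/FileStore/tables/bronze/Core/NIXTLA.csv'
local_path = '/dbfs/' + spark_uri[len('dbfs:/'):]
print(local_path)  # /dbfs/FileStore/tables/bronze/Core/NIXTLA.csv
# Inside Databricks: Y_hat_df.to_csv(local_path)
```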
j
thanks! got a solution: converting it to a spark df and then saving it to dbfs. wrote down notes on the specific solution to share later on
questions... i ran a few examples and they were successful... but when i ran another batch, it errored out: ZeroDivisionError: division by zero. I assume for certain unique_id values, y is so close to 0 (or something else) that it fails mathematically. How can I track down which ID was running, and how can I use something like "try... except... else" to move on to the next ID/forecast and document which ones fail?
f
Hey @J T! For the moment, you can't track the specific unique_id that is failing. A solution that usually works is to add a constant to the y column to prevent zero values. After that, just remove the constant from the forecast.
Y_df['y'] += constant

Y_hat_df['AutoARIMA'] -= constant
j
thanks Fede... forgot to mention that the error happens during the forecast step: Y_hat_df = model.forecast(horizon) → ZeroDivisionError: division by zero. since i am running, say, 500 unique IDs... how would i know which id errors out, how would i skip it and move on to the next id, and how could i understand why in the end?
by the way, happy Friday and thanks!!!
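One generic way to answer the "which id fails, and how do i skip it" question is to forecast one series at a time inside try/except (slower than a single batched call, but it isolates failures). A sketch, where forecast_one is a hypothetical stand-in for whatever model call raises:

```python
import pandas as pd

# Hypothetical stand-in for the per-series forecast call that raises.
def forecast_one(grp):
    if (grp['y'] == 0).all():  # toy failure condition, for illustration only
        raise ZeroDivisionError('division by zero')
    return grp['y'].mean()

df = pd.DataFrame({'unique_id': [0, 0, 1, 1],
                   'y': [10, 12, 0, 0]})

results, failed = {}, []
for uid, grp in df.groupby('unique_id'):
    try:
        results[uid] = forecast_one(grp)
    except ZeroDivisionError as err:
        failed.append((uid, str(err)))  # record the failing id and keep going

print(results)  # {0: 11.0}
print(failed)   # [(1, 'division by zero')]
```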
ok... i am changing it to this and running it currently. i feel adding += 1 is small enough to not impact anything. then i will remove it afterwards.
#adding to remove error
constant = 1
products2['y'] += constant

#Select SARIMA with seasonality 12
autoARIMA = AutoARIMA(season_length=season_length)

# Select ETS with seasonality 12 and multiplicative trend
#autoETS = ETS(season_length=12, model='ZMZ')

model = StatsForecast(df=products2.set_index('unique_id'),
                      models=[autoARIMA],  ##, autoETS
                      freq='MS', n_jobs=-1)

Y_hat_df = model.forecast(horizon).reset_index()
Y_hat_df['AutoARIMA'] -= constant
update: still not working, with the same error msg: division by zero. here is the updated code:
#adding to remove error
constant = 1
products2['y'] = products2['y0'] + constant

init1 = time.time()

#Select SARIMA with seasonality 12
autoARIMA = AutoARIMA(season_length=season_length)

# Select ETS with seasonality 12 and multiplicative trend

model = StatsForecast(df=products2.set_index('unique_id'), 
                      models=[autoARIMA],
                      freq='MS', n_jobs=-1)
init2 = time.time()
Y_hat_df = model.forecast(horizon).reset_index()
end = time.time()

time_model = init2 - init1
time_fcst = end - init2
time_tot = end - init1

#taking constant out
Y_hat_df['AutoARIMA0'] = Y_hat_df['AutoARIMA'] - constant

in the end, the line Y_hat_df = model.forecast(horizon).reset_index() was highlighted and the error alert was a 'division by zero' error
Follow-up: after i removed all the timers (time.time) and the constant +/-, it ran through with no error for the same sets of unique ids... very strange 😞
will rerun it later on and see if things happen again.
update: the division-by-zero error does not show up any more. i wonder if it's related to the pandas version. for some reason, AZ Databricks shows pandas at version 1.2.4, so i force-updated it to 1.5.0. with that, all errors are gone.
m
@J T Can you provide an example of working code? I’m getting the same error and haven’t been able to solve for it.
j
@mike here is what i can repeat and that has proven to work on my side for now:
1. %pip install --upgrade pandas  ### to update az databricks pandas to 1.5.0, as it was currently 1.2.4
2. detach and re-attach the notebook in DB
3. import numpy as np; import pandas as pd
the rest would be similar to the documentation from Nixtla. hope it works for you. i have tried many ways and this is the one that keeps working for me so far.
👀 1
new update: ran it without updating pandas to 1.5.0, and it works. i probably need to run it with a different dataset to see how it plays out
f
hey @J T! I was out-of-office last week. Is everything running correctly? Let me know if you need any help
m
Hey @fede! Just wanted to start off by saying thank you for all your work on the package — huge fan! Yes, I am still having the issue. I did my best to document every aspect of the run. Attached is a sample of 100K time series in which I get a divide-by-zero error somewhere. File, code, screenshot of the error, and pip list are all attached. Really hoping it's something simple that I am missing here.
@fede (nixtla) (they/them) @Max (Nixtla) - any thoughts? Should I attempt to use the fallback_model as a workaround?
m
Hi @mike, sorry for not answering. I just remembered about this thread. I’m going to take a look at your data tomorrow and provide a more complete answer. Meanwhile, you can try setting the fallback_model to naive as a workaround.
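Conceptually, a fallback model works like this. The snippet below is a plain-Python sketch of the idea (not the statsforecast internals; primary_model and its failure condition are made up for illustration), and the same pattern also shows how one could record which series fell back:

```python
# Sketch of the fallback idea: if the primary model raises on a series,
# forecast it with a naive last-value model instead, and record the id.
def naive_forecast(y, h):
    return [y[-1]] * h  # naive model: repeat the last observed value

def primary_model(y, h):
    if y[-1] == 0:  # toy failure condition, for illustration only
        raise ZeroDivisionError('division by zero')
    return [sum(y) / len(y)] * h

series = {0: [10, 12, 14], 1: [3, 1, 0]}
forecasts, fell_back = {}, []
for uid, y in series.items():
    try:
        forecasts[uid] = primary_model(y, h=2)
    except ZeroDivisionError:
        forecasts[uid] = naive_forecast(y, h=2)
        fell_back.append(uid)  # the ids that used the fallback

print(forecasts)  # {0: [12.0, 12.0], 1: [0, 0]}
print(fell_back)  # [1]
```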
m
No worries, appreciate you taking a look! I’ll also take a look at the fallback_model tomorrow as well.
@Max (Nixtla) - the
fallback_model
worked for this example. Will be using that moving forward. Is there a way to identify which
unique_id
used the fallback model?
m
Not yet! But that would be a great feature. Could you open a GitHub issue?
By the way, this is the correct account for @fede (nixtla) (they/them). Let me tag him here.
m
Perfect, I have created an issue: https://github.com/Nixtla/statsforecast/issues/290
👍 1
m
Thanks 🙂