# statsforecast
m
Hello everyone! I'm trying to forecast multiple time series of product-store sales combinations. I would appreciate your help with the following:
1. I have a df with 4.5 million rows, and running AutoARIMA takes too long. Could it be my PC specs? Do you have any guidance on how long it should take with this volume of data or more? (There are about 15k independent time series.)
2. I also tried running the same script on an M1 Max, but I hit a compilation issue. I did some research and came across a related GitHub issue. Has anyone experienced something similar?
3. With some datasets I'm getting a ZeroDivisionError. Could it be that my data isn't "clean" enough?
4. Last one: should I align all my time series to the same min/max date range of my dataframe? That is, should I fill the date gaps with zeroes for every unique_id in my df (using the global min/max of my df)?
Thank you in advance for your time. This would be super helpful for me.
b
As for point 2: We’ve tried running the StatsForecast models on both M1 and Intel-based MacBook Pros, and the result is always the same: the notebook kernel freezes and keeps running for several hours (likely compiling?) but never finishes. When I try it on one of our remote Linux servers, it runs quite smoothly. Any ideas?
k
Maybe you are trying too many combinations? Are you using the default parameters? To avoid the division by zero, you can add 0.001 to the time series. And yes, you should fill the gaps, since statsforecast drops missing rows.
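A minimal sketch of the gap filling discussed here, assuming a long-format dataframe with `unique_id`, `ds` and `y` columns and daily data. The helper name `fill_date_gaps` and the sample values are made up for illustration. Whether zero is the right fill value depends on the data, since a gap can mean "no sales" or "no record".

```python
import pandas as pd


def fill_date_gaps(df: pd.DataFrame, freq: str = "D", fill_value: float = 0.0) -> pd.DataFrame:
    """Reindex every series onto the global min..max date range, filling gaps."""
    # One shared calendar built from the global min and max of the `ds` column.
    full_range = pd.date_range(df["ds"].min(), df["ds"].max(), freq=freq)
    # Every (series id, date) pair that should exist after filling.
    full_index = pd.MultiIndex.from_product(
        [df["unique_id"].unique(), full_range], names=["unique_id", "ds"]
    )
    return (
        df.set_index(["unique_id", "ds"])
        .reindex(full_index, fill_value=fill_value)
        .reset_index()
    )


# Toy data: series A is missing 2023-01-02, series B only has 2023-01-02.
sales = pd.DataFrame(
    {
        "unique_id": ["A", "A", "B"],
        "ds": pd.to_datetime(["2023-01-01", "2023-01-03", "2023-01-02"]),
        "y": [5.0, 7.0, 2.0],
    }
)
filled = fill_date_gaps(sales)
print(filled)
```

After this, every `unique_id` has one row per day between the global first and last date.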
What are the specs of your machine?
@Bradley de Leeuw, I don’t know for sure, but maybe you can try turning off parallelism and see if it helps? There is an `n_jobs` parameter; if you have it at -1, try setting it to 1 and see if it helps stability. If it does, I might have other ideas.
Can you show me the traceback of the compilation issue?
m
On the MacBook I have 32 GB of RAM. I tried tuning the parameters without luck. What is the approach for working with this many combinations?
k
32 GB is a lot. That should be fine. You’re not using the default parameters, right?
m
This is the last message. In this case I'm running the example from the documentation, which should run in seconds on the MacBook, but it never finishes. I'm running the other model (the one with more combinations) on my other PC, which has 12 GB of RAM.
k
Are you familiar with numba? I think if it’s like this, we might just wanna disable it
Based on this issue, there isn’t much we can do here. The best idea I have is to try an environment variable:
NUMBA_DISABLE_JIT=1
Or before you import `statsforecast`:
import os
os.environ['NUMBA_DISABLE_JIT'] = '1'
Not really sure if it will work, but worth a shot
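A small sketch of why the ordering matters, under the assumption that numba reads `NUMBA_DISABLE_JIT` when it is first imported, so setting it later in the same process may do nothing. The helper `disable_numba_jit` is hypothetical, not part of numba or statsforecast. Note that with the JIT disabled the code runs as plain Python, which is expected to be much slower.

```python
import os
import sys
import warnings


def disable_numba_jit() -> bool:
    """Set NUMBA_DISABLE_JIT=1; return True if numba had not been imported yet."""
    already_imported = "numba" in sys.modules
    if already_imported:
        # Assumption: numba picks up its configuration at import time, so the
        # variable set below may be ignored by the current process.
        warnings.warn("numba is already imported; restart the kernel and call this first")
    os.environ["NUMBA_DISABLE_JIT"] = "1"
    return not already_imported


set_in_time = disable_numba_jit()
print(os.environ["NUMBA_DISABLE_JIT"], set_in_time)
# import statsforecast  # import only after the variable has been set
```

In a notebook, this would go in the very first cell after a kernel restart.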
m
I'll give it a shot, thanks Kevin!
It's difficult to know if it's running, but the message changed. Is there any way to log the iterations or something like that?
k
Maybe `verbose=True` in `StatsForecast`?
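If `verbose=True` does not say enough, a library-independent alternative is to process the series ids in batches and log after each batch. This is only a sketch: `forecast_batch` is a hypothetical placeholder for whatever call does the fitting, and `fake_forecast` stands in for it so the example runs on its own.

```python
import logging
import time
from typing import Callable, Iterator, List, Sequence

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(message)s")
log = logging.getLogger("forecast_progress")


def chunked(ids: Sequence[str], size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of `ids` with at most `size` items each."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def run_in_batches(
    ids: Sequence[str],
    forecast_batch: Callable[[Sequence[str]], List[str]],
    batch_size: int = 500,
) -> List[str]:
    """Call `forecast_batch` on each chunk and log cumulative progress."""
    results: List[str] = []
    started = time.perf_counter()
    for batch in chunked(ids, batch_size):
        results.extend(forecast_batch(batch))
        elapsed = time.perf_counter() - started
        log.info("%d/%d series done in %.1fs", len(results), len(ids), elapsed)
    return results


def fake_forecast(batch: Sequence[str]) -> List[str]:
    # Placeholder for the real model call (hypothetical, returns one item per id).
    return [f"forecast for {uid}" for uid in batch]


all_ids = [f"store_{i}" for i in range(1200)]
forecasts = run_in_batches(all_ids, fake_forecast, batch_size=500)
```

The elapsed time after the first batch also gives a rough basis for estimating how long the full set of series would take.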
m
Let me see, I'm trying it right now, thanks