#general

Max (Nixtla)

07/20/2023, 8:19 PM
Sorry, wrong link 🙂

Farzad E

07/20/2023, 8:31 PM
Good for them. I always support healthy competition. But I advise against relying on Polars. It is not a bad tool per se, but building entire libraries on top of a tool that has only one developer is very risky. Every year we see some tool making news by claiming to be a quick replacement for Pandas that works X times better. Yet none of these tools has stood the test of time over the last decade. Wes McKinney, the creator of Pandas, is also the creator of Arrow, which many projects use under the hood these days. Pandas 2 now supports Arrow as a backend alongside NumPy, and we will see great performance improvements coming to Pandas.
👍 1

Max (Nixtla)

07/20/2023, 8:34 PM
Healthy competition is the best thing for the community! We agree with you, and that is why we prefer the 'anydataframe' paradigm, where people can choose what to use: Pandas, Spark, Ray, Dask, Polars, etc.
👍 3
❤️ 1

Tyler Blume

07/20/2023, 11:35 PM
Have y’all benchmarked their AutoML functionality on M4 for accuracy?

Max (Nixtla)

07/20/2023, 11:53 PM
No, @Tyler Blume. But it could be interesting. cc @José Morales

Valeriy

07/21/2023, 12:57 PM
I think the M5 dataset is tired. See my proposal for a Grand Time Series Olympic Games using proper datasets like Monash and The Complete Journey @Max (Nixtla) https://www.linkedin.com/posts/activity-7088140888643653632-wKj5?utm_source=share&utm_medium=member_desktop
👍 1

Tyler Blume

07/21/2023, 1:21 PM
@Valeriy I agree. I also think that, as it stands, the time series world is more of a 'horses for courses' field, so we should probably emphasize the different major 'buckets' of datasets. For example, I usually see DL approaches do super well if you have long, clean time series without too much intermittency. Boosted trees win out more often when we have useful exogenous features and more of a mixture of time series types (short/long, intermittent, etc.). Stat models might be preferred when you have a mixture but no useful exogenous features. If the major benchmarks over-index on one type of dataset, you might have people trying a DL technique on small, intermittent data and getting insane results.
👍 1
@Max (Nixtla) I went ahead and tested the functime auto-lightgbm procedure on some of the M4 datasets (weekly and yearly) and put it on GitHub. I had some issues overall and hit some errors, so it might not be the best test for it. Either way, functime is only optimizing the number of lags and the tree parameters, so it will always be pretty rough until they also optimize for things like scaling and differencing. Definitely not an 'AutoML' procedure for time series forecasting.
👍 2
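To illustrate the point about scaling and differencing, here is a minimal sketch of the preprocessing steps that a lag-based tree model could also search over. It is not functime's code or API. The helper names (`difference`, `standardize`, `make_lag_matrix`) and the toy series are hypothetical, and a real tuning loop would treat "difference or not" and "scale or not" as extra hyperparameters next to the number of lags.

```python
from statistics import mean, pstdev


def difference(series, lag=1):
    """Return series[t] - series[t - lag] for every t >= lag."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


def standardize(series):
    """Scale to zero mean and unit variance; also return (mean, std) for inversion."""
    mu = mean(series)
    sigma = pstdev(series) or 1.0  # guard against a constant series
    return [(x - mu) / sigma for x in series], (mu, sigma)


def make_lag_matrix(series, n_lags):
    """Build feature rows of the previous n_lags values and the matching targets."""
    rows, targets = [], []
    for t in range(n_lags, len(series)):
        rows.append(series[t - n_lags:t])
        targets.append(series[t])
    return rows, targets


# Hypothetical trending series: raw lags would force a tree model to extrapolate.
y = [10, 12, 15, 19, 24, 30, 37, 45]
dy = difference(y)                 # remove the trend
z, (mu, sigma) = standardize(dy)   # put the series on a common scale
X, target = make_lag_matrix(z, n_lags=2)
```

Tree models cannot predict values outside the range seen in training, which is one reason differencing a trending series before building lags tends to matter for them.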

Max (Nixtla)

07/25/2023, 9:22 PM
Thanks Tyler! This is an amazing job.
@José Morales will be happy
Let's see if you get one of the famous @Valeriy memes!
😂 2
😆 1

Valeriy

07/26/2023, 9:40 AM
I will make a meme for this. I am sceptical of functime's claims and tactics, having seen some of their latest moves. Unfortunately, instead of doing a full study as I suggested, they decided to go for drip-feed marketing, using selective claims and presenting only the cases where it works best.
💯 1
What do you guys think?
🫠 1

Max (Nixtla)

07/26/2023, 5:14 PM
You are a monster @Valeriy.
🤣

Valeriy

07/26/2023, 5:17 PM
There is a simple solution to the functime accuracy issue: replace all models with a naive forecast 😉
It will be even faster as well
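For readers unfamiliar with the baseline being joked about, here is a minimal sketch of a naive forecast and its seasonal variant, with mean absolute error as the yardstick. The function names and the toy data are hypothetical and not taken from any library mentioned in this thread.

```python
def naive_forecast(history, horizon):
    """Repeat the last observed value for every future step."""
    return [history[-1]] * horizon


def seasonal_naive_forecast(history, horizon, season_length):
    """Repeat the last full season, cycling through it as needed."""
    last_season = history[-season_length:]
    return [last_season[h % season_length] for h in range(horizon)]


def mae(actual, predicted):
    """Mean absolute error between two equally long sequences."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical series with a period of 4, followed by one more season to score on.
history = [3, 5, 4, 6, 3, 5, 4, 6]
actual = [3, 5, 4, 6]

naive_mae = mae(actual, naive_forecast(history, 4))
seasonal_mae = mae(actual, seasonal_naive_forecast(history, 4, season_length=4))
print(naive_mae, seasonal_mae)
```

Baselines like these are the usual reference point in benchmarks: a model that cannot beat them on a given dataset is hard to justify there.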

Tyler Blume

07/26/2023, 5:29 PM
Yeah, I am going to try to stay on @Valeriy's good side. Maybe if I add conformal predictions to all of my code, that will work :)
🤣 1
🔥 1

Valeriy

07/26/2023, 5:47 PM
Well, your lib never made inflated claims, and I appreciate your analysis of functime. Chris reached out a while back asking for advice on baking in conformal prediction (which I provided), but what is the point of having CP in a library with poor performance like functime? People would start to complain that it gives intervals that are too wide, but this would be an artefact of the poor accuracy of the point forecasting model rather than of CP itself.
With a bit of hyndsight, the lack of any mention of accuracy should have been a red flag 🚩, but personally I took for granted that they had benchmarked it against other libraries, including Nixtla. How and why someone would release a library with subpar accuracy raises a huge question 🙋‍♂️, especially as everyone should be aware of the issues with Facebook Prophet. Unbelievable story.
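The remark about interval width can be illustrated with a minimal split-conformal sketch. It is not any library's implementation. The data is simulated, the function name `conformal_half_width` is hypothetical, and the calibration points are assumed to be exchangeable, which real time series generally are not without further adaptation.

```python
import math
import random


def conformal_half_width(cal_actual, cal_pred, alpha=0.1):
    """Split-conformal half-width: the ceil((n + 1) * (1 - alpha))-th smallest
    absolute residual on a held-out calibration set."""
    scores = sorted(abs(a - p) for a, p in zip(cal_actual, cal_pred))
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf  # too few calibration points for this alpha
    return scores[k - 1]


random.seed(0)
# Simulated calibration targets centred on 10 with unit noise.
actual = [10 + random.gauss(0, 1) for _ in range(200)]
good_pred = [10.0] * 200  # accurate point forecast
poor_pred = [13.0] * 200  # biased point forecast

good_width = conformal_half_width(actual, good_pred)
poor_width = conformal_half_width(actual, poor_pred)
print(good_width, poor_width)
```

The interval for a new point is the point forecast plus or minus the half-width. Because the half-width is a quantile of the calibration residuals, a less accurate point model produces wider intervals under the same conformal procedure.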

Farzad E

07/26/2023, 6:05 PM
@Valeriy "How and why someone would release a library with subpar accuracy" Yeah, but did you see they used Polars? That by itself deserves 10 points! We don't need our forecasting tools to be accurate. We need them to use shiny new libraries that influencers are talking about! 🤣 Anything to get people to click on a post!