Tracy Teal
07/10/2024, 7:29 PM
Tracy Teal
07/10/2024, 7:29 PM
Thanks again for providing free credits for our academic project! Our results look interesting: in some cases TimeGPT outperforms the competing state-of-the-art algorithms, especially when the time series is based on many cases, which provides some level of robustness. When the counts underlying the time series are small, the forecasts are not very reliable, whether from TimeGPT or from the other algorithms.
We use a very small dataset for this, with only 52 time points per year, so I think it is a good fit for transfer learning.
We would like to double-check one question with you. As the ILI dataset we use for this study is publicly available, we wanted to make sure that this data has not been used to pre-train the TimeGPT model.
We couldn’t find this information on the webpage, so I am writing to you. The dataset is FluView from the US CDC: National, Regional, and State Level Outpatient Illness and Viral Surveillance (cdc.gov)
There is also a publication using this dataset: https://www.science.org/doi/10.1126/sciadv.abb1237
Can you please tell me whether this particular dataset has been used to pre-train TimeGPT?
Is there an overview of the training data that we just didn’t find?
This would help a lot. If it were included, our results would have to be interpreted differently.
Tracy Teal
07/10/2024, 7:30 PM
at the School of Computer Science at the University of Auckland.
My main project is about forecasting hospitalization rates for respiratory diseases and influenza-like illness cases in general.
Using a foundation model for this purpose is quite interesting.
Furthermore, I supervise a student project in which two students compare TimeGPT's forecasts with those of other comprehensive frameworks such as AutoGluon-TS.
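A minimal sketch of how such a comparison could be scored, assuming weekly data with a 52-week season, a hold-out of the final year, and mean absolute error as the metric. The seasonal-naive baseline, the function names, and the synthetic series are illustrative assumptions, not the project's actual setup or data.

```python
import math

SEASON = 52  # assumption: weekly data, so one season is 52 points


def seasonal_naive(history, horizon, season=SEASON):
    """Forecast each future week with the value seen one season earlier."""
    if len(history) < season:
        raise ValueError("need at least one full season of history")
    last_season = history[-season:]
    return [last_season[i % season] for i in range(horizon)]


def mae(actual, predicted):
    """Mean absolute error between two equally long sequences."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def synthetic_weekly_series(n_years=3, season=SEASON):
    """Synthetic, purely illustrative seasonal counts (not surveillance data)."""
    return [
        100 + 50 * math.sin(2 * math.pi * (t % season) / season)
        for t in range(n_years * season)
    ]


# Hold out the final year and score the baseline on it.
series = synthetic_weekly_series()
train, test = series[:-SEASON], series[-SEASON:]
baseline = seasonal_naive(train, horizon=len(test))
baseline_mae = mae(test, baseline)
```

Forecasts from any model for the same hold-out weeks (for example TimeGPT or AutoGluon-TS) could be passed to `mae` in the same way and compared against `baseline_mae`.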
azul (she/her) (nixtla)
07/10/2024, 8:08 PM
Max
07/10/2024, 8:56 PM
Tracy Teal
07/11/2024, 9:44 PM
Thank you so much, this is super important information!
We could use another dataset very similar to the US-ILI surveillance. This data is also about respiratory diseases and is derived from hospital surveillance in Auckland. It has not been published, so we can be sure that it is not part of the TimeGPT training data.
However, there are concerns about using this data with large models that also require internet access. Our collaborators are afraid that if we use TimeGPT on this data, it will be uploaded to a server overseas, which would not be allowed under the data ethics approval.
When we use the GitHub repository for TimeGPT (https://github.com/Nixtla/nixtla), we do need internet access to verify the token, right?
But does any step upload the data to a server? If the data stays on the local machine, then I think we should be able to use it to test TimeGPT.
Tracy Teal
07/11/2024, 9:47 PM
Max
07/12/2024, 1:16 AM