# neural-forecast
f
Hi guys, first of all thanks for your work on this package, it looks very promising! I have been working with N-BEATS-like models since the paper came out in 2020 and recently wanted to check out the N-HiTS architecture after its release this year, which is how I came across your package in the first place. While I like the idea of the package, I am struggling a bit with understanding the way you designed your dataset classes.

From what I understand so far, the two main classes are `TimeSeriesDataset` and `WindowsDataset`. While the first one stores entire time series, the latter one stores sliding windows created from the time series, correct? The second approach feels natural when working with N-BEATS-like models (as well as other neural forecasting models). However, the implementation you opted for confuses me a bit. Previously, when creating my own generators, I used the following recipe for feeding data to the model during training (sketched in code below):

1. Sample `n` time series from the dataset.
2. For each series, sample a split time point.
3. Create one window for each series from the sampled splits (resulting in `n` windows) and feed them to the model as a mini-batch (`n = batch_size`).

This is also the approach described in Oreshkin et al. 2020 (where it is combined with stratified sampling). The implementation you chose for the `WindowsDataset` (at least as far as I understand it) is different in that all possible windows are generated for a series and returned (resulting in `n != batch_size`). I am wondering:

1. Why did you choose this implementation style, and what are its advantages?
2. Is it possible to implement the Oreshkin et al. 2020 style sampling (especially the stratified version) using your package?

Best, Fabian

Oreshkin et al. 2020: https://arxiv.org/pdf/2009.11961.pdf
c
In addition to what Kin mentioned, you can get a similar sampling style by setting `n_windows` equal to the batch size you want. The `WindowsDataset` will first sample `batch_size` series, and then select `n_windows` windows from all the windows constructed from those `batch_size` series.
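A rough sketch of that two-stage selection, assuming equal-length series stacked into one tensor (a hypothetical helper for illustration, not the package's actual code):

```python
import torch

def two_stage_sample(series, window_size, batch_size, n_windows):
    """series: tensor of shape (n_series, length).
    Stage 1: sample `batch_size` series.
    Stage 2: sample `n_windows` from all windows built from those series."""
    chosen = torch.randint(series.shape[0], (batch_size,))
    # All sliding windows of the chosen series:
    # shape (batch_size, length - window_size + 1, window_size)
    windows = series[chosen].unfold(dimension=1, size=window_size, step=1)
    pool = windows.reshape(-1, window_size)
    picks = torch.randint(pool.shape[0], (n_windows,))
    return pool[picks]
```

Note that with `n_windows = batch_size` you get `batch_size` windows back, but not necessarily one per series, since stage 2 draws from the pooled windows.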
k
Hi @Fabian Müller, thanks for the question. The `WindowsDataset` is much faster than the previous N-BEATS sampling method, because we use the PyTorch unfold method on the GPU to create all the windows at once and then *sample* from them, in the `_create_windows_tensor` method. We are thinking of moving the unfold call into the N-BEATS/N-HiTS PyTorch `nn.Module`, as that would help avoid incompatibilities with PyTorch and PyTorch Lightning, which prefer the Datasets to operate only on CPUs.
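A minimal illustration of why `unfold` makes this fast: all windows of all series come out of one vectorized (optionally GPU) call, and sampling then reduces to tensor indexing. The shapes and sizes here are made up for the example:

```python
import torch

# A batch of 32 series, each of length 200, on the GPU if available.
device = "cuda" if torch.cuda.is_available() else "cpu"
series = torch.randn(32, 200, device=device)

# unfold builds every sliding window of length 48 in one vectorized call:
# shape (32, 153, 48), since 200 - 48 + 1 = 153 windows per series.
windows = series.unfold(dimension=1, size=48, step=1)

# Sampling reduces to indexing into the flattened window dimension.
flat = windows.reshape(-1, 48)                                  # (32 * 153, 48)
sample = flat[torch.randint(len(flat), (256,), device=device)]  # 256 random windows
```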
c
Another advantage of our approach is that it disentangles the number of training windows from the number of series. You might have a dataset with only one time series but want a larger number of windows. If you want the number of sampled series (`batch_size`) to equal the final number of windows, just set `n_windows = batch_size`.
f
Hi, thanks for the quick and helpful response. I see the performance advantage you mentioned, especially when you have multiple windows per series; for one window per series I am not so sure about it. Thanks for the `n_windows = batch_size` tip, I will check it out. I also came across the `eq_batch_size` argument in the `FastTimeSeriesLoader` that I am currently using. But just to be sure: while both methods will result in the number of windows being equal to `batch_size`, it is not guaranteed that there will be exactly one window per series, correct? And what do you think about the stratified sampling mentioned in the paper? From what I understand, it might be especially relevant for your approach: since you sample from all windows, long series will produce more windows and will therefore be overrepresented during training.
k
@Fabian Müller We tried "stratified/hierarchical sampling" in the past; two ideas around it:
- It could be possible to replicate its effects with "weighted sampling" during training, using the current `WindowsDataset` (see the sketch below).
- Moving the PyTorch unfold method into the N-BEATS/N-HiTS model would require using stratified sampling. I would like to try it again.
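To make the first idea concrete, here is a minimal sketch using PyTorch's built-in `WeightedRandomSampler` on a toy window-level dataset (the data and sizes are invented for illustration): weighting each window by the inverse of its series' window count makes every series equally likely to appear in a batch, regardless of its length.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset, WeightedRandomSampler

# Toy setup: 300 windows from a long series and 20 from a short one,
# flattened into a single window-level dataset.
windows = torch.randn(320, 48)
series_id = torch.cat([torch.zeros(300, dtype=torch.long),
                       torch.ones(20, dtype=torch.long)])
dataset = TensorDataset(windows, series_id)

# Weight each window by 1 / (number of windows in its series), so both
# series contribute roughly equally to each batch despite their lengths.
counts = torch.bincount(series_id).float()   # tensor([300., 20.])
weights = 1.0 / counts[series_id]            # one weight per window
sampler = WeightedRandomSampler(weights, num_samples=len(weights), replacement=True)
loader = DataLoader(dataset, batch_size=64, sampler=sampler)
```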