# neural-forecast
Antoine SCHWARTZ -CROIX-:
Hello guys, since the last release I frequently have crashes at the beginning of my trainings:

`RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!`

I feel like it's random; however, in my last 2 HP tuning runs it happened with `scaler_type` set to `identity` (random or root cause, I don't know). Am I the only one? Thanks
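For reference, a minimal sketch of the kind of setup being discussed (the data, horizon and hyperparameters below are placeholders rather than the actual tuning config; the device mismatch only shows up when training on a CUDA machine):

```python
# Hypothetical repro sketch: DeepAR trained with no input scaling (scaler_type="identity").
# All values here are placeholders, not the real dataset or hyperparameters.
import pandas as pd
from neuralforecast import NeuralForecast
from neuralforecast.models import DeepAR
from neuralforecast.losses.pytorch import DistributionLoss

# toy long-format frame with the unique_id / ds / y columns NeuralForecast expects
df = pd.DataFrame({
    "unique_id": ["series_1"] * 100,
    "ds": pd.date_range("2023-01-01", periods=100, freq="D"),
    "y": [float(i % 10) for i in range(100)],
})

model = DeepAR(
    h=7,
    input_size=28,
    loss=DistributionLoss(distribution="Normal", level=[80, 90]),
    scaler_type="identity",  # no input scaling: the setting under which the crash shows up
    max_steps=100,
)

nf = NeuralForecast(models=[model], freq="D")
nf.fit(df=df)
preds = nf.predict()
```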
It just happened again, this time at the `predict` step, still with the `identity` scaler (on DeepAR):
```
File /local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/neuralforecast/models/deepar.py:487, in DeepAR.forward(self, windows_batch)
    484 output = self.loss.domain_map(output)
    486 # Inverse normalization
--> 487 distr_args = self.loss.scale_decouple(
    488     output=output, loc=y_loc, scale=y_scale
    489 )
    490 # Add horizon (1) dimension
    491 distr_args = list(distr_args)

File /local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/neuralforecast/losses/pytorch.py:771, in nbinomial_scale_decouple(output, loc, scale)
    769 alpha = F.softplus(alpha) + 1e-8  # alpha = 1/total_counts
    770 if (loc is not None) and (scale is not None):
--> 771     mu *= loc
    772     alpha /= loc + 1.0
    774 # mu = total_count * (probs/(1-probs))
    775 # => probs = mu / (total_count + mu)
    776 # => probs = mu / [total_count * (1 + mu * (1/total_count))]

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!
```
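The error itself is just PyTorch refusing to mix tensors from different devices in a single operation; a tiny standalone illustration (not library code):

```python
# Minimal illustration of the underlying error: multiplying a CUDA tensor by a
# CPU tensor raises the same RuntimeError (requires a machine with a GPU).
import torch

if torch.cuda.is_available():
    mu = torch.ones(3, device="cuda")  # like the network output, living on the GPU
    loc = torch.ones(3)                # like the anchoring statistic, left on the CPU
    mu * loc                           # RuntimeError: Expected all tensors to be on the same device...
```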
OK, I think that's the same error you fixed yesterday @Cristian (Nixtla) for the MQLoss.
Cristian (Nixtla):
Hi @Antoine SCHWARTZ -CROIX-! Yes, it should be fixed now
Antoine SCHWARTZ -CROIX-:
I misspoke, it's the same type of error, but your fix from yesterday doesn't apply to `DistributionLoss`. The problem occurs in the `*_scale_decouple` functions of some distributions when no scaler is applied to the input data. I've managed to fix the negative binomial on the fly, but I doubt it's optimal:
```python
def nbinomial_scale_decouple(output, loc=None, scale=None):
    """Negative Binomial Scale Decouple

    Stabilizes model's output optimization, by learning total
    count and logits based on anchoring `loc`, `scale`.
    Also adds Negative Binomial domain protection to the distribution parameters.
    """
    mu, alpha = output
    mu = F.softplus(mu) + 1e-8
    alpha = F.softplus(alpha) + 1e-8  # alpha = 1/total_counts
    if (loc is not None) and (scale is not None):
        mu *= loc.to(mu.device)
        alpha /= loc.to(alpha.device) + 1.0

    # mu = total_count * (probs/(1-probs))
    # => probs = mu / (total_count + mu)
    # => probs = mu / [total_count * (1 + mu * (1/total_count))]
    total_count = 1.0 / alpha
    probs = (mu * alpha / (1.0 + mu * alpha)) + 1e-8
    return (total_count, probs)
```
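A more general variant of the same idea (just a sketch, untested, and not an official patch) would be to align `loc`/`scale` with the output device once, so the rest of the `*_scale_decouple` math can stay unchanged:

```python
# Sketch of a more general fix (illustrative only): move the anchoring statistics
# onto the device of the network outputs before any scale-decoupling arithmetic.
import torch.nn.functional as F


def _align_device(tensor, reference):
    """Return `tensor` on `reference`'s device (no-op if it is None or already there)."""
    if tensor is not None and tensor.device != reference.device:
        return tensor.to(reference.device)
    return tensor


def nbinomial_scale_decouple(output, loc=None, scale=None):
    """Negative Binomial scale decouple with device alignment."""
    mu, alpha = output
    loc = _align_device(loc, mu)
    scale = _align_device(scale, mu)
    mu = F.softplus(mu) + 1e-8
    alpha = F.softplus(alpha) + 1e-8  # alpha = 1/total_counts
    if (loc is not None) and (scale is not None):
        mu *= loc
        alpha /= loc + 1.0
    total_count = 1.0 / alpha
    probs = (mu * alpha / (1.0 + mu * alpha)) + 1e-8
    return (total_count, probs)
```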
Cristian (Nixtla):
@Kin Gtz. Olivares @Antoine SCHWARTZ -CROIX- scaling is (almost) crucial to get good performance with distribution losses. We explain this in our latest paper: https://arxiv.org/abs/2305.07089, and it is also suggested in other papers like DeepAR. What is your experience? Have you compared with/without scaling?
Antoine SCHWARTZ -CROIX-:
You're right, the results aren't very good without scaling for DistributionLoss, but I had left "identity" in the parameter space to be explored for tuning (as for NHITS), and that's when I came across the error, which I didn't understand. By the way, I think "identity" is the default value too. However, NegativeBinomial, which is often recommended for positive count data, doesn't work well when the input data is centered. On the pytorch-forecasting side, they block the option outright: https://pytorch-forecasting.readthedocs.io/en/stable/_modules/pytorch_forecasting/metrics/distributions.html#NegativeBinomialDistributionLoss. Unless I'm mistaken, there is no way to control this in Nixtla? So the only option left is to use the traditional minmax scaler (not minmax1) to hope for satisfying results with the negative binomial?
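One workaround on the tuning side (purely illustrative, the helper below is not part of neuralforecast, and the distribution name string is an assumption) is to drop centering or sign-producing scalers from the search space whenever the loss is a NegativeBinomial `DistributionLoss`:

```python
# Illustrative guard (an assumption, not a neuralforecast API): restrict the
# scaler_type candidates when tuning with a NegativeBinomial distribution loss,
# since scalers that center the data or produce negative values break the
# positive-count assumption.
CENTERING_OR_SIGNED_SCALERS = {"standard", "robust", "minmax1"}

def allowed_scalers(distribution: str, candidates: list[str]) -> list[str]:
    """Filter scaler_type candidates for a given output distribution."""
    if distribution == "NegativeBinomial":
        return [s for s in candidates if s not in CENTERING_OR_SIGNED_SCALERS]
    return candidates

# e.g. when building the HP search space:
scaler_space = allowed_scalers(
    "NegativeBinomial",
    ["identity", "standard", "robust", "minmax", "minmax1"],
)
# -> ["identity", "minmax"]
```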
Kin Gtz. Olivares:
Hey @Antoine SCHWARTZ -CROIX-, we need to figure out the interaction between the scale_decouple technique and the DistributionLoss. For the moment we have the Poisson Mixture working correctly on positive count data. If you would be so kind, could you add an issue on this unresolved scale and distribution interaction? https://github.com/Nixtla/neuralforecast/issues
Antoine SCHWARTZ -CROIX-:
Thanks @Kin Gtz. Olivares, yes I'll do it as soon as I can! Otherwise, for now, it seems that the negative binomial gives poor results on my data, no matter which scaler I choose. I suspect a bad interaction somewhere in the code, as it's the distribution that offers the best performance on the other DeepAR implementations I've been able to test (SageMaker, GluonTS torch & mxnet versions, pytorch-forecasting).
That's it, I've opened 2 issues that summarize the discussions above. Don't hesitate to contact me if you'd like more details!
Cristian (Nixtla):
thanks!