#neural-forecast

Antoine SCHWARTZ -CROIX-

07/27/2023, 9:07 AM
Hello guys,
Since the last release, I frequently have crashes at the beginning of my trainings:

`RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!`

I feel like it's random; however, in my last 2 HP tuning runs, it happened with the `scaler_type` set to `identity` (random or root cause, I don't know).
Am I the only one?
Thanks

It just happened again, this time at the `predict` step, still with the `identity` scaler (on DeepAR):

```
File /local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/neuralforecast/models/deepar.py:487, in DeepAR.forward(self, windows_batch)
484 output = self.loss.domain_map(output)
486 # Inverse normalization
--> 487 distr_args = self.loss.scale_decouple(
488 output=output, loc=y_loc, scale=y_scale
489 )
490 # Add horizon (1) dimension
491 distr_args = list(distr_args)
File /local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/neuralforecast/losses/pytorch.py:771, in nbinomial_scale_decouple(output, loc, scale)
769 alpha = F.softplus(alpha) + 1e-8 # alpha = 1/total_counts
770 if (loc is not None) and (scale is not None):
--> 771 mu *= loc
772 alpha /= loc + 1.0
774 # mu = total_count * (probs/(1-probs))
775 # => probs = mu / (total_count + mu)
776 # => probs = mu / [total_count * (1 + mu * (1/total_count))]
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!
```

Ok I think that’s the same error you fixed yesterday **@Cristian (Nixtla)** for the MQLoss

Cristian (Nixtla)

07/27/2023, 2:55 PM
Hi **@Antoine SCHWARTZ -CROIX-**! Yes, it should be fixed now.

Antoine SCHWARTZ -CROIX-

07/27/2023, 3:06 PM
I misspoke: it's the same type of error, but your fix from yesterday doesn't apply to `DistributionLoss`. The problem occurs in the `*_scale_decouple` functions of some distributions when no scaler is applied to the input data.
I've managed to fix the negative binomial on the fly, but I doubt it's optimal:

```
import torch.nn.functional as F


def nbinomial_scale_decouple(output, loc=None, scale=None):
    """Negative Binomial Scale Decouple

    Stabilizes model's output optimization, by learning total
    count and logits based on anchoring `loc`, `scale`.
    Also adds Negative Binomial domain protection to the distribution parameters.
    """
    mu, alpha = output
    mu = F.softplus(mu) + 1e-8
    alpha = F.softplus(alpha) + 1e-8  # alpha = 1/total_counts
    if (loc is not None) and (scale is not None):
        # Workaround: move `loc` to the device of the outputs before the in-place ops
        mu *= loc.to(mu.device)
        alpha /= loc.to(alpha.device) + 1.0

    # mu = total_count * (probs/(1-probs))
    # => probs = mu / (total_count + mu)
    # => probs = mu / [total_count * (1 + mu * (1/total_count))]
    total_count = 1.0 / alpha
    probs = (mu * alpha / (1.0 + mu * alpha)) + 1e-8
    return (total_count, probs)
```
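
As a side note, the reparameterization described in the comments of the snippet above can be checked numerically. The following is a small pure-Python sketch (no PyTorch; the helper names `nb_params_from_mean` and `nb_mean` and the sample values are made up for illustration and are not part of neuralforecast). It maps a mean `mu` and dispersion `alpha` to `(total_count, probs)` the same way, then confirms that the mean implied by the convention stated in the comments, `total_count * probs / (1 - probs)`, recovers `mu` up to the small `1e-8` offset.

```python
import math
from typing import Tuple


def nb_params_from_mean(mu: float, alpha: float) -> Tuple[float, float]:
    """Map (mean, dispersion) to (total_count, probs), mirroring the snippet above.

    Hypothetical helper for illustration; not part of neuralforecast.
    """
    total_count = 1.0 / alpha
    probs = (mu * alpha / (1.0 + mu * alpha)) + 1e-8
    return total_count, probs


def nb_mean(total_count: float, probs: float) -> float:
    """Mean under the convention mu = total_count * probs / (1 - probs)."""
    return total_count * probs / (1.0 - probs)


mu, alpha = 12.5, 0.3  # arbitrary example values
total_count, probs = nb_params_from_mean(mu, alpha)
recovered = nb_mean(total_count, probs)

# The round trip should reproduce the mean, up to the 1e-8 stabilizer.
assert math.isclose(recovered, mu, rel_tol=1e-6)
```

This only verifies the algebra; it says nothing about how `loc` and `scale` should enter the parameters.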

Cristian (Nixtla)

07/27/2023, 3:11 PM

Antoine SCHWARTZ -CROIX-

07/28/2023, 8:32 AM
You're right, the results aren't very good without scaling for DistributionLoss, but I had left "identity" in the parameter space to be explored during tuning (as for NHITS). That's when I came across the error, and I didn't understand it. By the way, I think this is the default value too.
However, NegativeBinomial, which is often recommended for positive count data, doesn't work well when the input data is centered. On the PyTorch Forecasting side, they block the option outright: https://pytorch-forecasting.readthedocs.io/en/stable/_modules/pytorch_forecasting/metrics/distributions.html#NegativeBinomialDistributionLoss
Unless I'm mistaken, I think it's impossible to control this in Nixtla? So is the only option left to use the traditional minmax_scaler (not minmax1) and hope for satisfying results with the negative binomial?
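
One way to picture the concern: in the `nbinomial_scale_decouple` snippet posted earlier in this thread, the softplus-activated mean is multiplied by `loc`. Below is a rough pure-Python sketch of that single step. The helper names and values are hypothetical, and it is only an assumption that `loc` can be zero or negative (for example, when the data is centered or not shifted at all); the thread does not confirm which values neuralforecast actually passes. Under that assumption, the rescaled mean stops being strictly positive, which a negative binomial mean must be.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable softplus: log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def rescaled_mean(raw_output: float, loc: float) -> float:
    """Mimic `mu = softplus(mu) + 1e-8; mu *= loc` from the thread's snippet.

    Hypothetical helper for illustration; not part of neuralforecast.
    """
    return (softplus(raw_output) + 1e-8) * loc


raw = 0.7  # arbitrary raw network output
for loc in (25.0, 0.0, -3.0):  # positive level, zero, below zero
    mu = rescaled_mean(raw, loc)
    status = "valid" if mu > 0 else "invalid (mean must be > 0)"
    print(f"loc={loc:6.1f} -> mu={mu:8.4f}  {status}")
```

With a strictly positive `loc`, as count data with a positive level would give, the mean stays in the valid range.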

Kin Gtz. Olivares

07/28/2023, 1:26 PM
Hey **@Antoine SCHWARTZ -CROIX-**,
We need to figure out the interaction between the scale_decouple technique and the DistributionLoss.
For the moment, we have the Poisson Mixture working correctly on positive count data.
If you would be so kind, could you open an issue on this unresolved scale and distribution interaction?
https://github.com/Nixtla/neuralforecast/issues

Antoine SCHWARTZ -CROIX-

07/28/2023, 2:06 PM
Thanks **@Kin Gtz. Olivares**, yes I'll do it as soon as I can!
Otherwise, for now, it seems that the negative binomial gives poor results on my data, no matter which scaler I choose. I suspect a bad interaction somewhere in the code, as this is the distribution that offers the best performance in the other DeepAR implementations I've been able to test (SageMaker, GluonTS torch & mxnet versions, PyTorch Forecasting).

That's it, I've opened 2 issues that summarize the discussions above. Don't hesitate to contact me if you'd like more details!

Cristian (Nixtla)

08/01/2023, 5:18 PM
Thanks!
