Nixtla's Open Source Time Series Ecosystem.

Nixtla Community

Hi team! One of our users, Peter Schofield (pcschof@gmail.com), is using TimeGPT to forecast stock data and has some questions about the preprocessing steps. I wonder whether we are using MinMax scaling in the preprocessing step, and whether the `clean_ex_first` option would be helpful in this situation? Thank you!
```Just occurred to me that, for this type of problem, it is important to not use min-max scaling in the pre-processing, because it would likely result in data leakage (depending on how it's done). Can you elaborate on the pre-processing scaling techniques, and how the user can control that?
Suppose we scaled the exogenous variables ourselves, how would the model know that scaling has already been done, and that it shouldn't scale the already scaled inputs?```

My understanding is that there is no min-max scaling of the time series data. In fact, I don't see any scaling of the target series at all. Only the exogenous features go through the standard scaler.
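To make the described behavior concrete, here is a minimal sketch, assuming the exogenous step is a plain per-feature z-score (mean 0, population std 1, the convention of scikit-learn's `StandardScaler`). It is not TimeGPT's actual preprocessing code; `standard_scale`, `preprocess`, and the `volume` column are hypothetical names used only for illustration. The key point is that the target column `y` passes through unchanged.

```python
from statistics import fmean, pstdev


def standard_scale(values):
    """Z-score one exogenous feature: subtract its mean, divide by its population std."""
    mu = fmean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # Constant feature: centering alone gives all zeros (avoids division by zero).
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def preprocess(columns, target="y"):
    """Hypothetical helper: standard-scale every column except the target."""
    return {
        name: list(vals) if name == target else standard_scale(vals)
        for name, vals in columns.items()
    }


data = {
    "y": [101.0, 103.5, 102.0, 105.0],           # target: left on its original scale
    "volume": [1e6, 2e6, 1.5e6, 2.5e6],          # exogenous: standard-scaled
}
out = preprocess(data)
print(out["y"])       # unchanged target values
print(out["volume"])  # mean 0, std 1
```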

If he scales the target data himself, TimeGPT makes its predictions on that scale, so he would have to inverse-transform them manually.
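A minimal sketch of that manual workflow, assuming the user min-max scales the target with statistics fitted on the observed history only, so no future values influence the scaling. `naive_forecast` is a hypothetical stand-in for the actual TimeGPT call (it just repeats the last scaled value); the point is the fit, transform, forecast, inverse-transform sequence.

```python
def fit_minmax(history):
    """Learn min and range from the observed history only (no future values)."""
    lo, hi = min(history), max(history)
    span = (hi - lo) or 1.0  # guard against a constant series
    return lo, span


def transform(values, lo, span):
    return [(v - lo) / span for v in values]


def inverse_transform(values, lo, span):
    return [v * span + lo for v in values]


def naive_forecast(scaled_history, h):
    # Hypothetical placeholder for the model call: repeats the last observed value.
    return [scaled_history[-1]] * h


history = [100.0, 102.0, 101.0, 105.0, 104.0]
lo, span = fit_minmax(history)
scaled = transform(history, lo, span)
scaled_fc = naive_forecast(scaled, h=3)         # predictions come back in [0, 1] scale
fc = inverse_transform(scaled_fc, lo, span)     # map back to price units
print(fc)
```

Fitting the scaler on the history window alone is what keeps this step leakage-free; fitting it on data that includes the forecast period would let future information into the inputs.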

Okay! So basically, the preprocessing step doesn’t scale the target values, so the data leakage problem doesn’t arise there. And the exogenous variables are scaled using a standard scaler.

Exactly! In fact, data leakage should never be a problem for a foundation model, since the idea is always to take the input history it is given and forecast the next steps.

Another question: is there any way to skip the preprocessing step for the exogenous variables, so that if users have already scaled them beforehand we can avoid double scaling?
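One observation that may help with this question, offered as a sketch under an assumption: if the server-side step is a plain z-score applied per feature to the same set of values the user scaled (for example, per series), then any user scaling that is a positive affine map (standard scaling, min-max scaling) is undone by the z-score, because z-scores are invariant to `a * x + b` with `a > 0`. The helper names below are hypothetical. Nonlinear user transforms (such as a log), or per-series user scaling combined with pooled server-side statistics, would not be cancelled this way.

```python
from statistics import fmean, pstdev


def zscore(values):
    """Plain z-score with population std (assumed server-side behavior)."""
    mu, sigma = fmean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


def minmax(values):
    """A user-side min-max scaling, which is a positive affine map."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


raw = [3.0, 7.0, 4.0, 10.0, 6.0]
once = zscore(raw)                  # server scales raw values
twice = zscore(zscore(raw))         # user standard-scaled first, server scales again
after_minmax = zscore(minmax(raw))  # user min-max scaled first, server scales again

print(all(abs(a - b) < 1e-9 for a, b in zip(once, twice)))         # True
print(all(abs(a - b) < 1e-9 for a, b in zip(once, after_minmax)))  # True
```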