# neural-forecast
j
Hey. The futr_df holds the values of the future exogenous features. It should have unique_id (the series identifier), ds (the timestamps), and the values of the future exogenous features, in this case the columns defined in the futr_exog_list:
```python
futr_exog_list = ['s_2', 's_3', 's_4', 's_7', 's_8', 's_9', 's_11',
                  's_12', 's_13', 's_14', 's_15', 's_17', 's_20', 's_21']
```
The error you're running into is because these values should be in the future, so we check that the ds matches what is expected, e.g. if your last training timestamp was 200, the first future ds should be 201, and so on. This tutorial was made before we had that validation, so we have to fix it. In the meantime you can offset the start timestamps in the futr_df with the following:
```python
# Last training timestamp for each series
last_train_dates = Y_train_df.groupby('unique_id')['ds'].max().rename('last_train_ds').reset_index()
Y_test_df2 = Y_test_df.merge(last_train_dates, on=['unique_id'])
# Offset the test timestamps so they start right after the training ones
Y_test_df2['ds'] += Y_test_df2['last_train_ds']
```
And use `futr_df=Y_test_df2`.
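For context, a sketch of how the fixed frame would then be used, assuming `nf` is the fitted NeuralForecast object from the tutorial (not shown in this thread):

```python
# Predict using the offset future exogenous values
preds = nf.predict(futr_df=Y_test_df2)
```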
f
@José Morales Thanks for your quick help! Two questions:
• Regarding the predictive maintenance example: if you have data from one machine (several sensors) but captured on different days (discontinuous), how do you choose the unique_id?
• Can I therefore use a datetime as ds, like 2023-08-17 08:29:00?
j
• It depends. I would suggest using one unique_id per sensor; however, neuralforecast assumes that you have observations for all your timestamps, so if you have a lot of missing points you could try having a single unique_id that combines all sensors' data.
• Yes, timestamps are preferred. When using them, make sure to set the frequency accordingly, e.g. `NeuralForecast(freq='min', ...)` if the data is captured every minute.
f
Currently I log my research dataset in a table/wide format: Timestamp; Sensor1; Sensor2; Sensor n+1; y. According to your suggestion I need to change the dataset to a long format, like this: Timestamp; sensor; value; unique_id?
j
Yes. You'd have to rename your columns to unique_id, ds, and y. We're working on adding arguments to support different names, but at the moment they're fixed.
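A minimal sketch of that renaming, using a hypothetical wide-format log like the one described above (the sensor columns can stay as extra columns and be declared as exogenous features later):

```python
import pandas as pd

# Hypothetical wide-format log: Timestamp; Sensor1; Sensor2; y
df = pd.DataFrame({
    'Timestamp': ['2023-07-25 06:58:00', '2023-07-25 06:59:00'],
    'Sensor1': [19.82, 19.90],
    'Sensor2': [50.27, 50.10],
    'y': [223, 222],
})
df = df.rename(columns={'Timestamp': 'ds'})
df['ds'] = pd.to_datetime(df['ds'])
df['unique_id'] = 0  # a dummy constant id works for a single machine
```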
f
I will have a look and check if this works. Thanks again!
When transforming into this format, how can I then select the right variables for the futr_exog_list? Do you have an example of how to use my data? Timestamp; Sensor1; Sensor2; Sensor n+1; y, e.g. 2023-07-25 06:58:00; 19.82; 50.27; 0.1; 223
j
If you'll have the future values for all sensors you can specify them all as future exogenous, e.g. `futr_exog_list = ['Sensor1', 'Sensor2', ...]`; otherwise you can specify them as historic only, through `hist_exog_list`.
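A hedged sketch of where those lists plug in, using NHITS as an example model (the model choice and hyperparameters here are illustrative, not from this thread; `df` is the long-format frame from above):

```python
from neuralforecast import NeuralForecast
from neuralforecast.models import NHITS

models = [NHITS(h=12, input_size=48,
                futr_exog_list=['Sensor1'],   # values known in the future
                hist_exog_list=['Sensor2'])]  # values known only historically
nf = NeuralForecast(models=models, freq='min')
nf.fit(df=df)  # df has unique_id, ds, y plus the sensor columns
```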
f
I don't get how to set up the unique_id then. If you check the nixtla predictive maintenance example, they assigned a number to every run-to-failure series.
j
The unique_id should identify each time series. In the maintenance example each unique_id is a different aircraft engine.
f
Yes but what if you have just one aircraft engine with RUL over time?
j
You can set a dummy unique_id like 0
f
I set it to 'm8' before, but there was the error from my first message. I need to verify this again though.
f
@José Morales I am getting the same error but with my own data. I didn't have this error previously, but I forgot which version of neuralforecast I was using. Now I have 1.6.4. I am looking at your code above and trying to see how you fixed it. Are you adding the last date in the train set to the test set? So if the train set ends on 2023-01-01, does the test set start from 2023-01-01 instead of starting the day after (2023-01-02)? Or did I misunderstand your fix?
j
Can you install from github? We've added a function to get the base of the futr_df to make this easier
f
@José Morales what's the best way to install from Github? What should I do after I clone the repo to install it?
@José Morales like this?
pip install git+https://github.com/Nixtla/neuralforecast.git
j
Yes, that should work
f
@Farzad E what does your data look like?
@José Morales Because then I do not have the data (run-to-failure data) in either of these formats:
• train and test data have the same set of unique_ids (e.g. train 1-10, test 1-10), and the first ds of the test is the last ds from train + frequency
• 1 unique_id for all run-to-failure series, which makes no sense
If the data is not in one of these structures I can not evaluate the model. The problem with the first case is that you need to split the data 50% for testing, which is too much.
j
Are you looking to train on some ids and predict on others?
f
I have multiple run-to-failure series, > 100 unique series. I want to train on 80% and fully backtest on the other 20%.
j
You can provide another df to predict; that way it will predict only the series present in that df.
f
But the first ds in the test_df must be the last ds + frequency, right?
Then I'm not sure how to do it other than splitting 50/50 with the same unique_ids, like in the predictive maintenance example on the nixtla webpage.
E.g. I have 28 run-to-failure series in total. Each of them is unique and sampled on different dates. Also the lengths differ. How would you model the ds and unique_id columns @José Morales?
f
@Felix Saretzky My issue was that I had discrepancies between the unique_ids of my training set and my futr_df. Once I synchronized those, the problem was resolved. Your use case is more complex so I am not sure about your example.
j
Usually you validate on the same series. If you're looking to validate on different ones, you have to provide the new ids through the `df` argument of predict and a `futr_df` for them. The easiest way to see the expected structure is to install from github, run fit, and then run the `make_future_dataframe` method providing the df with your new ids.
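A rough sketch of that flow (`train_df` and `new_ids_df` are placeholder names, and the exact signatures may differ between versions):

```python
nf = NeuralForecast(models=models, freq='min')
nf.fit(df=train_df)

# Skeleton futr_df (unique_id + ds) for the series in new_ids_df;
# fill in the future exogenous columns before predicting
futr_df = nf.make_future_dataframe(df=new_ids_df)
preds = nf.predict(df=new_ids_df, futr_df=futr_df)
```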
f
So I should not split my data? Normally I thought I'd train on, let's say, 20 unique time series (run-to-failure) and test my model's accuracy on the other unique series.
f
@Felix Saretzky The split doesn't happen like that. It happens in the time domain. You train on 3 years for example and test on one year. For testing new unique_ids you should follow what @José Morales explained.
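For that kind of time-domain backtest, neuralforecast's built-in cross-validation is one option; a hedged sketch, with `n_windows` and `step_size` chosen purely for illustration:

```python
# Each window holds out the last h points of every series as a test set
cv_df = nf.cross_validation(df=train_df, n_windows=3, step_size=12)
```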