# neural-forecast
j
Hey. The futr_df holds the values of the future exogenous features. It should have unique_id (the series identifier), ds (the timestamps), and the values of the future exogenous features, in this case the columns defined in the futr_exog_list:
```python
futr_exog_list = ['s_2', 's_3', 's_4', 's_7', 's_8', 's_9', 's_11',
                  's_12', 's_13', 's_14', 's_15', 's_17', 's_20', 's_21']
```
The error you're running into is because these values should be in the future, so we check that the ds matches what is expected, e.g. if your last training timestamp was 200, the first future ds should be 201, and so on. This tutorial was made before we had that validation, so we have to fix it. In the meantime you can offset the start timestamps in the futr_df with the following:
```python
# Last training timestamp for each series
last_train_dates = Y_train_df.groupby('unique_id')['ds'].max().rename('last_train_ds').reset_index()
Y_test_df2 = Y_test_df.merge(last_train_dates, on=['unique_id'])
# Offset the test timestamps so they start right after the training ones
Y_test_df2['ds'] += Y_test_df2['last_train_ds']
```
And use `futr_df=Y_test_df2`.
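For context, a sketch of how the fixed frame would then be used, assuming `nf` is the fitted NeuralForecast object from the tutorial (not shown in this thread):

```python
# Predict using the offset future exogenous values
preds = nf.predict(futr_df=Y_test_df2)
```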
f
@José Morales Thanks for your quick help! Two questions:
• Regarding the predictive maintenance example: if you have data from one machine (several sensors) but captured on different days (discontinuous), how do you choose the unique_id?
• Can I therefore use a datetime as ds, like 2023-08-17 08:29:00?
j
• It depends. I would suggest using one unique_id per sensor; however, neuralforecast assumes that you have observations for all your timestamps, so if you have a lot of missing points you could try having a single unique_id that combines all sensors' data.
• Yes, timestamps are preferred. When using them, make sure to set the frequency accordingly, e.g. `NeuralForecast(freq='min', ...)` if the data is captured every minute.
f
Currently I log my research dataset in a table/wide format: Timestamp; Sensor1; Sensor2; Sensor n+1; y. According to your suggestion I need to change the dataset to a long format, like this: Timestamp; sensor; value; unique_id?
j
Yes. You'd have to rename your columns to unique_id, ds, and y. We're working on adding arguments to support different names, but at the moment they're fixed.
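A minimal sketch of that renaming, using a hypothetical wide-format log like the one described above (the sensor columns can stay as extra columns and be declared as exogenous features later):

```python
import pandas as pd

# Hypothetical wide-format log: Timestamp; Sensor1; Sensor2; y
df = pd.DataFrame({
    'Timestamp': ['2023-07-25 06:58:00', '2023-07-25 06:59:00'],
    'Sensor1': [19.82, 19.90],
    'Sensor2': [50.27, 50.10],
    'y': [223, 222],
})
df = df.rename(columns={'Timestamp': 'ds'})
df['ds'] = pd.to_datetime(df['ds'])
df['unique_id'] = 0  # a dummy constant id works for a single machine
```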
f
I will have a look and check if this works. Thanks again!
When transforming into this format, how can I then select the right variables for the futr_exog_list? Do you have an example of how to use my data? Timestamp; Sensor1; Sensor2; Sensor n+1; y, e.g. 2023-07-25 06:58:00; 19.82; 50.27; 0.1; 223
j
If you'll have the future values for all sensors you can specify them all as future exogenous, e.g. `futr_exog_list = ['Sensor1', 'Sensor2', ...]`; otherwise you can specify them as historic only, through `hist_exog_list`.
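A hedged sketch of where those lists plug in, using NHITS as an example model (the model choice and hyperparameters here are illustrative, not from this thread; `df` is the long-format frame from above):

```python
from neuralforecast import NeuralForecast
from neuralforecast.models import NHITS

models = [NHITS(h=12, input_size=48,
                futr_exog_list=['Sensor1'],   # values known in the future
                hist_exog_list=['Sensor2'])]  # values known only historically
nf = NeuralForecast(models=models, freq='min')
nf.fit(df=df)  # df has unique_id, ds, y plus the sensor columns
```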
f
I don't get how to set up the unique_id then. If you check the nixtla predictive maintenance example, they assigned a number to every run-to-failure series.
j
The unique_id should identify each time series. In the maintenance example each unique_id is a different aircraft engine.
f
Yes but what if you have just one aircraft engine with RUL over time?
j
You can set a dummy unique_id like 0
f
I set it to 'm8' before, but there was the error from my first message. I need to verify this again though.
f
@José Morales I am getting the same error but with my own data. I didn't have this error previously, but I forgot which version of neuralforecast I was using. Now I have 1.6.4. I am looking at your code above and trying to see how you fixed it. Are you adding the last date in the train set to the test set? So if the train set ends on 2023-01-01, does the test set start from 2023-01-01 instead of starting the day after (2023-01-02)? Or did I misunderstand your fix?
j
Can you install from github? We've added a function to get the base of the futr_df to make this easier
f
@José Morales what's the best way to install from Github? What should I do after I clone the repo to install it?
@José Morales like this?
pip install git+https://github.com/Nixtla/neuralforecast.git
j
Yes, that should work
f
@Farzad E what does your data look like?
@José Morales Because then I do not have the data (run-to-failure data) in either of these formats:
• train and test data have the same set of unique_ids (e.g. train 1-10, test 1-10), and the first ds of the test is the last ds from train + frequency
• 1 unique_id for all run-to-failure series, which makes no sense
If the data is not in one of these structures I can not evaluate the model. The problem with the first case is that you need to split the data 50% for testing, which is too much.
j
Are you looking to train on some ids and predict on others?
f
I have multiple run-to-failure series, > 100 unique series. I want to train on 80% and fully backtest on the other 20%.
j
You can provide another df to predict; that way it will predict only the series present in that df.
f
But the first ds in the test_df must be the last ds + frequency, right?
Then I'm not sure how to do it other than splitting 50/50 with the same unique_ids, like in the predictive maintenance example on the nixtla webpage.
E.g. I have 28 run-to-failure series in total. Each of them is unique and sampled on different dates. Also the lengths differ. How would you model the ds and unique_id columns @José Morales?
f
@Felix Saretzky My issue was that I had discrepancies between the unique_ids of my training set and my futr_df. Once I synchronized those, the problem was resolved. Your use case is more complex so I am not sure about your example.
j
Usually you validate on the same series. If you're looking to validate on different ones, you have to provide the new ids through the `df` argument of predict and a `futr_df` for them. The easiest way to see the expected structure is to install from github, run fit, and then run the `make_future_dataframe` method providing the df with your new ids.
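A rough sketch of that flow (`train_df` and `new_ids_df` are placeholder names, and the exact signatures may differ between versions):

```python
nf = NeuralForecast(models=models, freq='min')
nf.fit(df=train_df)

# Skeleton futr_df (unique_id + ds) for the series in new_ids_df;
# fill in the future exogenous columns before predicting
futr_df = nf.make_future_dataframe(df=new_ids_df)
preds = nf.predict(df=new_ids_df, futr_df=futr_df)
```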
f
So I should not split my data? Normally I thought I'd train on, let's say, 20 unique time series (run-to-failure) and test my model's accuracy on the other unique series.
f
@Felix Saretzky The split doesn't happen like that. It happens in the time domain. You train on 3 years for example and test on one year. For testing new unique_ids you should follow what @José Morales explained.
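For that kind of time-domain backtest, neuralforecast's built-in cross-validation is one option; a hedged sketch, with `n_windows` and `step_size` chosen purely for illustration:

```python
# Each window holds out the last h points of every series as a test set
cv_df = nf.cross_validation(df=train_df, n_windows=3, step_size=12)
```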