Arthur LAMBERT
05/08/2024, 8:24 PM
I'm using AutoTFT with an EarlyStopping that tracks the validation loss. I am using the Optuna backend, and training on 4 GPUs.
At the end of the HP tuning I'd like to retrieve the optimal set of HPs, but also the epoch at which the model stopped training due to the EarlyStopping.
My idea was to retrieve the evolution of the validation loss at the end of the AutoTFT training (via the valid_trajectories attribute).
However, when printing it at the end of training, I obtain different values, one for each GPU (cf. screenshot, where max_steps = 98 and validation happens every 48 steps, which corresponds to one epoch).
I don't really know how to interpret them, or what the best methodology would be to achieve my goal.
Many thanks for your support!
PS: worth mentioning that I set the random_seed in my tft_config, and that I also set the seed in my TPESampler for Optuna.
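For reference, a minimal sketch of a setup like the one described above (illustrative, not Arthur's actual code): an AutoTFT tuned with the Optuna backend, early stopping enabled via early_stop_patience_steps in the config, and seeds fixed both in the model config (random_seed) and in the TPESampler. The search space and all parameter values here are assumptions.

import optuna
from neuralforecast import NeuralForecast
from neuralforecast.auto import AutoTFT
from neuralforecast.losses.pytorch import MAE

def tft_config(trial):
    # With the Optuna backend, the config is a callable over an optuna Trial.
    return {
        'input_size': trial.suggest_categorical('input_size', [24, 48]),
        'learning_rate': trial.suggest_float('learning_rate', 1e-4, 1e-2, log=True),
        'max_steps': 98,
        'val_check_steps': 48,           # one validation check per epoch in this setup
        'early_stop_patience_steps': 2,  # enables EarlyStopping on the validation loss
        'random_seed': 1,                # fixed model seed
    }

model = AutoTFT(
    h=12,
    loss=MAE(),
    config=tft_config,
    backend='optuna',
    search_alg=optuna.samplers.TPESampler(seed=1),  # fixed sampler seed
    num_samples=10,
)
nf = NeuralForecast(models=[model], freq='D')
# nf.fit(df, val_size=48) would then run the tuning on a training DataFrame df.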
José Morales
05/08/2024, 10:31 PM

import uuid

import pandas as pd
import pytorch_lightning as pl
from neuralforecast import NeuralForecast
from neuralforecast.models import NBEATSx
from utilsforecast.data import generate_series

# Synthetic example data: 10 daily series of varying length.
series = generate_series(10, min_length=100, max_length=200)

version = uuid.uuid4()
check_steps = 5   # run a validation check every 5 training steps
patience = 2      # stop after 2 checks without improvement

nf = NeuralForecast(
    models=[
        NBEATSx(
            input_size=10,
            h=10,
            max_steps=200,
            # Write metrics to lightning_logs/version_{version}/metrics.csv
            logger=pl.loggers.CSVLogger(save_dir='.', version=version),
            val_check_steps=check_steps,
            early_stop_patience_steps=patience,
        )
    ],
    freq='D',
)
nf.fit(series, val_size=10)

logs = pd.read_csv(f'lightning_logs/version_{version}/metrics.csv')
# The run stops `patience` validation checks after the best one, so the best
# check happened check_steps * patience steps before the last logged step
# (+1 because steps are 0-indexed).
best_steps = logs['step'].max() - check_steps * patience + 1
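As an alternative to the arithmetic on the last line, the best validation check can be located directly from the logged trajectory. A short sketch, assuming metrics.csv contains a valid_loss column (the follow-up question below suggests it does; the exact column name may vary across versions):

# Rows with a logged validation loss; training-only rows have NaN there.
valid = logs.dropna(subset=['valid_loss'])
best = valid.loc[valid['valid_loss'].idxmin()]
print(best['step'], best['valid_loss'])  # step and value of the best check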
Arthur LAMBERT
05/09/2024, 9:15 AM
What is the difference between ptl/val_loss and valid_loss, and the difference between train_loss_epoch and train_loss_step?
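For background (general PyTorch Lightning behavior, not an answer from the thread): when a metric is logged with both on_step=True and on_epoch=True, Lightning writes two columns to metrics.csv, <name>_step (the raw value at each step) and <name>_epoch (the value aggregated, averaged by default, over the epoch). In neuralforecast, ptl/val_loss is, if memory serves, the metric its internal EarlyStopping monitors. A minimal sketch; compute_loss is a hypothetical helper:

import pytorch_lightning as pl

class LitModel(pl.LightningModule):
    def training_step(self, batch, batch_idx):
        loss = self.compute_loss(batch)  # hypothetical loss computation
        # Logging with both flags yields 'train_loss_step' and
        # 'train_loss_epoch' columns in the CSV logs.
        self.log('train_loss', loss, on_step=True, on_epoch=True)
        return loss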
José Morales
05/09/2024, 3:54 PM

Arthur LAMBERT
05/09/2024, 6:55 PM