nickeleres
04/30/2024, 11:41 AM
(_train_tune pid=6410) /usr/local/lib/python3.10/dist-packages/ray/tune/integration/pytorch_lightning.py:198: `ray.tune.integration.pytorch_lightning.TuneReportCallback` is deprecated. Use `ray.tune.integration.pytorch_lightning.TuneReportCheckpointCallback` instead.
(_train_tune pid=6410) /usr/local/lib/python3.10/dist-packages/pytorch_lightning/utilities/parsing.py:199: Attribute 'loss' is an instance of `nn.Module` and is already saved during checkpointing. It is recommended to ignore them using `self.save_hyperparameters(ignore=['loss'])`.
(_train_tune pid=6410) /usr/local/lib/python3.10/dist-packages/pytorch_lightning/utilities/parsing.py:199: Attribute 'valid_loss' is an instance of `nn.Module` and is already saved during checkpointing. It is recommended to ignore them using `self.save_hyperparameters(ignore=['valid_loss'])`.
(_train_tune pid=6410) Seed set to 6
(_train_tune pid=6410) /usr/local/lib/python3.10/dist-packages/torch/nn/init.py:452: UserWarning: Initializing zero-element tensors is a no-op
(_train_tune pid=6410) warnings.warn("Initializing zero-element tensors is a no-op")
(_train_tune pid=6410) GPU available: True (cuda), used: True
(_train_tune pid=6410) TPU available: False, using: 0 TPU cores
(_train_tune pid=6410) IPU available: False, using: 0 IPUs
(_train_tune pid=6410) HPU available: False, using: 0 HPUs
(_train_tune pid=6410) You are using a CUDA device ('NVIDIA A100-SXM4-40GB') that has Tensor Cores. To properly utilize them, you should set `torch.set_float32_matmul_precision('medium' | 'high')` which will trade-off precision for performance. For more details, read <https://pytorch.org/docs/stable/generated/torch.set_float32_matmul_precision.html#torch.set_float32_matmul_precision>
(_train_tune pid=6410) Missing logger folder: /tmp/ray/session_2024-04-30_11-37-44_977220_4931/artifacts/2024-04-30_11-37-47/_train_tune_2024-04-30_11-37-44/working_dirs/_train_tune_0c7c6ab7_2_attn_dropout=0.0000,batch_size=16,dropout=0.1000,early_stop_patience_steps=2,enable_progress_bar=False,futr_2024-04-30_11-37-54/lightning_logs
(_train_tune pid=6410) 2024-04-30 11:39:20.675778: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
(_train_tune pid=6410) 2024-04-30 11:39:20.675836: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
(_train_tune pid=6410) 2024-04-30 11:39:20.677586: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
(_train_tune pid=6410) 2024-04-30 11:39:21.828232: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
(_train_tune pid=6410) LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
(_train_tune pid=6410)
(_train_tune pid=6410) | Name | Type | Params
(_train_tune pid=6410) ---------------------------------------------------------------------
(_train_tune pid=6410) 0 | loss | HuberLoss | 0
(_train_tune pid=6410) 1 | valid_loss | HuberLoss | 0
(_train_tune pid=6410) 2 | padder_train | ConstantPad1d | 0
(_train_tune pid=6410) 3 | scaler | TemporalNorm | 0
(_train_tune pid=6410) 4 | embedding | TFTEmbedding | 44.0 K
(_train_tune pid=6410) 5 | static_encoder | StaticCovariateEncoder | 20.0 M
(_train_tune pid=6410) 6 | temporal_encoder | TemporalCovariateEncoder | 137 M
(_train_tune pid=6410) 7 | temporal_fusion_decoder | TemporalFusionDecoder | 15.5 M
(_train_tune pid=6410) 8 | output_adapter | Linear | 1.0 K
(_train_tune pid=6410) ---------------------------------------------------------------------
(_train_tune pid=6410) 173 M Trainable params
(_train_tune pid=6410) 0 Non-trainable params
(_train_tune pid=6410) 173 M Total params
(_train_tune pid=6410) 693.152 Total estimated model params size (MB)
(_train_tune pid=6410) /usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/connectors/data_connector.py:441: The 'val_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` to `num_workers=11` in the `DataLoader` to improve performance.
(_train_tune pid=6410) /usr/local/lib/python3.10/dist-packages/pytorch_lightning/utilities/data.py:77: Trying to infer the `batch_size` from an ambiguous collection. The batch size we found is 1. To avoid any miscalculations, use `self.log(..., batch_size=batch_size)`.
(_train_tune pid=6410) /usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/connectors/data_connector.py:441: The 'train_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` to `num_workers=11` in the `DataLoader` to improve performance.
(_train_tune pid=6410) /usr/local/lib/python3.10/dist-packages/pytorch_lightning/loops/fit_loop.py:298: The number of training batches (1) is smaller than the logging interval Trainer(log_every_n_steps=50). Set a lower value for log_every_n_steps if you want to see logs for the training epoch.
I keep getting this log output every time I run .fit(...)
...has anyone seen this before?
José Morales
04/30/2024, 3:19 PM
import logging
import warnings
logging.getLogger("pytorch_lightning").setLevel(logging.ERROR)
warnings.filterwarnings("ignore")
You can add those to your script/notebook to get the previous behavior.
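For context, here is a minimal sketch of where those lines would go: at the very top of the script/notebook, before the model is built and before `.fit(...)` runs, so the logger level and warning filter are in place when PyTorch Lightning and Ray Tune start emitting messages. The `model.fit(...)` call at the end is a hypothetical placeholder; the thread doesn't show the surrounding model code.

import logging
import warnings

# Silence PyTorch Lightning INFO/WARNING messages (keep only errors)
logging.getLogger("pytorch_lightning").setLevel(logging.ERROR)
# Silence Python warnings (deprecation notices, num_workers hints, etc.)
warnings.filterwarnings("ignore")

# ... your existing model setup and training call go here unchanged, e.g.:
# model.fit(train_df)  # hypothetical; replace with your actual .fit(...) call

Note this only hides the messages; the deprecation and performance hints in the log (e.g. the num_workers suggestion) still apply if you want to act on them.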