Devin Gaffney
05/03/2024, 5:28 PM
Devin Gaffney
05/03/2024, 5:28 PM
2024-05-03T17:20:05.994952347Z GPU available: True (cuda), used: True
2024-05-03T17:20:06.013560446Z TPU available: False, using: 0 TPU cores
2024-05-03T17:20:06.013613468Z IPU available: False, using: 0 IPUs
2024-05-03T17:20:06.013695960Z HPU available: False, using: 0 HPUs
2024-05-03T17:20:06.790184151Z You are using a CUDA device ('NVIDIA RTX A4500') that has Tensor Cores. To properly utilize them, you should set `torch.set_float32_matmul_precision('medium' | 'high')` which will trade-off precision for performance. For more details, read <https://pytorch.org/docs/stable/generated/torch.set_float32_matmul_precision.html#torch.set_float32_matmul_precision>
2024-05-03T17:20:06.790884471Z Missing logger folder: /lightning_logs
2024-05-03T17:20:06.811951650Z LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
2024-05-03T17:20:06.900000345Z
2024-05-03T17:20:06.900010435Z   | Name         | Type          | Params
2024-05-03T17:20:06.900012605Z -----------------------------------------------
2024-05-03T17:20:06.900014206Z 0 | loss         | MQLoss        | 7
2024-05-03T17:20:06.900015776Z 1 | padder_train | ConstantPad1d | 0
2024-05-03T17:20:06.900017056Z 2 | scaler       | TemporalNorm  | 0
2024-05-03T17:20:06.900018576Z 3 | blocks       | ModuleList    | 5.6 M
2024-05-03T17:20:06.900020016Z -----------------------------------------------
2024-05-03T17:20:06.900021346Z 5.6 M Trainable params
2024-05-03T17:20:06.900023426Z 7 Non-trainable params
2024-05-03T17:20:06.900024736Z 5.6 M Total params
2024-05-03T17:20:06.900026506Z 22.370 Total estimated model params size (MB)
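The Tensor Cores notice in the log above points at `torch.set_float32_matmul_precision`. A minimal sketch of acting on it, assuming PyTorch may or may not be installed where this runs; the helper name `enable_tensor_core_matmul` is hypothetical and not part of PyTorch or NeuralForecast:

```python
def enable_tensor_core_matmul(precision="high"):
    """Hypothetical helper: opt in to faster float32 matmuls on Tensor Core GPUs.

    Returns the precision PyTorch reports afterwards, or None if torch
    is not importable in this environment.
    """
    # These are the three values accepted by torch.set_float32_matmul_precision.
    if precision not in ("highest", "high", "medium"):
        raise ValueError(f"unsupported precision: {precision!r}")
    try:
        import torch
    except ImportError:
        return None
    # 'high' and 'medium' trade some float32 precision for speed.
    torch.set_float32_matmul_precision(precision)
    return torch.get_float32_matmul_precision()


if __name__ == "__main__":
    print("float32 matmul precision:", enable_tensor_core_matmul("high"))
```

Calling this once before building the model would be enough, since the setting is process-wide.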
Devin Gaffney
05/03/2024, 5:29 PM
Epoch 48: 100%|██████████| 1/1 [00:00<00:00, 2.66it/s, v_num=2, train_loss_step=116.0, train_loss_epoch=116.0]
Devin Gaffney
05/03/2024, 5:33 PM
import torch

def check_gpu_status():
    # Check if CUDA is available
    cuda_available = torch.cuda.is_available()
    print("CUDA Available:", cuda_available)
    if cuda_available:
        # Print the number of GPUs available
        num_gpus = torch.cuda.device_count()
        print("Number of GPUs Available:", num_gpus)
        # Print the current device PyTorch is using
        current_device = torch.cuda.current_device()
        print("Current GPU Device:", torch.cuda.get_device_name(current_device))
    else:
        print("No GPU available.")

# Run the function
check_gpu_status()
Marco
05/03/2024, 5:33 PM
CUDA_VISIBLE_DEVICES=0 python yourfile.py
Devin Gaffney
05/03/2024, 5:34 PM
Devin Gaffney
05/03/2024, 5:35 PM
Marco
05/03/2024, 5:36 PM
Devin Gaffney
05/03/2024, 5:36 PM
Devin Gaffney
05/03/2024, 5:37 PM
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
/usr/local/lib/python3.10/site-packages/pytorch_lightning/trainer/connectors/data_connector.py:441: The 'predict_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` to `num_workers=47` in the `DataLoader` to improve performance.
Predicting DataLoader 0: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 134.66it/s]
/usr/local/lib/python3.10/site-packages/neuralforecast/core.py:184: FutureWarning: In a future version the predictions will have the id as a column. You can set the `NIXTLA_ID_AS_COL` environment variable to adopt the new behavior and to suppress this warning.
  warnings.warn(
>>> import os
>>> os.env
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: module 'os' has no attribute 'env'
>>> os.getenv("CUDA_VISIBLE_DEVICES")
'0'
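The `AttributeError` above comes from `os.env`, which does not exist; the environment mapping is `os.environ`, and `os.getenv` is a shortcut for reading from it. A small sketch of reading `CUDA_VISIBLE_DEVICES` that way; the helper name `visible_gpu_ids` and its optional `env` argument are hypothetical, added only so it can be tried without touching the real environment:

```python
import os


def visible_gpu_ids(env=None):
    """Return the GPU ids listed in CUDA_VISIBLE_DEVICES.

    Returns None when the variable is unset (CUDA then sees every GPU)
    and an empty list when it is set to an empty string (no GPU visible).
    """
    env = os.environ if env is None else env
    raw = env.get("CUDA_VISIBLE_DEVICES")
    if raw is None:
        return None
    # Ids are comma separated; tolerate stray whitespace around them.
    return [part.strip() for part in raw.split(",") if part.strip()]


if __name__ == "__main__":
    print("Visible GPU ids:", visible_gpu_ids())
```

Inside the container from this thread, where `os.getenv("CUDA_VISIBLE_DEVICES")` returned `'0'`, this would return `['0']`.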
Devin Gaffney
05/03/2024, 5:37 PM
Devin Gaffney
05/03/2024, 5:39 PM
Marco
05/03/2024, 5:42 PM
Devin Gaffney
05/03/2024, 5:42 PM
Devin Gaffney
05/03/2024, 5:42 PM
Devin Gaffney
05/03/2024, 5:42 PM
Devin Gaffney
05/03/2024, 5:42 PM
Devin Gaffney
05/03/2024, 5:42 PM
Devin Gaffney
05/03/2024, 5:43 PM
Devin Gaffney
05/03/2024, 5:43 PM
NeuralForecast(
    models=[NHITS(loss=MQLoss(level=[75, 95, 99]), batch_size=100, input_size=365, h=365, max_steps=50, num_workers_loader=1)],
    freq="D"
)
Marco
05/03/2024, 5:45 PM
nvidia-smi to confirm the GPU is being used?
Devin Gaffney
05/03/2024, 5:46 PM
Devin Gaffney
05/03/2024, 5:47 PM
root@691a50a8e634:/# nvidia-smi
Fri May 3 17:47:09 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.129.03             Driver Version: 535.129.03   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA RTX A4500               On  | 00000000:81:00.0 Off |                  Off |
| 30%   32C    P2              58W / 200W |    468MiB / 20470MiB |     12%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
+---------------------------------------------------------------------------------------+
root@691a50a8e634:/#
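A single `nvidia-smi` snapshot like the one above can easily miss a short burst of GPU work. A rough sketch of sampling it repeatedly instead, assuming `nvidia-smi` is on the PATH and reports plain integers for the queried fields; the function names `parse_gpu_stats` and `poll_gpu` are hypothetical:

```python
import shutil
import subprocess
import time

# Machine-readable query: one CSV line per GPU, no header, no units.
QUERY = [
    "nvidia-smi",
    "--query-gpu=utilization.gpu,memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_gpu_stats(csv_text):
    """Parse lines such as '12, 468, 20470' into one dict per GPU."""
    stats = []
    for line in csv_text.strip().splitlines():
        util, used, total = (int(field.strip()) for field in line.split(","))
        stats.append({"util_pct": util, "mem_used_mib": used, "mem_total_mib": total})
    return stats


def poll_gpu(samples=5, interval_s=0.5):
    """Collect a few readings; returns [] when nvidia-smi is not installed."""
    if shutil.which("nvidia-smi") is None:
        return []
    readings = []
    for _ in range(samples):
        out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
        readings.append(parse_gpu_stats(out))
        time.sleep(interval_s)
    return readings


if __name__ == "__main__":
    for reading in poll_gpu():
        print(reading)
```

Running this in a second shell while the training script runs gives several readings over time rather than one.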
Devin Gaffney
05/03/2024, 5:47 PM
Devin Gaffney
05/03/2024, 5:48 PM
Devin Gaffney
05/03/2024, 5:49 PM
Marco
05/03/2024, 5:53 PM
Marco
05/03/2024, 5:54 PM
Devin Gaffney
05/03/2024, 5:55 PM
Devin Gaffney
05/03/2024, 5:55 PM
Devin Gaffney
05/03/2024, 5:55 PM
José Morales
05/03/2024, 6:13 PM
Devin Gaffney
05/03/2024, 6:14 PM
Devin Gaffney
05/03/2024, 6:15 PM
Devin Gaffney
05/03/2024, 6:15 PM
José Morales
05/03/2024, 6:15 PM
Devin Gaffney
05/03/2024, 6:15 PM
Devin Gaffney
05/03/2024, 6:15 PM
José Morales
05/03/2024, 6:17 PM
max_steps=50 means 50 epochs which can be done in ~2s, so you may not even see the GPU usage go up in that time
José Morales
05/03/2024, 6:21 PM
precision='16-mixed' to your init arguments, that'll use mixed precision training
Devin Gaffney
05/03/2024, 6:26 PM
NeuralForecast(
    models=[NHITS(loss=MQLoss(level=REPORTED_CONFIDENCE_INTERVALS), batch_size=100, input_size=input_size, h=365, max_steps=25)],
    freq="D"
)
)
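On the earlier remark that `max_steps=50` amounts to 50 epochs: that holds when every epoch is a single batch, as the `1/1` progress bar in the training log suggests. A back-of-the-envelope sketch, assuming `batch_size` counts series per batch; the helper `epochs_for_max_steps` and the `n_series` argument are hypothetical, and the real count depends on how the dataset is batched:

```python
import math


def epochs_for_max_steps(max_steps, n_series, batch_size):
    """Estimate how many epochs a run capped at max_steps will touch."""
    if min(max_steps, n_series, batch_size) <= 0:
        raise ValueError("all arguments must be positive")
    # One optimizer step per batch; an epoch is one pass over all series.
    steps_per_epoch = math.ceil(n_series / batch_size)
    return math.ceil(max_steps / steps_per_epoch)


if __name__ == "__main__":
    # Up to 100 series with batch_size=100: one step per epoch.
    print(epochs_for_max_steps(max_steps=50, n_series=100, batch_size=100))
    # The later config halves the budget.
    print(epochs_for_max_steps(max_steps=25, n_series=100, batch_size=100))
```

With so few steps the whole fit finishes in a couple of seconds, which is why GPU utilization may barely register.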
José Morales
05/03/2024, 6:27 PM
Devin Gaffney
05/03/2024, 6:27 PM
Devin Gaffney
05/03/2024, 6:28 PM
while True
Devin Gaffney
05/03/2024, 6:28 PM
Devin Gaffney
05/03/2024, 6:43 PM
Devin Gaffney
05/03/2024, 6:44 PM
Devin Gaffney
05/03/2024, 6:45 PM
num_workers_loader
José Morales
05/03/2024, 6:45 PM
Devin Gaffney
05/03/2024, 6:47 PM
Devin Gaffney
05/03/2024, 6:47 PM
Devin Gaffney
05/03/2024, 6:47 PM
Devin Gaffney
05/03/2024, 6:47 PM
Devin Gaffney
05/04/2024, 8:59 PM