Mariana Menchero
02/27/2025, 6:02 PM
Mariana Menchero
02/27/2025, 6:03 PM
Yibei
02/27/2025, 6:17 PM
We have different GPU models, so the results can vary depending on which one runs the process. Each model may use a different algorithm for matmul, so you end up with slightly different results in each forward pass, and over 700 finetune steps those small differences can grow into bigger ones.
We have a relative tolerance of 0.1% and an absolute tolerance of 0.0001.
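For reference, a rough sketch of what a check with those tolerances could look like. This is not our actual test code: the helper name `within_tolerance` is hypothetical, and combining the two tolerances as `atol + rtol * |expected|` is an assumption that mirrors the convention `numpy.allclose` uses.

```python
def within_tolerance(actual, expected, rtol=1e-3, atol=1e-4):
    """Return True if every pair of values agrees within the tolerances.

    rtol=1e-3 is the 0.1% relative tolerance and atol=1e-4 is the 0.0001
    absolute tolerance mentioned above. Adding them together is an assumed
    convention (the same one numpy.allclose uses), not a confirmed detail.
    """
    if len(actual) != len(expected):
        return False
    return all(
        abs(a - e) <= atol + rtol * abs(e)
        for a, e in zip(actual, expected)
    )


# A forecast that drifts by 0.05 on a value near 100 is inside the 0.1% band.
print(within_tolerance([100.0], [100.05]))
# A drift of 0.5 on the same value is outside it.
print(within_tolerance([100.0], [100.5]))
```

With a check like this, small GPU-dependent differences pass, while a larger drift after many finetune steps would still fail.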
Mariana Menchero
02/27/2025, 6:18 PM