It seems the timeout for the distributed tests is set to low and spurious failures can be seen Increase it by a factor of 6 similar to torch/testing/_internal/distributed/distributed_test.py Original patch by Alexander Grund (TU Dresden), updated by Caspar van Leeuwen (SURF) diff -Nru pytorch-1.11.0-rc3.orig/torch/testing/_internal/common_distributed.py pytorch-1.11.0-rc3/torch/testing/_internal/common_distributed.py --- pytorch-1.11.0-rc3.orig/torch/testing/_internal/common_distributed.py 2022-02-24 18:07:16.414274654 +0100 +++ pytorch-1.11.0-rc3/torch/testing/_internal/common_distributed.py 2022-02-24 18:08:31.772851148 +0100 @@ -321,7 +321,7 @@ # TSAN runs much slower. TIMEOUT_DEFAULT = 500 else: - TIMEOUT_DEFAULT = 100 + TIMEOUT_DEFAULT = 600 TIMEOUT_OVERRIDE = {"test_ddp_uneven_inputs": 400}