I’ve been using PyTorch, and a couple of things have tripped me up. I’m writing them down for future reference.
I highly recommend that you do NOT use any Linux distribution other than Ubuntu. It will make a lot of things miserable. For example, I switched to PopOS, which is built on top of Ubuntu, but I still had issues and couldn’t find good resources to solve them.
PopOS only seemed to package CUDA versions up to 11.2, but as of April 2022 we’re up to CUDA 11.6.
I think these are the appropriate steps to remove the PopOS dependencies and install the Ubuntu drivers.
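# Remove the prior PopOS NVIDIA packages
sudo apt remove nvidia-cuda-toolkit
sudo apt remove system76-cudnn-11.2
sudo apt remove system76-cuda-11.2
sudo apt remove system76-driver
# Install the NVIDIA driver and CUDA toolkit from the Ubuntu repositories
sudo apt install nvidia-driver-510
sudo apt install nvidia-cuda-toolkit
# Install PyTorch together with the CUDA 11.3 runtime from the pytorch channel
conda install pytorch torchvision torchaudio cudatoolkit=11.3 -c pytorch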
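After an install like this, it helps to ask PyTorch directly what it sees. Here is a minimal sketch of such a check. The helper name `cuda_report` is my own. It only relies on `torch.__version__`, `torch.version.cuda` and `torch.cuda.is_available()`, and it returns a mostly empty report if PyTorch is not installed at all.

```python
import importlib.util


def cuda_report():
    """Collect what the installed PyTorch (if any) knows about CUDA.

    cuda_report is a hypothetical helper name, not part of PyTorch.
    """
    report = {
        "torch_installed": False,
        "torch_version": None,
        "built_for_cuda": None,
        "cuda_available": False,
    }
    # Skip the import entirely when PyTorch is not installed.
    if importlib.util.find_spec("torch") is None:
        return report

    import torch

    report["torch_installed"] = True
    report["torch_version"] = torch.__version__
    # CUDA version the binary was built for; None on CPU-only builds.
    report["built_for_cuda"] = torch.version.cuda
    # True only if the driver and a usable GPU are actually visible.
    report["cuda_available"] = torch.cuda.is_available()
    return report


if __name__ == "__main__":
    for key, value in cuda_report().items():
        print(f"{key}: {value}")
```

If `built_for_cuda` is set but `cuda_available` is False, the problem is probably on the driver side rather than in the conda environment.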
One super critical thing that went past me is that you cannot simply swap out the CUDA toolkit version in that command, for example changing 11.3 to 11.2 like this [2]:
conda install pytorch torchvision torchaudio cudatoolkit=11.2 -c pytorch
This will download the latest PyTorch version, which may not be compatible with cudatoolkit 11.2. You have to downgrade PyTorch to a version that was built for that CUDA release.
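One rough way to catch this kind of mismatch after the install is to compare the CUDA version you asked for with the one the PyTorch binary reports. This is only a sketch: `build_matches_request` is a name I made up. It assumes `torch.version.cuda` holds the CUDA version the binary was built for, which is `None` on CPU-only builds.

```python
def build_matches_request(built_for_cuda, requested):
    """Return True if the build's CUDA major.minor equals the requested one.

    built_for_cuda: value of torch.version.cuda (a string such as "11.3",
                    or None for a CPU-only build).
    requested:      the version passed to cudatoolkit=, e.g. "11.2".
    """
    if built_for_cuda is None:
        # A CPU-only build never matches a CUDA request.
        return False
    # Compare only major and minor, ignoring any patch component.
    return built_for_cuda.split(".")[:2] == requested.split(".")[:2]


if __name__ == "__main__":
    try:
        import torch
        built = torch.version.cuda
    except ImportError:
        built = None  # PyTorch is not installed in this environment
    print("built for CUDA:", built)
    print("matches 11.2:", build_matches_request(built, "11.2"))
```

If this prints False, the environment did not end up with the combination you asked for, and pinning an older PyTorch release is the next thing to try.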
[2] https://discuss.pytorch.org/t/torch-cuda-is-not-available/74845/27