From 21388caa4e4e57b33787c53a909e3315efd2f74f Mon Sep 17 00:00:00 2001
From: Casandra Qiu
Date: Wed, 30 Mar 2016 14:29:38 -0400
Subject: [PATCH 1/2] Add update_nvidia_driver section to CUDA installaltion documentation

---
 docs/source/advanced/gpu/nvidia/index.rst     |  1 +
 .../gpu/nvidia/update_nvidia_driver.rst       | 41 +++++++++++++++++++
 2 files changed, 42 insertions(+)
 create mode 100644 docs/source/advanced/gpu/nvidia/update_nvidia_driver.rst

diff --git a/docs/source/advanced/gpu/nvidia/index.rst b/docs/source/advanced/gpu/nvidia/index.rst
index db621ed66..ea9459018 100644
--- a/docs/source/advanced/gpu/nvidia/index.rst
+++ b/docs/source/advanced/gpu/nvidia/index.rst
@@ -17,3 +17,4 @@ Within the NVIDIA CUDA Toolkit, installing the ``cuda`` package will install bot
    deploy_cuda_node.rst
    verify_cuda_install.rst
    management.rst
+   update_nvidia_driver.rst
diff --git a/docs/source/advanced/gpu/nvidia/update_nvidia_driver.rst b/docs/source/advanced/gpu/nvidia/update_nvidia_driver.rst
new file mode 100644
index 000000000..e7a651cf2
--- /dev/null
+++ b/docs/source/advanced/gpu/nvidia/update_nvidia_driver.rst
@@ -0,0 +1,41 @@
+Upgrade NVIDIA Driver
+=====================
+
+If the user wants to update the newer NVIDIA driver on the system, need to :doc:`create New CUDA software reposity ` . Assume the newer driver is in the ``/install/cuda-7.5/ppc64le/nvidia_new`` for the following processes.
+
+Diskful
+-------
+
+#. Change pkgdir for the cuda image: ::
+
+    chdef -t osimage -o rhels7.2-ppc64le-install-cudafull \
+    pkgdir=/install/cuda-7.5/ppc64le/nvidia_new,/install/cuda-7.5/ppc64le/cuda-deps
+
+
+#. Use xdsh command to remove all the nvidia rpms: ::
+
+    xdsh "yum remove *nvidia* -y"
+
+
+#. Run updatenode command to upgrade NVIDIA driver on the compute node: ::
+
+    updatenode -S
+
+
+#. Reboot compute node: ::
+
+    rpower off
+    rpower on
+
+
+#. Verify the newer driver level on the compute node: ::
+
+    nvidia-smi | grep Driver
+
+
+
+
+Diskless
+--------
+
+For update new NVIDIA driver on the diskless compute node, the easy and simple way is re-generate the osimage with New NIVIDIA driver reposity and re-provision the node with this osimage because node needs to be reboot in order for NIVIDIA driver to load. Please follow :doc:`this doc ` to create osimage definitions and deploy CUDA nodes.

From 8718e43f0756ffe249a2f21902143499ae4c89a6 Mon Sep 17 00:00:00 2001
From: Casandra Qiu
Date: Tue, 26 Apr 2016 14:29:16 -0400
Subject: [PATCH 2/2] update comments from victor

---
 .../advanced/gpu/nvidia/update_nvidia_driver.rst | 16 ++++++++++------
 1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/docs/source/advanced/gpu/nvidia/update_nvidia_driver.rst b/docs/source/advanced/gpu/nvidia/update_nvidia_driver.rst
index e7a651cf2..d7b726c05 100644
--- a/docs/source/advanced/gpu/nvidia/update_nvidia_driver.rst
+++ b/docs/source/advanced/gpu/nvidia/update_nvidia_driver.rst
@@ -1,7 +1,9 @@
-Upgrade NVIDIA Driver
+Update NVIDIA Driver
 =====================
 
-If the user wants to update the newer NVIDIA driver on the system, need to :doc:`create New CUDA software reposity ` . Assume the newer driver is in the ``/install/cuda-7.5/ppc64le/nvidia_new`` for the following processes.
+If the user wants to update the newer NVIDIA driver on the system, follow the :doc:`Create CUDA software repository ` document to create another repository for the new driver.
+
+The following example assumes the new driver is in ``/install/cuda-7.5/ppc64le/nvidia_new``.
 
 Diskful
 -------
@@ -12,12 +14,12 @@ Diskful
     pkgdir=/install/cuda-7.5/ppc64le/nvidia_new,/install/cuda-7.5/ppc64le/cuda-deps
 
 
-#. Use xdsh command to remove all the nvidia rpms: ::
+#. Use xdsh command to remove all the NVIDIA rpms: ::
 
     xdsh "yum remove *nvidia* -y"
 
 
-#. Run updatenode command to upgrade NVIDIA driver on the compute node: ::
+#. Run updatenode command to update NVIDIA driver on the compute node: ::
 
     updatenode -S
 
@@ -28,7 +30,7 @@ Diskful
     rpower on
 
 
-#. Verify the newer driver level on the compute node: ::
+#. Verify the newer driver level: ::
 
     nvidia-smi | grep Driver
 
@@ -38,4 +40,6 @@ Diskful
 Diskless
 --------
 
-For update new NVIDIA driver on the diskless compute node, the easy and simple way is re-generate the osimage with New NIVIDIA driver reposity and re-provision the node with this osimage because node needs to be reboot in order for NIVIDIA driver to load. Please follow :doc:`this doc ` to create osimage definitions and deploy CUDA nodes.
+To update a new NVIDIA driver on diskless compute nodes, re-generate the osimage pointing to the new NVIDIA driver repository and reboot the node to load the diskless image.
+
+Refer to :doc:`Create osimage definitions ` for specific instructions.
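
Reviewer note: the diskful steps documented in this series could be chained into one script, roughly as sketched below. This is only an illustration under several assumptions, not part of the patch. The commands as shown in the patch text carry no node argument, so the sketch assumes each xCAT command takes a noderange as its first argument. ``NODES``, ``OSIMAGE``, ``NEW_REPO``, ``DEPS_REPO``, ``DRY_RUN``, ``run`` and ``update_driver`` are hypothetical names introduced here, and ``cn01`` is a placeholder node name. The script defaults to a dry run that only prints the commands.

```shell
#!/bin/sh
# Sketch of the diskful NVIDIA driver update sequence from the patch.
# All variable and function names below are hypothetical.

NODES="${NODES:-cn01}"                               # placeholder noderange (assumption)
OSIMAGE="${OSIMAGE:-rhels7.2-ppc64le-install-cudafull}"
NEW_REPO="${NEW_REPO:-/install/cuda-7.5/ppc64le/nvidia_new}"
DEPS_REPO="${DEPS_REPO:-/install/cuda-7.5/ppc64le/cuda-deps}"
DRY_RUN="${DRY_RUN:-1}"                              # 1 = print commands only

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "+ $*"
    else
        "$@"
    fi
}

update_driver() {
    # Point the osimage at the new driver repository.
    run chdef -t osimage -o "$OSIMAGE" "pkgdir=$NEW_REPO,$DEPS_REPO"
    # Remove the currently installed NVIDIA rpms.
    run xdsh "$NODES" "yum remove *nvidia* -y"
    # Install packages from the updated pkgdir.
    run updatenode "$NODES" -S
    # Power-cycle the node so the new driver is loaded.
    run rpower "$NODES" off
    run rpower "$NODES" on
    # Check the driver level; on a real run the node must be back up first.
    run xdsh "$NODES" "nvidia-smi | grep Driver"
}

update_driver
```

Running it with ``DRY_RUN=0`` would execute the commands for real. A real run would also need to wait for the node to finish booting before the final check, which this sketch does not do.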