Local GPU Lab

This page describes a local setup for running KRATOS against a Kubernetes cluster that can schedule CUDA workloads. It uses nvkind to create a kind-based cluster with NVIDIA GPU support, then installs the NVIDIA device plugin with time-slicing enabled.

Time-slicing advertises multiple logical nvidia.com/gpu slots for each physical GPU. It is useful for local tests, but it does not provide memory or fault isolation between workloads.

Prerequisites

  • Linux host with a working NVIDIA driver.
  • Docker.
  • NVIDIA Container Toolkit.
  • kind, kubectl, helm, jq, and nvkind.
  • A KRATOS controller image that is available to the cluster.
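
Before continuing, it can help to confirm that the command-line tools listed above are on PATH. The following is a minimal sketch, not part of KRATOS; the function name missing_tools is hypothetical, and the tool names passed to it are assumed to match the binaries installed on your host.

```shell
# Print the name of every given tool that cannot be found on PATH.
# Prints nothing when all tools are available.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Usage (run on the lab host):
# missing_tools docker nvidia-smi nvidia-ctk kind kubectl helm jq nvkind
```

Empty output means every listed tool was found. Any printed name needs to be installed before the later steps will work.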

Configure Docker GPU Support

Configure Docker to use the NVIDIA runtime:

sudo nvidia-ctk runtime configure --runtime=docker --set-as-default --cdi.enabled
sudo nvidia-ctk config --set accept-nvidia-visible-devices-as-volume-mounts=true --in-place
sudo systemctl restart docker

Verify that containers can access the GPU:

docker run --rm --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi
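
If the container cannot see the GPU, one thing worth checking is whether Docker picked up nvidia as its default runtime after the restart. The sketch below assumes the --set-as-default flag above took effect; the helper name is_nvidia_runtime is hypothetical.

```shell
# Succeed only when the given runtime name is exactly "nvidia".
is_nvidia_runtime() {
  [ "$1" = "nvidia" ]
}

# Usage (run on the lab host): docker info reports the daemon's default runtime.
# is_nvidia_runtime "$(docker info --format '{{.DefaultRuntime}}')" \
#   || echo "Docker default runtime is not nvidia" >&2
```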

Create the GPU Cluster

Create the cluster:

nvkind cluster create --name kratos-gpu

Wait for the nodes to become ready:

kubectl wait --for=condition=Ready nodes --all --timeout=120s
kubectl get nodes

Check that nvkind can see the host GPU:

nvkind cluster print-gpus --name kratos-gpu

Install GPU Time-Slicing

Add the NVIDIA device plugin Helm repository:

helm repo add nvdp https://nvidia.github.io/k8s-device-plugin
helm repo update

Create a device plugin configuration. This example exposes each physical GPU as three schedulable logical GPU slots:

cat <<'EOF' > /tmp/nvidia-device-plugin-config.yaml
version: v1
flags:
  migStrategy: "none"
  failOnInitError: true
  nvidiaDriverRoot: "/"
  plugin:
    deviceListStrategy: envvar
    deviceIDStrategy: uuid
sharing:
  timeSlicing:
    failRequestsGreaterThanOne: true
    resources:
    - name: nvidia.com/gpu
      replicas: 3
EOF

Install the device plugin:

helm upgrade -i nvdp nvdp/nvidia-device-plugin \
  --namespace nvidia-device-plugin \
  --create-namespace \
  --set runtimeClassName=nvidia \
  --set config.default=config \
  --set-file config.map.config=/tmp/nvidia-device-plugin-config.yaml \
  --set affinity=null

Check the plugin pods:

kubectl get ds -n nvidia-device-plugin
kubectl get pods -n nvidia-device-plugin

Check the GPU capacity advertised by the nodes:

kubectl get nodes -o json | jq -r '
.items[] | {
  name: .metadata.name,
  capacity: .status.capacity["nvidia.com/gpu"],
  allocatable: .status.allocatable["nvidia.com/gpu"]
}'

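For a node with one physical GPU and replicas: 3, the output should include the following values (the name field is omitted here). Nodes without a GPU report null for both fields: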

{
  "capacity": "3",
  "allocatable": "3"
}
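
The expected count follows from simple arithmetic: advertised slots = physical GPUs × replicas. The sketch below turns that into a check. The function names expected_gpu_slots and gpu_capacity_ok are hypothetical, and <node-name> in the usage comment is a placeholder for one of your node names.

```shell
# Logical slots advertised = physical GPUs x time-slicing replicas.
expected_gpu_slots() {
  # $1 = physical GPUs on the node, $2 = replicas from the plugin config
  echo $(( $1 * $2 ))
}

# Succeed when the advertised nvidia.com/gpu value matches the expected count.
gpu_capacity_ok() {
  # $1 = physical GPUs, $2 = replicas, $3 = advertised value from the node
  [ "$3" = "$(expected_gpu_slots "$1" "$2")" ]
}

# Usage (requires cluster access):
# advertised=$(kubectl get node <node-name> \
#   -o jsonpath='{.status.allocatable.nvidia\.com/gpu}')
# gpu_capacity_ok 1 3 "$advertised" || echo "unexpected GPU capacity" >&2
```

A mismatch usually means the device plugin has not yet picked up the time-slicing configuration, or the node has a different number of physical GPUs than assumed.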

Verify CUDA Scheduling

Create gpu-vectoradd.yaml:

apiVersion: v1
kind: Pod
metadata:
  name: gpu-vectoradd
spec:
  runtimeClassName: nvidia
  restartPolicy: Never
  containers:
  - name: cuda
    image: nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda12.5.0
    resources:
      limits:
        nvidia.com/gpu: 1

Run the pod and inspect the result. The pod should reach the Completed status, and its log should contain Test PASSED:

kubectl apply -f gpu-vectoradd.yaml
kubectl get pod gpu-vectoradd
kubectl logs gpu-vectoradd
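
For a scripted check, the log can be scanned for the success marker that the vectorAdd sample prints. This is a minimal sketch; the function name vectoradd_passed is hypothetical, and the 120s timeout in the usage comment is an assumed value.

```shell
# Read pod logs on stdin; succeed if the sample reported success.
vectoradd_passed() {
  grep -q 'Test PASSED'
}

# Usage (requires cluster access): wait for the pod to finish, then check its log.
# kubectl wait --for=jsonpath='{.status.phase}'=Succeeded \
#   pod/gpu-vectoradd --timeout=120s
# kubectl logs gpu-vectoradd | vectoradd_passed && echo "CUDA scheduling works"
```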

Install KRATOS

Install the CUDAExperiment CRD:

make install
kubectl get crd cudaexperiments.gpu.scheduler.io

If the image exists only on the local Docker daemon, load it into the cluster before deploying:

kind load docker-image <local-image> --name kratos-gpu

Deploy the controller with an image that the cluster can pull or already has loaded:

make docker-build IMG=kratos-controller:v0.1.0
kind load docker-image kratos-controller:v0.1.0 --name kratos-gpu
make deploy IMG=kratos-controller:v0.1.0
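
The --name value passed to kind load must match an existing kind cluster, so it can be worth checking the name before loading. The sketch below is one way to do that; the function name cluster_listed is hypothetical.

```shell
# Read cluster names on stdin (one per line); succeed on an exact match.
cluster_listed() {
  # -x matches the whole line, -F treats the name as a literal string
  grep -qxF -- "$1"
}

# Usage (run on the lab host):
# kind get clusters | cluster_listed kratos-gpu \
#   || echo "cluster kratos-gpu not found" >&2
```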

Check the controller deployment:

kubectl get deployments -n kratos-system
kubectl get pods -n kratos-system

Run a CUDAExperiment

Start from the sample custom resource:

kubectl apply -f config/samples/gpu_v1alpha1_cudaexperiment.yaml
kubectl get cudaexperiments

The sample requests one GPU through the gpuRequired field. In a time-sliced local lab, that request consumes one advertised logical nvidia.com/gpu slot.
