KRATOS Documentation
KRATOS is an academic Kubernetes operator project for studying application-aware GPU scheduling of CUDA workloads on heterogeneous clusters.
The current goal is to let users describe CUDA workloads with requirements such as GPU memory, compute capability, priority, replica count, and distributed constraints. The controller can then use profiling information from previous runs to score eligible nodes for later executions.
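As a rough illustration of the idea, the sketch below filters nodes by a workload's GPU memory and compute capability requirements, then ranks the eligible ones by profiled runtime from earlier runs. It is not the KRATOS controller's implementation: every name here (`Workload`, `Node`, `rank_nodes`, the inverse-runtime score, the field names) is a hypothetical stand-in chosen for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Workload:
    # Hypothetical fields mirroring the requirements listed above.
    name: str
    gpu_memory_mib: int
    min_compute_capability: Tuple[int, int]  # (major, minor), e.g. (7, 0)
    priority: int = 0
    replicas: int = 1


@dataclass
class Node:
    # Hypothetical view of a GPU node as seen by a scheduler.
    name: str
    free_gpu_memory_mib: int
    compute_capability: Tuple[int, int]
    # Assumed profiling data: mean runtime in seconds of earlier runs,
    # keyed by workload name.
    past_runtimes_s: Dict[str, float] = field(default_factory=dict)


def is_eligible(workload: Workload, node: Node) -> bool:
    """Hard constraints: enough free GPU memory and a new enough GPU."""
    return (
        node.free_gpu_memory_mib >= workload.gpu_memory_mib
        and node.compute_capability >= workload.min_compute_capability
    )


def score(workload: Workload, node: Node) -> float:
    """Soft preference: shorter profiled runtime gives a higher score.

    Nodes with no profile for this workload get a neutral score of 0.0.
    This inverse-runtime rule is an assumption made for the example.
    """
    runtime = node.past_runtimes_s.get(workload.name)
    if runtime is None or runtime <= 0:
        return 0.0
    return 1.0 / runtime


def rank_nodes(workload: Workload, nodes: List[Node]) -> List[Node]:
    """Return eligible nodes, best score first."""
    eligible = [n for n in nodes if is_eligible(workload, n)]
    return sorted(eligible, key=lambda n: score(workload, n), reverse=True)
```

Separating the hard eligibility check from the soft score keeps a node that cannot fit the workload from ever being selected, however favorable its profile looks.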
Pages
- Getting Started: local setup and quick checks.
- Local GPU Lab: GPU-enabled local cluster with NVIDIA time-slicing.
- Architecture: planned control flow and system components.
- Operator: expected CUDAExperiment lifecycle.
- Observability: local Prometheus and Grafana stack.
- Experiments: baselines, workload classes, and metrics.
- Development Workflow: common development commands.
- Project Structure: repository layout.