Integrating NVLink-Fusion Enabled RISC-V Platforms with Kubernetes: A Practical Guide

2026-03-11

Practical guide to expose NVLink‑attached GPUs on SiFive RISC‑V nodes to Kubernetes — device plugins, topology‑aware scheduling, and NUMA tuning for ML.

Pain point: your ML workloads need predictable, low-latency access to multiple GPUs across a rack — but your cluster runs Kubernetes on RISC‑V hosts built on SiFive silicon. How do you expose NVLink‑attached GPUs reliably to containers, preserve NUMA locality for throughput-sensitive models, and keep scheduling decisions topology-aware?

In late 2025 SiFive announced integration with NVIDIA's NVLink Fusion, opening the door for RISC‑V CPUs to directly leverage GPU fabrics built on NVLink. By 2026 the ecosystem is racing to add the orchestration and runtime pieces: device plugins, topology exports, kubelet tuning, and CI/CD practices that make NVLink‑aware ML on RISC‑V practical in production. This guide shows you how to connect the dots and provides templates, manifests, and a CI pipeline you can adapt.

What you’ll get from this guide

  • Concrete steps to expose NVLink‑attached GPUs from SiFive RISC‑V nodes into Kubernetes
  • Device plugin recommendations and a DaemonSet template
  • How to enable topology‑aware scheduling and align CPUs, memory and GPUs
  • NUMA and kernel tuning for ML workloads
  • A CI/CD example (GitHub Actions) to build, cross‑compile and deploy device plugin images for RISC‑V

Background: NVLink Fusion on RISC‑V

NVLink Fusion extends NVIDIA's NVLink interconnect beyond GPUs, enabling tighter CPU–GPU coherency and fabric-level communication. SiFive's 2025/early‑2026 announcements committed RISC‑V IP to supporting NVLink Fusion, which means system designers can build platforms where the CPU and multiple GPUs share a high‑bandwidth, low‑latency interconnect. For orchestration teams this creates opportunities — and challenges:

  • Opportunity: multi‑GPU models get better scaling if the scheduler can place pods with GPU + CPU + memory affinity across NVLink islands.
  • Challenge: Kubernetes must be fed accurate topology info (NUMA nodes, GPU affinity) and device plugins must expose GPUs correctly on a RISC‑V host.

Source: industry announcements in late 2025 and early 2026 indicate NVIDIA and SiFive collaboration on NVLink Fusion support for RISC‑V platforms.

High‑level architecture

At a high level, you want three things working together:

  1. Device plugin on each RISC‑V node that registers GPUs with kubelet and publishes topology hints.
  2. Node topology exporter that reports NUMA, CPU and device topology to the scheduler.
  3. Kubelet & scheduler configuration that enforces CPU/GPU co‑placement (CPUManager + Topology Manager + topology‑aware scheduling).

Step 1 — Validate hardware & kernel support on SiFive nodes

Before anything else, ensure the SiFive platform has the kernel drivers, IOMMU, and NVLink firmware in place. Work with your silicon vendor (SiFive) and NVIDIA to confirm the RISC‑V Linux driver stack and CUDA / driver binaries are available for RISC‑V64. Verify these points:

  • IOMMU enabled (for VFIO / PCI passthrough): check /proc/cmdline and kernel logs for the platform's IOMMU boot options. Note that intel_iommu=on / amd_iommu=on are x86-specific; use whatever parameters your SiFive platform's IOMMU driver documents.
  • vfio, vfio-pci, and iommu drivers loaded: lsmod | grep vfio
  • GPU devices visible via lspci and vendor drivers (nvidia or vendor-supplied) loaded.
  • NVLink fabric up: nvidia-smi topo -m (or vendor tool) shows NVLink connections between GPUs.

Useful validation commands

lspci -nn | grep -i nvidia
nvidia-smi topo -m   # or vendor provided topology command
cat /sys/class/vfio/*/iommu_group
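The checks above can be wrapped in a small script. The sketch below lists each NVIDIA GPU's IOMMU group straight from sysfs (0x10de is NVIDIA's PCI vendor ID); the gpu_iommu_groups helper name and the optional sysfs-root argument are our own additions, there so the logic can be exercised against a fake tree before touching a live node.

```shell
#!/bin/sh
# List IOMMU groups for NVIDIA GPUs, to confirm VFIO-usable isolation.
# Takes an optional sysfs root (default /sys) so the logic is testable;
# assumes the standard PCI sysfs layout.
gpu_iommu_groups() {
  root=${1:-/sys}
  for dev in "$root"/bus/pci/devices/*; do
    [ -e "$dev/vendor" ] || continue
    [ "$(cat "$dev/vendor")" = "0x10de" ] || continue   # NVIDIA vendor ID
    if [ -L "$dev/iommu_group" ]; then
      grp=$(basename "$(readlink -f "$dev/iommu_group")")
    else
      grp=none   # no IOMMU group: passthrough will not work for this device
    fi
    echo "$(basename "$dev") iommu_group=$grp"
  done
}
```

A device reporting iommu_group=none means the IOMMU is not active for that slot; fix the boot parameters before attempting VFIO.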

Step 2 — Choose the right device exposure strategy

There are three practical approaches to expose GPUs:

  • NVIDIA device plugin (recommended when vendor provides RISC‑V-enabled runtime): integrates with nvidia-container-runtime to expose GPUs as device resources (nvidia.com/gpu).
  • VFIO / PCI passthrough: use libvirt or direct device allocation for full‑GPU access (best performance but less shareable).
  • Virtualization / SR‑IOV or MIG: use GPU virtualization (MIG-like) or SR‑IOV if supported to partition GPUs for multi-tenant use.

For NVLink Fusion systems you'll typically want the NVIDIA device plugin that speaks topology. If vendor runtimes for RISC‑V are not yet available, VFIO passthrough can be a stopgap while drivers mature.
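If you take the VFIO stopgap, the rebind sequence is a handful of sysfs writes. The sketch below only prints the plan rather than executing it, so you can review it before running the lines as root; vfio_bind_plan is our helper name and the device ID in the usage comment is illustrative.

```shell
#!/bin/sh
# Emit (rather than execute) the sysfs writes that rebind a PCI GPU to
# vfio-pci, so the plan can be reviewed before touching a live node.
vfio_bind_plan() {
  addr=$1 ven=$2 dev=$3   # PCI address, vendor ID, device ID
  cat <<EOF
echo $addr > /sys/bus/pci/devices/$addr/driver/unbind
echo $ven $dev > /sys/bus/pci/drivers/vfio-pci/new_id
echo $addr > /sys/bus/pci/drivers/vfio-pci/bind
EOF
}

# Example (device ID illustrative): vfio_bind_plan 0000:01:00.0 10de 2204
```

Note that once the vendor:device pair is written to new_id, vfio-pci may claim the device automatically, making the explicit bind a no-op.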

Step 3 — Device plugin: topology-aware registration

A device plugin must do more than register devices — it should advertise NUMA node affinity for each GPU so the kubelet's Topology Manager can make aligned allocations.

Device plugin responsibilities

  • Enumerate GPUs (PCI address, device ID) and identify the NUMA node or affinity mask.
  • Return topology hints for allocation requests (if plugin API supports hints).
  • Create device nodes or mount points and work with the container runtime to provide access (e.g., /dev/nvidia*) or VFIO device paths.
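The first responsibility above, mapping each GPU to a NUMA node, can be read directly from sysfs. A minimal sketch (gpu_numa_nodes is our name; it assumes NVIDIA's 0x10de vendor ID and takes an optional sysfs root so it can be tested offline):

```shell
#!/bin/sh
# Map each NVIDIA GPU to its NUMA node from sysfs -- the affinity a
# topology-aware device plugin would advertise to the kubelet.
gpu_numa_nodes() {
  root=${1:-/sys}
  for dev in "$root"/bus/pci/devices/*; do
    [ -e "$dev/vendor" ] || continue
    [ "$(cat "$dev/vendor")" = "0x10de" ] || continue   # NVIDIA vendor ID
    # numa_node reads -1 when firmware did not describe an affinity
    numa=$(cat "$dev/numa_node" 2>/dev/null || echo -1)
    echo "$(basename "$dev") numa_node=$numa"
  done
}
```

If every GPU reports numa_node=-1, fix the firmware/ACPI tables first: the Topology Manager cannot align what the platform does not describe.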

If you are adapting NVIDIA's plugin for RISC‑V, ensure it runs as a DaemonSet compiled for RISC‑V64. Below is a minimal DaemonSet manifest (trimmed) to get started. Replace the image with your cross‑compiled RISC‑V build.

DaemonSet template (excerpt)

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: nvlink-device-plugin
  namespace: kube-system
spec:
  selector:
    matchLabels:
      name: nvlink-device-plugin
  template:
    metadata:
      labels:
        name: nvlink-device-plugin
    spec:
      hostNetwork: true
      containers:
      - name: device-plugin
        image: ghcr.io/yourorg/nvlink-device-plugin:riscv64-v1
        securityContext:
          privileged: true
        volumeMounts:
        - mountPath: /dev
          name: dev
      volumes:
      - name: dev
        hostPath:
          path: /dev

Step 4 — Expose NUMA & topology to the scheduler

Kubernetes needs to know how CPUs and GPUs map to NUMA nodes. Use one (or both) of these:

  • Topology Manager (kubelet): aligns CPU and device allocations per pod. Configure CPUManager to static and TopologyManager to a strict policy such as single‑numa‑node for best locality.
  • Resource Topology Exporter (RTE): exports node topology (NUMA zones, CPU counts, device resources) as NodeResourceTopology custom resources so a topology‑aware scheduler plugin can make node‑level placement decisions.

# kubelet config YAML (snippet)
cpuManagerPolicy: "static"
cpuManagerReconcilePeriod: "5s"
topologyManagerPolicy: "single-numa-node"
topologyManagerScope: "pod"
# Note: no featureGates entry is needed on current clusters; the
# TopologyManager gate has been on by default since Kubernetes 1.18
# and went GA in 1.27.

Notes:

  • single‑numa‑node enforces strict colocation. Use best‑effort for softer constraints.
  • When Topology Manager is strict, your device plugin must provide topology hints or the kubelet may fail allocations.
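To confirm the running kubelet actually picked up the policy, you can query the node's configz endpoint and pull out the field. The tm_policy filter below is a sketch: it uses sed rather than jq so it runs on minimal node images, and it assumes the compact single-line JSON that configz returns.

```shell
#!/bin/sh
# Extract topologyManagerPolicy from kubelet configz JSON on stdin.
tm_policy() {
  sed -n 's/.*"topologyManagerPolicy" *: *"\([^"]*\)".*/\1/p'
}

# Usage against a live cluster (NODE is your RISC-V node's name):
#   kubectl get --raw "/api/v1/nodes/$NODE/proxy/configz" | tm_policy
```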

Step 5 — Pod spec patterns for ML workloads

Structure your pod resource requests to request both CPUs and GPUs. This gives the kubelet the signals it needs to co‑place resources.

Pod example

apiVersion: v1
kind: Pod
metadata:
  name: ml-train-nvlink
spec:
  containers:
  - name: trainer
    image: ghcr.io/yourorg/pytorch-riscv:latest
    resources:
      requests:
        cpu: "16"
        memory: "128Gi"
        nvidia.com/gpu: "2"
      limits:
        cpu: "16"
        memory: "128Gi"
        nvidia.com/gpu: "2"
    env:
    - name: NVIDIA_VISIBLE_DEVICES
      value: "all"

Use nodeSelectors or nodeAffinity only when necessary. Ideally the topology exports and topology manager will schedule correctly without hard node pins.
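Once the pod is running, you can verify alignment by comparing the container's cpuset (e.g. /sys/fs/cgroup/cpuset.cpus inside the container) with the GPU's NUMA node cpulist on the host. The helpers below (our names, plain POSIX shell) check that every CPU in the cpuset belongs to the node:

```shell
#!/bin/sh
# Check that a container's cpuset falls entirely within one NUMA node's
# cpulist -- evidence the Topology Manager aligned the allocation.
expand_ranges() {           # "0-3,8" -> one CPU id per line
  echo "$1" | tr ',' '\n' | while IFS=- read -r lo hi; do
    seq "$lo" "${hi:-$lo}"
  done
}
cpuset_within_node() {      # $1 = container cpuset, $2 = node cpulist
  node=" $(expand_ranges "$2" | tr '\n' ' ')"
  for cpu in $(expand_ranges "$1"); do
    case "$node" in *" $cpu "*) ;; *) return 1 ;; esac
  done
}
```

Example: cpuset_within_node "0-15" "$(cat /sys/devices/system/node/node0/cpulist)" succeeds only when the whole cpuset is node-local.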

NUMA and kernel tuning recommendations for ML

  • CPU pinning: use the CPUManager static policy so Guaranteed pods with integer CPU requests get an exclusive cpuset, reducing cross‑NUMA memory access.
  • Hugepages: allocate hugepages (2MiB or 1GiB) for memory‑intensive ML workloads; set hugepages in the pod spec as needed.
  • Automatic NUMA balancing: consider disabling it (sysctl kernel.numa_balancing=0), but measure first; on some kernels turning it off reduces cross‑node page‑migration thrash for latency‑sensitive ML.
  • IRQ affinity: bind GPU interrupts to CPUs on the same NUMA node (echo CPU mask > /proc/irq/…/smp_affinity) to reduce latency.
  • Memory policy: for multi‑socket RISC‑V boards, ensure malloc and RDMA libraries prefer local NUMA nodes via numactl or libnuma settings.
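The IRQ-affinity item above needs a hex mask in the format /proc/irq/<N>/smp_affinity expects. A sketch that converts a NUMA node's cpulist into that mask (cpulist_to_mask is our helper name; it only handles nodes with fewer than 64 CPUs, as wider systems use comma-separated 32-bit words, and the IRQ number in the example is illustrative):

```shell
#!/bin/sh
# Convert a cpulist (e.g. /sys/devices/system/node/node0/cpulist) into the
# hex bitmask format that /proc/irq/<N>/smp_affinity expects.
cpulist_to_mask() {
  mask=0
  for part in $(echo "$1" | tr ',' ' '); do
    lo=${part%-*}; hi=${part#*-}          # "8" -> lo=8 hi=8; "0-3" -> 0,3
    for cpu in $(seq "$lo" "$hi"); do
      mask=$(( mask | (1 << cpu) ))
    done
  done
  printf '%x\n' "$mask"
}

# Example (run as root; IRQ number illustrative): pin IRQ 120 to node 0's CPUs
#   cpulist_to_mask "$(cat /sys/devices/system/node/node0/cpulist)" \
#     > /proc/irq/120/smp_affinity
```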

Testing & validation

Run these checks to confirm correct placement and topology alignment:

  • nvidia-smi topo -m (or vendor equivalent) inside the node — confirm NVLink mesh is visible.
  • kubectl describe pod — check AllocatedResources and events for topology manager messages.
  • On node, inspect /sys/devices/system/node/node*/cpulist to map CPUs to devices.
  • Use hwloc’s lstopo (or lscpu) to visualize NUMA topology.
  • Run a microbenchmark (e.g., NCCL bandwidth tests, or PyTorch DDP ping‑pong) to measure inter‑GPU bandwidth across NVLink vs PCI‑only paths.

CI/CD: Build & deploy device plugin for RISC‑V (GitHub Actions example)

For production, automate building your device plugin for RISC‑V and deploying to clusters. The example below cross‑compiles a Go device plugin and builds a multi‑arch image with buildx.

# .github/workflows/ci.yml (excerpt)
name: Build and Push Device Plugin
on: [push]
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
    - uses: actions/checkout@v4
    - name: Set up QEMU
      uses: docker/setup-qemu-action@v2
    - name: Set up Docker Buildx
      uses: docker/setup-buildx-action@v2
    - name: Login to registry
      uses: docker/login-action@v2
      with:
        registry: ghcr.io
        username: ${{ github.actor }}
        password: ${{ secrets.GITHUB_TOKEN }}
    - name: Build and push multi-arch image
      run: |
        docker buildx build \
          --platform linux/amd64,linux/riscv64 \
          -t ghcr.io/yourorg/nvlink-device-plugin:sha-${{ github.sha }} \
          --push .
    - name: Deploy to cluster
      env:
        KUBECONFIG: ${{ secrets.KUBECONFIG_RISCV_CLUSTER }}
      run: |
        kubectl set image daemonset/nvlink-device-plugin -n kube-system \
          device-plugin=ghcr.io/yourorg/nvlink-device-plugin:sha-${{ github.sha }}

Operational tips & monitoring

  • Export topology metrics: run Resource Topology Exporter and feed metrics into Prometheus so scheduling anomalies are visible.
  • Track allocation failures: watch kubelet events for Topology Manager rejects — these indicate mismatched requests vs available affinity.
  • Benchmark regularly: NVLink performance can vary with firmware and driver updates; include NCCL or microbenchmarks in your CI to catch regressions.
  • Maintain driver matrix: track which CUDA / driver versions support RISC‑V on SiFive — pin your runtime images accordingly.

Common pitfalls and how to avoid them

  • Assuming GPUs are NUMA‑agnostic: they are not. Always export NUMA affinity from the device plugin.
  • Changing topology manager policy after workloads are deployed — this can cause scheduling churn. Test in staging first.
  • Not cross‑compiling plugin images for RISC‑V — runtime mismatches will prevent daemonset pods from running on the nodes.
  • Relying on nodeSelector as a long‑term solution — use topology exports and kubelet alignment so the scheduler scales across nodes dynamically.

Advanced strategies for multi‑GPU ML at scale

For large clusters with NVLink Fusion fabrics, consider the following:

  • Topology‑aware job orchestration: extend your workload controller (Kubernetes Job, MPI operator, or custom scheduler plugin) to request contiguous NVLink islands for multi‑node training.
  • GPU pooling & MIG: if supported, combine MIG partitions with topology hints to pack more tenants without breaking locality.
  • Cross‑node RDMA + NVLink hybrid: for training spanning multiple NVLink domains, prefer RDMA across NICs with low CPU overhead and favor network paths that map to GPU NUMA nodes appropriately.

Future outlook (2026 predictions)

Expect the RISC‑V + NVIDIA ecosystem to mature through 2026:

  • Official RISC‑V builds of NVIDIA drivers and container runtimes will become available from NVIDIA and partners.
  • Device plugin ecosystems will include explicit NVLink topology support and ready‑made exporters for NUMA and fabric graphs.
  • Scheduler plugins and Kubernetes scheduler improvements will make topology‑aware scheduling easier for heterogeneous racks (GPU, CPU, DPU).

Actionable checklist (copy into your runbook)

  1. Validate kernel/IOMMU/VFIO and NVLink visibility on SiFive node.
  2. Build or obtain an NVIDIA/SiFive device plugin compiled for RISC‑V.
  3. Enable CPUManager (static) and Topology Manager (single‑numa‑node) on kubelet.
  4. Deploy Resource Topology Exporter to surface NUMA info to the scheduler.
  5. Use pod specs requesting CPUs and GPUs; validate placement with hwloc / nvidia‑tools and kubelet events.
  6. Automate builds with cross‑compile CI and push multi‑arch images to your registry.

Closing: Where to get templates and next steps

If you’re implementing NVLink Fusion on SiFive RISC‑V nodes, start with the device plugin DaemonSet template above, adopt the kubelet config snippets, and add the CI workflow to your repo. Benchmarks should be part of your deployment pipeline so you detect driver and firmware regressions early.

Call to action: clone our starter repo (device plugin, kubelet config, RTE manifest, GitHub Actions pipeline) and run the validation checklist in a staging cluster. If you want a tailored checklist or a review of your manifests for NUMA correctness, contact our team or request a template for your cloud provider — we’ll help you map NVLink islands to Kubernetes scheduling policies.
