CI/CD Pipeline for TinyML: Continuous Delivery to Raspberry Pi 5 with AI HAT+ 2
Template and tutorial for TinyML CI/CD: package, cross-compile, test, and safely canary-deploy LLMs to Raspberry Pi 5 with AI HAT+ 2.
You can run real-world LLM inference on fleets of Raspberry Pi 5 devices with the AI HAT+ 2, but getting build, test, and deployment right is painful: cross-compilation, model packaging, constrained-memory failures, and rollbacks across hundreds of physical devices. This guide gives you a ready-to-use CI/CD template and tutorial (2026) to package quantized models, cross-compile runtimes, run automated tests, and perform safe canary rollouts with automated rollbacks for Pi5 edge LLMs.
Why this matters in 2026
Edge LLM adoption accelerated through 2024–2025 as quantized models and optimized runtimes (ggml/llama.cpp variants, MLC LLM, and vendor NPUs) matured. The AI HAT+ 2 released late 2025 unlocked practical generative workloads on the Raspberry Pi 5 range by exposing hardware acceleration and more RAM headroom for INT8/FP16 inference. That made TinyML CI/CD an operations-first problem: edge fleets are physical devices with firmware, kernel modules and storage constraints — not ephemeral containers in the cloud.
Expectations for 2026: security-by-design (SLSA), artifact signing, OCI model registries, and automated safety gates will be baseline in production pipelines. This article is for developers and infra leads who need a repeatable, secure pipeline for packaging edge LLMs and delivering them safely to Pi5 devices.
High-level pipeline (inverted pyramid)
- Produce artifacts: quantized model files + tokenizer + runtime binaries (arm64 optimized).
- Test artifacts: unit, integration (emulated), and hardware smoke tests.
- Store & sign: push to artifact registry (OCI or S3) and cosign signatures.
- Deliver: push container/image or firmware bundle to device-management platform (Mender, balena, etc.).
- Rollout: staged canary rollout with automated metrics-based rollback.
Architecture & components
- Build host: CI runners (GitHub Actions / GitLab) with Docker buildx and QEMU for cross builds.
- Model conversion: quantizer + tokenizer bundling (ggml, ONNX->ggml, or optimized MLC pipelines).
- Runtime: llama.cpp, custom C++ runtime, or containerized Python runtime built for arm64 with NEON acceleration.
- Artifact storage: OCI registry (GitHub Packages, Harbor) or object store (S3) with content-addressable names.
- Device management: Mender / balenaCloud / Fleet API for orchestrating deployments and groups.
- Monitoring: Prometheus metrics (latency, memory, error rate) and alert rules for rollback triggers.
Model packaging: deterministic, minimal, and auditable
Edge LLM bundles must be small and self-describing. Use a zip/tarball or OCI artifact with this structure:
<model-name>/
├─ model/            (quantized .ggml / .bin files)
├─ tokenizer/        (vocab.json, merges)
├─ runtime/          (compiled binary or wheel for arm64)
├─ metadata.json     (version, quantization, input_size, checksums)
├─ provenance.json   (training/model-card, SLSA provenance if available)
└─ signature.cosign
Best practices:
- Include clear metadata: quantization format (INT8/INT4), expected memory footprint, and a checksum per file.
- Keep tokenizers bundled to avoid runtime download or incompatible tokenizer versions.
- Sign artifacts using cosign and attach provenance as part of SLSA supply chain assertions.
- Use semantic versioning and content-addressable artifact names: modelname@sha256:...
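As a concrete starting point, a packaging script along these lines produces the bundle with per-file sha256 checksums embedded in metadata.json. The file layout and metadata field names here are illustrative assumptions, not a fixed schema:

```shell
#!/usr/bin/env bash
# Sketch of scripts/pack-model.sh -- bundle a model directory with per-file
# sha256 checksums recorded in metadata.json for later audit.
pack_model() {
  local src_dir="$1" name="$2" version="$3"
  local out="${name}-${version}.tar.gz" files f first=1
  # Collect the file list first so metadata.json doesn't checksum itself.
  files=$(find "$src_dir" -type f ! -name metadata.json | sort)
  {
    printf '{\n  "name": "%s",\n  "version": "%s",\n  "files": {\n' "$name" "$version"
    while IFS= read -r f; do
      [ -n "$f" ] || continue
      [ "$first" -eq 1 ] || printf ',\n'
      printf '    "%s": "%s"' "${f#"$src_dir"/}" "$(sha256sum "$f" | cut -d' ' -f1)"
      first=0
    done <<< "$files"
    printf '\n  }\n}\n'
  } > "$src_dir/metadata.json"
  tar -czf "$out" -C "$src_dir" .
  echo "$out"
}
```

A real pipeline would add the provenance.json and cosign signature on top of this tarball before pushing it to the registry.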
Cross-compilation & runtime builds for Pi5
Pi5 is ARM64. Build strategies in CI:
- Container multi-arch builds (recommended): Use Docker buildx with QEMU to produce arm64 images from x86 CI workers.
- Native cross-compile: Use aarch64 cross toolchains / gcc with GOARCH or CFLAGS tuned for NEON/FPU.
- Binary packaging: Compile runtime as static binary to reduce runtime dependencies; strip symbols for smaller size.
Dockerfile pattern (multi-stage)
FROM --platform=$BUILDPLATFORM ubuntu:24.04 AS builder
RUN apt-get update && apt-get install -y build-essential cmake git qemu-user-static
WORKDIR /src
# Check out the runtime source and build for the target platform
COPY . .
ARG TARGETPLATFORM
RUN --mount=type=cache,target=/root/.cache \
    mkdir -p build && cd build && \
    cmake .. -DCMAKE_SYSTEM_NAME=Linux -DCMAKE_SYSTEM_PROCESSOR=aarch64 && \
    make -j$(nproc)

FROM --platform=$TARGETPLATFORM debian:bookworm-slim
COPY --from=builder /src/build/my_runtime /usr/local/bin/my_runtime
ENTRYPOINT ["/usr/local/bin/my_runtime"]
Use the buildx command in CI: docker buildx build --platform linux/arm64,linux/amd64 -t $IMAGE --push .
Automated tests for edge LLMs
Tests are the difference between a controlled canary and a fleet-wide incident.
Test categories
- Unit tests: tokenizer correctness, config parsing, small deterministic outputs.
- Integration (emulated): run runtime in QEMU or arm64 container and do a smoke infer on a tiny prompt. Verify token counts and no memory OOM.
- Hardware smoke tests: on a Pi5 device or a small pool of self-hosted runners: full inference to detect driver/firmware mismatches on AI HAT+ 2.
- Performance/SLO checks: latency percentiles and max memory. Fail if > threshold.
- Security scans: dependency CVE checks and cosign verification of inputs.
Example: a CI job that runs an inference using the quantized artifact in an arm64 container and asserts latency < 400ms and RSS < 1.2GB.
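A minimal shape for that latency gate, sketched in shell. The runtime invocation is a placeholder assumption; an RSS check would need an additional /proc-based probe alongside it:

```shell
#!/usr/bin/env bash
# Sketch: gate one inference on wall-clock latency. The runtime command and
# its flags are placeholder assumptions; wire in your real binary and prompt.
smoke_test() {
  local cmd="$1" max_ms="$2"
  local start_ns end_ns latency_ms
  start_ns=$(date +%s%N)
  bash -c "$cmd" >/dev/null
  end_ns=$(date +%s%N)
  latency_ms=$(( (end_ns - start_ns) / 1000000 ))
  echo "latency_ms=$latency_ms"
  (( latency_ms <= max_ms ))   # non-zero exit fails the CI job
}

# Example (hypothetical runtime flags):
#   smoke_test "/usr/local/bin/my_runtime --prompt hi --max-tokens 8" 400
```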
Artifact storage & signing
For reproducibility store both model artifacts and runtime in an OCI registry when possible. Use ORAS to push model bundles as OCI artifacts so your deployment system can pull them consistently.
Sign with cosign and store signatures in the registry. In 2026, policy engines (conftest + SLSA attestations) are commonly enforced in CI/CD gates.
# push using ORAS
oras push $OCI_REGISTRY/model:1.2.0 model.tar.gz \
  --manifest-config metadata.json:application/vnd.acme.model.config.v1+json

# sign
cosign sign --key cosign.key $OCI_REGISTRY/model:1.2.0
CI/CD example: GitHub Actions pipeline
This YAML shows key jobs: build, test, sign, push, and trigger canary deployment via Mender API. Adapt to GitLab CI and others.
name: Build-Test-Deploy-Pi5
on:
push:
tags: ['v*']
workflow_dispatch:
env:
IMAGE: ghcr.io/org/edge-llm
MODEL_NAME: tiny-llm-quant
jobs:
build:
runs-on: ubuntu-latest
permissions:
contents: read
packages: write
steps:
- uses: actions/checkout@v4
- name: Set up QEMU
uses: docker/setup-qemu-action@v2
- name: Set up Docker Buildx
uses: docker/setup-buildx-action@v2
- name: Login to GHCR
uses: docker/login-action@v2
with:
registry: ghcr.io
username: ${{ github.actor }}
password: ${{ secrets.GHCR_PAT }}
- name: Build multi-arch image and push
run: |
docker buildx build --platform linux/amd64,linux/arm64 \
-t $IMAGE/$MODEL_NAME:${{ github.ref_name }} \
--push .
test-emulated:
needs: build
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- name: Set up QEMU (required to run arm64 images on x86 runners)
uses: docker/setup-qemu-action@v2
- name: Pull arm64 image and run smoke test
run: |
docker run --rm --platform linux/arm64 $IMAGE/$MODEL_NAME:${{ github.ref_name }} \
--smoke-test --max-latency-ms 400
sign-and-publish:
needs: test-emulated
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- name: Install cosign
run: |
curl -sSL "https://github.com/sigstore/cosign/releases/latest/download/cosign-linux-amd64" -o cosign
sudo install -m 0755 cosign /usr/local/bin/cosign
- name: Sign image
env:
COSIGN_PASSWORD: ${{ secrets.COSIGN_PASSWORD }}
run: |
printf '%s' "${{ secrets.COSIGN_KEY }}" > cosign.key
cosign sign --yes --key cosign.key $IMAGE/$MODEL_NAME:${{ github.ref_name }}
- name: Push model to OCI (optional)
run: oras push $OCI_REGISTRY/$MODEL_NAME:${{ github.ref_name }} model.tar.gz --manifest-config metadata.json:application/vnd.acme.model.config.v1+json
canary-deploy:
needs: sign-and-publish
runs-on: ubuntu-latest
steps:
- name: Trigger Mender deployment (canary group)
env:
MENDER_TOKEN: ${{ secrets.MENDER_TOKEN }}
run: |
curl -s -X POST https://hosted.mender.io/api/management/v1/deployments \
-H "Authorization: Bearer $MENDER_TOKEN" \
-H 'Content-Type: application/json' \
-d '{"artifact_name":"'"$IMAGE/$MODEL_NAME:${{ github.ref_name }}"'","name":"canary-${{ github.ref_name }}","devices":"group:canary-pi5"}'
Canary rollout and automated rollback
Canaries for Pi fleets are device-group based (e.g., 5–10 physical Pi5 units). A safe canary pipeline includes:
- Start with a small device group labeled "canary"
- Deploy artifact and run hardware smoke tests via device management (sanity probe runs)
- Monitor latency, memory, inference error rate and system-level logs for OOM or kernel issues
- Promote to larger group gradually if metrics remain within threshold
- Automatic rollback if metrics breach thresholds
Example rollback automation (pseudo-script)
#!/bin/bash
CANARY_GROUP=canary-pi5
THRESHOLD_LATENCY_MS=600
THRESHOLD_ERROR_RATE=0.02
DEPLOYMENT_ID=$1
# Query Prometheus for avg latency last 10m
avg_latency=$(curl -s 'http://prometheus/api/v1/query?query=avg_over_time(infer_latency_ms{group="canary"}[10m])' | jq -r '.data.result[0].value[1] // "0"')
# Counters need rate(), not avg_over_time; default to 0 when no samples exist
error_rate=$(curl -s 'http://prometheus/api/v1/query?query=rate(infer_errors_total{group="canary"}[10m])' | jq -r '.data.result[0].value[1] // "0"')
if (( $(echo "$avg_latency > $THRESHOLD_LATENCY_MS" | bc -l) )) || (( $(echo "$error_rate > $THRESHOLD_ERROR_RATE" | bc -l) )); then
echo "Threshold breached: initiating rollback"
# Trigger rollback via Mender API (endpoint shape varies by Mender version)
curl -X POST -H "Authorization: Bearer $MENDER_TOKEN" -H 'Content-Type: application/json' \
-d '{"deployment_id":"'"$DEPLOYMENT_ID"'","force":true}' \
https://hosted.mender.io/api/management/v1/deployments/rollback
fi
Integrate this script in a scheduled GitHub Actions job or in the device manager's health-check hooks to enforce automated rollbacks.
Device-level safety: health probes and feature flags
- Run a local health-probe on Pi5 that checks process alive, memory headroom, and inference latency. The device should refuse activation of a new model if health checks fail.
- Use feature flags or model selectors so a device can run multiple models but only activate one at a time.
- Keep a lightweight supervisor (systemd or container supervisor) that can auto-restart the runtime and report state to the fleet manager.
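The device-side checks above can be sketched as follows. The runtime process name, memory threshold, and environment variables are assumptions to tune per fleet:

```shell
#!/usr/bin/env bash
# Sketch of scripts/health-probe.sh -- gate model activation on device health.
probe_memory() {
  # Require at least min_mb of MemAvailable before activating a new model.
  local min_mb="${1:-512}" avail_kb
  avail_kb=$(awk '/^MemAvailable:/ {print $2}' /proc/meminfo)
  (( avail_kb / 1024 >= min_mb ))
}

probe_process() {
  # The inference runtime must be alive ("my_runtime" is a placeholder name).
  pgrep -x "${1:-my_runtime}" >/dev/null
}

health_check() {
  probe_memory "${MIN_FREE_MB:-512}" || { echo "FAIL: low memory" >&2; return 1; }
  probe_process "${RUNTIME_PROC:-my_runtime}" || { echo "FAIL: runtime down" >&2; return 1; }
  echo "OK"
}
```

A systemd timer or the fleet manager's health-check hook can run health_check and report the result upstream.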
Operational checklist before deployment
- Signed artifact present in OCI / object store.
- Metadata with declared memory + expected max tokens.
- Unit & emulated integration tests passed.
- Hardware smoke tests against a small Pi5 pool with AI HAT+ 2 passed.
- Canary group defined, monitoring & alerts wired to the pipeline.
- Rollback automation and image retention policy configured.
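Part of this checklist can be enforced mechanically as a CI gate. A naive sketch below checks for required metadata fields; the field names are illustrative, and a production gate would validate against a proper schema with jq or similar:

```shell
#!/usr/bin/env bash
# Sketch: preflight gate that refuses to deploy when required metadata fields
# are missing from the bundle's metadata.json. Field names are assumptions.
preflight() {
  local meta="$1" field
  for field in version quantization expected_memory_mb; do
    grep -q "\"$field\"" "$meta" || { echo "FAIL: missing field: $field" >&2; return 1; }
  done
  echo "preflight OK"
}
```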
Advanced strategies & 2026 predictions
Expect the following trends through 2026:
- OCI model registries: standardization around ORAS & OCI model manifests for provenance and signatures.
- SLSA enforcement in CI: model provenance and attestations will be mandatory for regulated deployments.
- On-device runtime orchestration: edge containers with tiny orchestrators (k3s-lite or balena supervisors) will allow A/B runtime experiments.
- Hardware-aware quantization: quantization toolchains that embed AI HAT+ 2 acceleration parameters into metadata to guide runtime selection.
Adopting these now future-proofs your pipeline and reduces friction when new device NPUs or quant formats appear.
Real-world example (case study sketch)
Company X shipped an on-device summarization agent to a network of 500 Pi5 kiosks. They used this pipeline:
- Quantized model to INT8 using a custom ggml pipeline and packaged as OCI artifact.
- Built and signed the arm64 runtime with Docker buildx and cosign.
- Ran emulated latency tests in CI and hardware smoke tests on 10 Pi5 canaries with AI HAT+ 2.
- Rolled out to 10% of devices per day, with a Prometheus rule that automatically paused rollout if P95 latency increased >20%.
Result: stable rollout in 4 days and one automatic rollback during a nightly kernel update that revealed a driver mismatch — rollback prevented a fleet-wide failure.
Common pitfalls and how to avoid them
- Missing tokenizer: always bundle tokenizer to avoid subtle output mismatches.
- Undetected memory growth: include RSS and OOM probes; run long-running soak tests on at least one canary device.
- Unsigned artifacts: enforce cosign signature verification in device managers before activation.
- No device labels: tag devices with hardware/firmware versions to avoid bad rollouts across incompatible units.
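For the unsigned-artifact pitfall in particular, a device-side activation wrapper can hard-fail on verification. `cosign verify --key` is the standard invocation; the activation hook and key path are placeholders:

```shell
#!/usr/bin/env bash
# Sketch: only activate a model reference whose cosign signature verifies
# against the fleet public key. Activation hook and key path are placeholders.
verify_then_activate() {
  local ref="$1" pubkey="${2:-/etc/keys/cosign.pub}"
  if cosign verify --key "$pubkey" "$ref" >/dev/null 2>&1; then
    echo "signature OK: activating $ref"
    # systemctl restart edge-llm.service   # placeholder activation hook
  else
    echo "signature verification FAILED for $ref" >&2
    return 1
  fi
}
```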
Quick reference: commands & tools
- Buildx + QEMU: docker buildx build --platform linux/arm64 --push
- Model push (ORAS): oras push registry/model:tag model.tar.gz --manifest-config metadata.json
- Sign (cosign): cosign sign --key cosign.key registry/model:tag
- Device management: Mender / balenaCloud APIs for group deployments
- Monitoring: Prometheus + Grafana with SLO-based alerts for rollback
Actionable takeaways
- Design artifacts as self-contained OCI bundles with metadata and cosign signatures.
- Use Docker buildx + QEMU in CI to produce arm64 images without Pi hardware in the loop.
- Run emulated tests in CI plus minimal hardware smoke tests on a canary pool of Pi5s with AI HAT+ 2 before promotion.
- Automate canary rollouts tied to Prometheus SLOs and implement automatic rollback scripts.
- Enforce provenance (SLSA) and artifact signature verification on the device before activation.
Next steps & template repo
Take this article and map it to a Git repo with these components:
- ci/workflows/build-and-deploy.yml (GitHub Actions example above)
- scripts/pack-model.sh (creates model.tar.gz and metadata)
- scripts/health-probe.sh (device-level)
- deployment/mender-template.json (canary group deployment API template)
- tests/smoke-inference (emulated inference test harness)
Final thought (2026)
Edge LLMs on Pi5 with AI HAT+ 2 are production-capable if you treat the device like physical infrastructure: build reproducible artifacts, sign and attest provenance, run emulated+hardware tests, and adopt staged canaries with automated rollbacks. The pattern in this guide reflects what production teams adopted across late 2025 into 2026 — plug these building blocks into your pipeline and you’ll reduce deployment risk and accelerate delivery to the edge.
“In 2026, the differentiator in edge AI isn’t model size — it’s delivery discipline.”
Call to action: Clone the starter CI/CD template (build + test + canary) from our repo, adapt the GitHub Actions example to your registry, and run the emulated smoke tests today. Need help tailoring the pipeline to your fleet size or choosing a device manager? Contact our engineering team for a free audit of one Pi5 canary deployment plan.