ADR-005: Sovereign OS Factory for Hardware Extension Integration
Date: 2026-06-18
Status: Proposed
Context: Integration of Hailo-10H AI accelerator drivers (HailoRT v5.3.0) into Talos Linux for Raspberry Pi 5.
Summary
This ADR establishes a new CI architecture: the “Sovereign OS Factory.” This architecture allows us to build a custom Talos Linux kernel and installer image alongside our hardware extensions (like HailoRT 5.3.0), ensuring cryptographic signature alignment and resolving the “key was rejected by service” error. It also expands our build matrix to dynamically switch between standard upstream Talos artifacts and our custom-built sovereign artifacts based on hardware targets.
Problem
- Strict Module Signing: Talos Linux kernels employ strong security measures, including ephemeral module signing keys generated during the kernel build. Any kernel module (like hailo1x_pci) not signed with the exact key used to compile the running kernel will be rejected, leaving hardware uninitialized or even causing boot failures.
- Hailo-10H Driver Gap: Upstream Sidero Labs Talos does not provide official HailoRT v5.3.0 drivers, which are necessary for Hailo-10H hardware. We cannot rely on their pre-signed kernel modules.
- Out-of-Band Build Issues: Our previous approach of building the HailoRT extension in isolation (hack/build-hailort.sh) resulted in the driver being signed with a different key than the kernel, leading to rejection.
- CI Build Matrix Limitations: The existing CI workflow treated all Talos images as simple overlays on a single upstream base, lacking the granularity to differentiate hardware-specific kernel requirements.
- GitHub Hosted Runner Resource Limits: Compiling a full Linux kernel and related Talos images requires significant disk space and memory, exceeding the capacity of standard GitHub-hosted runners, resulting in “No space left on device” errors.
Decision
✅ CHOSEN: Implement a Two-Tiered CI Architecture with a Sovereign OS Factory and Self-Hosted Runners
We will redefine our GitHub Actions CI pipeline to consist of two primary stages, with the resource-intensive “Sovereign OS Factory” leveraging self-hosted runners.
Tier 1: The “Sovereign OS Factory” (build-sovereign-os job)
This new job (implemented in hack/build-sovereign-os.sh) will now run on self-hosted runners and will:
- Download Sidero Sources: Fetch pinned siderolabs/pkgs, siderolabs/extensions, and siderolabs/talos source repositories.
- Compile Custom Kernel: Build the kernel package from siderolabs/pkgs locally. This step generates our unique, ephemeral cryptographic signing key for the kernel.
- Compile Custom Extension: Build the hailort extension from siderolabs/extensions using the kernel-build stage produced by the Compile Custom Kernel step. This ensures the extension is signed with the exact same key as our custom kernel.
- Compile Custom Installer: Build the installer-base and installer images from siderolabs/talos, overriding the PKG_KERNEL variable to point to our custom-built kernel. This wraps our sovereign kernel in a standard Talos installer image.
- Publish Sovereign Artifacts: Push the custom urmanac/installer:<unique-hash> and urmanac/hailort:<unique-hash> images to GHCR.
- Idempotency & Tagging: Utilize content-based hashing (UNIQUE_TAG) to skip rebuilding if nothing has changed. On main branch pushes, update stable tags (5.3.0-v1.13.3, 5.3.0) to point to these verified artifacts.
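The idempotency step can be sketched roughly as follows. This is a minimal illustration, not the logic of hack/build-sovereign-os.sh: the choice of hash inputs (pinned refs plus the build script contents), the 12-character truncation, the function names, and the placeholder ref values are all assumptions. Only UNIQUE_TAG as a concept, the main branch rule, and the two stable tags come from this ADR.

```python
import hashlib

# Stable tags named in this ADR; everything else below is illustrative.
STABLE_TAGS = ("5.3.0-v1.13.3", "5.3.0")


def unique_tag(pins: dict[str, str], build_script: bytes) -> str:
    """Derive a short content-based tag from pinned refs and the build script.

    Assumption: these are the inputs that should invalidate a build. The real
    UNIQUE_TAG may hash a different set of files.
    """
    digest = hashlib.sha256()
    for repo in sorted(pins):  # sort so dict ordering cannot change the tag
        digest.update(f"{repo}@{pins[repo]}\n".encode("utf-8"))
    digest.update(build_script)
    return digest.hexdigest()[:12]


def plan_build(tag: str, published_tags: set[str], branch: str) -> dict:
    """Decide whether to rebuild and which stable tags to move."""
    return {
        # Skip the expensive kernel build if this exact tag was already pushed.
        "rebuild": tag not in published_tags,
        # Stable tags only move on pushes to main.
        "retag": list(STABLE_TAGS) if branch == "main" else [],
    }


if __name__ == "__main__":
    # Hypothetical pins; the real refs live in the build script.
    pins = {
        "siderolabs/pkgs": "pkgs-ref-placeholder",
        "siderolabs/extensions": "extensions-ref-placeholder",
        "siderolabs/talos": "talos-ref-placeholder",
    }
    tag = unique_tag(pins, b"#!/bin/sh\n# placeholder build script\n")
    print(tag, plan_build(tag, published_tags=set(), branch="main"))
```

Sorting the pins before hashing keeps the tag stable regardless of how the inputs are enumerated, which is what makes the skip check trustworthy.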
Tier 2: The “Assembly Matrix” (build-cozystack-upstream job)
This job will be modified to expand its build matrix and dynamically inject artifacts:
- Expanded Matrix: Introduce a new hardware dimension (e.g., [cm4-standard, cm5-hailo10h]) alongside the existing extension_variant.
- Dynamic Artifact Injection:
  - cm4-standard (Default/Fast Path): Uses the standard ghcr.io/siderolabs/installer and ghcr.io/siderolabs/hailort (or other upstream extensions) for standard CM4 nodes. These builds will remain fast.
  - cm5-hailo10h (Exotic/Heavy Path): Intercepts the gen-profiles.sh process. It injects the INSTALLER_IMAGE and HAILORT_IMAGE environment variables from the outputs of the build-sovereign-os job. This forces the Talos image assembly to use our custom kernel and signed HailoRT driver.
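The selection logic described above might look something like the sketch below. It models the decision in Python purely for illustration; the actual workflow would express this in the GitHub Actions matrix and in gen-profiles.sh. The function names, the extension_variant values, and the example image tags are hypothetical, and the upstream image references are shown without tags because this ADR does not specify them.

```python
from itertools import product

# Upstream defaults named in this ADR (tags intentionally omitted).
UPSTREAM = {
    "INSTALLER_IMAGE": "ghcr.io/siderolabs/installer",
    "HAILORT_IMAGE": "ghcr.io/siderolabs/hailort",
}

# Hardware targets that must use artifacts from the build-sovereign-os job.
SOVEREIGN_HARDWARE = {"cm5-hailo10h"}


def expand_matrix(hardware: list[str], extension_variants: list[str]) -> list[dict]:
    """Cross the new hardware dimension with the existing extension_variant."""
    return [
        {"hardware": hw, "extension_variant": ev}
        for hw, ev in product(hardware, extension_variants)
    ]


def resolve_images(hardware: str, factory_outputs: dict[str, str]) -> dict[str, str]:
    """Pick the environment variables to inject for one matrix entry."""
    if hardware in SOVEREIGN_HARDWARE:
        # A missing output raises KeyError so the job fails loudly instead of
        # silently falling back to an upstream kernel with a mismatched key.
        return {name: factory_outputs[name] for name in UPSTREAM}
    return dict(UPSTREAM)


if __name__ == "__main__":
    # Hypothetical variant names and factory outputs.
    matrix = expand_matrix(["cm4-standard", "cm5-hailo10h"], ["variant-a", "variant-b"])
    outputs = {
        "INSTALLER_IMAGE": "urmanac/installer:0123456789ab",
        "HAILORT_IMAGE": "urmanac/hailort:0123456789ab",
    }
    for entry in matrix:
        print(entry, resolve_images(entry["hardware"], outputs))
```

Failing on a missing factory output is a deliberate choice in this sketch: a silent fallback would reproduce the key mismatch this ADR sets out to remove.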
Alternatives Considered
- Attempt to share Sidero Labs’ kernel-build: Explored pointing our bldr builds to ghcr.io/siderolabs/kernel-build. Rejected because Sidero Labs’ kernel-build images are private, making it impossible to align signing keys without their internal build infrastructure.
- Disable Module Signing (Unfeasible): Talos Linux is designed around immutability and security. Disabling kernel module signing would compromise the integrity of the OS, is not officially supported, and would introduce significant security risks.
- Waiting for Upstream HailoRT 5.x Support: While ideal, the timeline for upstream Sidero Labs to integrate HailoRT v5.3.0 (or newer) is uncertain. Our project requires immediate support for Hailo-10H.
- Optimizing GitHub Hosted Runners: Attempted to reduce disk usage on hosted runners. Rejected as kernel compilation is inherently disk/memory intensive and consistently exceeds free-tier limits.
Consequences
Positive:
- ✅ Resolves “Key Rejected” Error: Ensures cryptographic signature alignment between the kernel and the hailo1x_pci module.
- ✅ Robust Hardware Support: Provides a reliable method for integrating custom hardware drivers that are not upstream-supported by Talos.
- ✅ Flexible Build Matrix: Allows for differentiated builds and testing across various hardware/extension combinations.
- ✅ Maintainable Idempotency: The Sovereign OS Factory leverages content-hashing to ensure fast, repeatable builds.
- ✅ Persistent Build Caching: Self-hosted runners enable persistent Buildx caching, drastically reducing subsequent build times for the Sovereign OS Factory.
- ✅ Extensibility: The factory can be expanded to build other custom kernel modules or even custom spin or tailscale variants if needed in the future.
Negative:
- ⚠️ Requires Self-Hosted Runners: The build-sovereign-os job now requires a dedicated self-hosted runner with sufficient resources (disk, memory, ARM64 architecture) due to the demands of full kernel compilation.
- ⚠️ Increased Complexity: Introduces a more sophisticated CI/CD pipeline, requiring careful management of bldr commands across multiple Sidero repositories.
- ⚠️ Maintenance Overhead: We are now responsible for maintaining our custom kernel build process, including pinning Sidero pkgs and talos to specific versions, and managing the self-hosted runner infrastructure.
Neutral:
- 🔄 Expanded Registry: We will be pushing custom urmanac/installer and urmanac/kernel images to GHCR alongside our existing extension images.
Next ADR: ADR-006: Kubernetes OCI Image Management and Multi-Architecture Builds