Custom Talos Images for CozyStack Demo
Custom Talos Images for CozyStack Demo
Overview
This document specifies the custom ARM64 Talos Linux images needed for the “Home Lab to the Moon and Back” demo. These images will be built using CozyStack’s existing Makefile-based build system via hephy-builder pattern and stored in GHCR (GitHub Container Registry) for free.
Based on proven approach: kingdonb/cozystack:merged-branches - 12 commits ahead with working Spin + Tailscale builds.
Build Strategy: Proven CozyStack Fork Pattern
Existing Pattern (from your merged-branches):
kingdonb/cozystack fork → Manual builds → Docker Hub
├── cozystack-spin-only (Spin runtime only)
├── cozystack-spin-tailscale (Spin + Tailscale)
└── Custom matchbox image (netboot configuration)
New CI Pattern (this project):
urmanac/cozystack-moon-and-back → GitHub Actions → GHCR
├── Reference kingdonb/cozystack commits c112ec63 → c21b1a86
├── Automate the proven manual build process
└── Add ARM64 architecture support
Target Registry: ghcr.io/urmanac/talos-cozystack-demo
Architecture: ARM64 only (matches t4g instances and future Raspberry Pi CM3)
Versioning: Mirror upstream CozyStack versions exactly (no custom versions)
Proven Implementation Analysis
From kingdonb/cozystack:merged-branches - commit sequence c112ec63 → c21b1a86:
Commit c112ec63: publish (matchbox) to kingdonb on docker.io
- What: Initial matchbox image publication to Docker Hub
- Pattern: Manual build process established
- Registry:
kingdonb/namespace on Docker Hub
Commit 24d566b: hardcode spin + tailscale versions
- What: Pin specific extension versions for reproducibility
- Extensions: Spin runtime + Tailscale pinned to working versions
- Approach: Hardcoded rather than dynamic version fetching
Commit 06a28cc: hack/gen-profiles.sh
- What: Modified profile generator for custom extensions
- Changes: Added Spin and Tailscale to extensions list
- Pattern: Patch CozyStack’s existing build system
Commit c581c59: -tailscale then e9c41ce: Revert "-tailscale"
- What: Testing different extension combinations
- Variants: spin-only vs. spin-tailscale builds
- Strategy: Multiple image variants for different use cases
Commit ce8b1bb: REGISTRY := kingdonb
- What: Registry destination configuration
- Pattern: Simple variable override in Makefile
- Target: Docker Hub personal registry
Commit 3d8e781: build complete
- What: Successful manual build validation
- Assets: All custom Talos images built successfully
- Quality: Proven working state
Commit 1c8e759: matchbox.tag
- What: Matchbox image tagging/versioning
- Integration: Matchbox configured for custom Talos images
- Pattern: Separate matchbox build after Talos assets
Commit c21b1a86: publish matchbox image
- What: Final step - publish matchbox with custom configuration
- Complete: End-to-end custom netboot infrastructure
- Status: Production-ready manual process
Our Automation Strategy
Convert proven manual process → GitHub Actions automation:
- Clone kingdonb/cozystack at working commit
- Apply ARM64 patches on top of proven x86_64 changes
- Update registry target from
kingdonb→ghcr.io/urmanac - Automate build sequence following exact commit pattern
- Generate both variants: spin-only and spin-tailscale
- Build custom matchbox with ARM64 asset configuration
- Version mirroring to match upstream CozyStack releases
Hephy-Builder Integration
Repository Pattern:
urmanac/cozystack-moon-and-back
├── .github/workflows/build-talos-images.yml
├── patches/
│ ├── 01-arm64-architecture.patch
│ ├── 02-add-spin-extension.patch
│ └── 03-add-tailscale-extension.patch
└── scripts/
└── build-cozystack-images.sh
Build Process:
- Clone:
git clone https://github.com/cozystack/cozystack.git - Patch: Apply patches for ARM64 + Spin + Tailscale
- Build Dependencies: Ensure
docker,skopeo,jq,yqavailable - Generate Profiles: Run patched
hack/gen-profiles.shfor ARM64 - Build Images: Run
make talos-metal talos-kernel talos-initramfs - Extract Assets: Get kernel/initramfs for matchbox server
- Push: Upload to GHCR with proper tags
GitHub Actions Environment:
- ✅
runs-on: ubuntu-latesthas Docker + build tools - ✅ QEMU available for ARM64 emulation via
docker/setup-qemu-action - ✅ Can run
makecommands directly (no kaniko needed) - ✅ GHCR authentication via
GITHUB_TOKEN
Integration Points
Matchbox Server Configuration
The bastion’s matchbox server will serve these images:
# /opt/matchbox/assets/talos/
kernel: /assets/talos/custom/vmlinuz-arm64
initramfs: /assets/talos/custom/initramfs-arm64.xz
Image Pull Strategy:
- GitHub Actions builds → GHCR
- Bastion pulls image on startup via Docker
- Extract kernel/initramfs from OCI image to matchbox assets
- Serve via HTTP to netbooting nodes
Flux ExternalArtifact (Future)
If using Flux 2.7+ ExternalArtifact features:
apiVersion: source.toolkit.fluxcd.io/v1beta2
kind: ExternalArtifact
metadata:
name: talos-cozystack-custom
spec:
url: oci://ghcr.io/urmanac/talos-cozystack-demo
tag: demo-stable
layer:
mediaType: application/vnd.oci.image.layer.v1.tar
digest: sha256:...
Test-Driven Validation
Test 1: Image Builds Successfully
#!/bin/bash
# tests/custom-image/01-build-success.sh
test_github_actions_build_succeeds() {
# GIVEN: GitHub Actions workflow triggered
# WHEN: Building custom Talos image
# THEN: Build completes without errors AND image pushed to GHCR
# This will be validated via GitHub Actions status
# Manual verification: check Actions tab for green checkmarks
echo "Check: https://github.com/urmanac/cozystack-moon-and-back/actions"
true
}
test_image_pullable_from_ghcr() {
# GIVEN: Image built and pushed
# WHEN: Pulling from GHCR
# THEN: Pull succeeds without authentication (public repo)
docker pull ghcr.io/urmanac/talos-cozystack-demo:demo-stable
}
Test 2: Extensions Present
#!/bin/bash
# tests/custom-image/02-extensions-present.sh
test_spin_runtime_available() {
# GIVEN: Custom Talos image running
# WHEN: Checking for Spin extension
# THEN: Spin binary exists and RuntimeClass created
# Run on actual node:
# talosctl -n 10.20.13.x get runtimeclass spin
echo "Manual validation required on running node"
}
test_tailscale_extension_available() {
# GIVEN: Custom Talos image running
# WHEN: Checking for Tailscale extension
# THEN: Tailscale service available
# talosctl -n 10.20.13.x get services | grep tailscale
echo "Manual validation required on running node"
}
Test 3: Netboot Integration
#!/bin/bash
# tests/custom-image/03-netboot-integration.sh
test_matchbox_serves_custom_kernel() {
# GIVEN: Bastion with custom image assets
# WHEN: Matchbox serves boot files
# THEN: Custom kernel/initramfs served successfully
curl -s http://10.20.13.140:8080/assets/talos/custom/vmlinuz-arm64 > /dev/null
curl -s http://10.20.13.140:8080/assets/talos/custom/initramfs-arm64.xz > /dev/null
}
test_node_boots_with_extensions() {
# GIVEN: Node netboots custom image
# WHEN: Talos starts up
# THEN: Extensions are loaded and functional
# This is end-to-end validation - requires actual AWS test
echo "End-to-end test: deploy node, check extension availability"
}
Build Workflow Structure
.github/workflows/build-talos-images.yml
├── Trigger: Push to main, PR to main, manual dispatch
├── Job: build-custom-talos-arm64
│ ├── Setup: checkout, Docker buildx, QEMU
│ ├── Login: GHCR with GITHUB_TOKEN
│ ├── Build: Custom Dockerfile with extensions
│ ├── Test: Basic smoke tests on image
│ ├── Push: Multiple tags to GHCR
│ └── Outputs: Image digest, tags pushed
└── Job: update-matchbox-assets (future)
├── Extract: kernel/initramfs from OCI image
├── Upload: To S3 or artifact storage
└── Notify: Update bastion configuration
Next Actions
- Identify Spin Extension Source:
- Check Talos documentation for official Spin extension
- If none exists, plan custom build process
- Research SpinKube requirements for runtime integration
- Identify Tailscale Extension Source:
- Check Talos extensions catalog
- Verify ARM64 compatibility
- Plan authentication strategy (auth keys, etc.)
- Create Initial Dockerfile:
- Base FROM ghcr.io/siderolabs/talos:v1.9.0
- Add extension installation steps
- Optimize for GitHub Actions build time
- Setup GitHub Actions Workflow:
- Configure GHCR authentication
- Test ARM64 builds via QEMU
- Add build caching for faster iterations
Questions for AWS-Capable Claude Agent
When ready to hand off infrastructure planning:
- Bastion Image Extraction: How should bastion pull OCI images and extract kernel/initramfs for matchbox?
- AWS Cost Implications: Any costs for GHCR pulls from EC2 instances?
- Networking: How will bastion pull images during startup? (NAT gateway egress)
- Security: IAM roles needed for ECR/GHCR access, or just public pulls?
Success Criteria
- GitHub Actions builds custom ARM64 Talos images successfully
- Images available via
docker pull ghcr.io/urmanac/talos-cozystack-demo:demo-stable - Bastion can extract and serve kernel/initramfs from OCI images
- Nodes boot custom images and extensions are functional
- Zero additional cost beyond base AWS infrastructure
- Process documented for community replication
Next Document: GitHub Actions workflow configuration
Status: Ready for AWS infrastructure planning handoff