Matchbox Server Configuration for Custom Talos Images
Overview
This document describes how the AWS bastion’s matchbox server will serve custom ARM64 Talos images for netbooting. The configuration integrates with the CozyStack ARM64 images built via GitHub Actions and stored in GHCR.
Architecture
Talos Node Boot Process:
1. t4g.small instance launches (AWS EC2)
2. PXE boot → DHCP request (dnsmasq on bastion)
3. DHCP response → next-server: 10.20.13.140 (bastion running dnsmasq and matchbox)
4. TFTP (dnsmasq) → iPXE bootloader; HTTP (matchbox) → ARM64 kernel/initramfs
5. Talos boots with Spin + Tailscale extensions
6. Node joins CozyStack cluster
Bastion Matchbox Setup
Directory Structure
/opt/matchbox/
├── assets/
│   └── talos/
│       └── arm64/                  # ← Custom ARM64 assets from GHCR
│           ├── vmlinuz             # ARM64 kernel
│           ├── initramfs.xz        # ARM64 initramfs with extensions
│           ├── vmlinuz.sha256
│           └── initramfs.xz.sha256
├── profiles/
│   └── cozystack-arm64.json        # Talos boot profile
├── groups/
│   └── default.json                # Machine group assignment
└── ignition/
    └── cozystack-config.yaml       # Talos machine config
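A minimal sketch for creating this layout on a fresh bastion, assuming the matchbox data directory lives at /opt/matchbox as shown. The function name init_matchbox_tree is hypothetical; the root is a parameter so the layout can be tried in a scratch directory first.

```shell
#!/usr/bin/env bash
# Hypothetical helper: create the matchbox data layout shown above under ROOT
# (defaults to /opt/matchbox). mkdir -p makes it safe to re-run.
init_matchbox_tree() {
  local root="${1:-/opt/matchbox}"
  # One directory per matchbox data type, plus the ARM64 asset path.
  mkdir -p \
    "$root/assets/talos/arm64" \
    "$root/profiles" \
    "$root/groups" \
    "$root/ignition"
}
```

Running it twice is harmless, so it can sit at the top of the bastion setup before the assets are extracted.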
Asset Extraction from GHCR
The bastion pulls custom Talos assets from the GitHub Container Registry during startup:
#!/bin/bash
# /opt/bastion-setup/extract-talos-assets.sh
set -euo pipefail

IMAGE=ghcr.io/urmanac/talos-cozystack-demo:demo-stable
ASSET_DIR=/opt/matchbox/assets/talos/arm64

# Pull the latest custom Talos image
echo "Pulling custom Talos ARM64 image from GHCR..."
docker pull "$IMAGE"

# Extract boot assets to matchbox. This assumes the image's entrypoint
# copies its boot assets into the directory passed as its argument.
echo "Extracting ARM64 boot assets..."
mkdir -p "$ASSET_DIR"
docker run --rm \
  -v /opt/matchbox/assets:/output \
  "$IMAGE" \
  /output/talos/arm64/

# Verify assets extracted correctly. sha256sum resolves the filenames
# listed in the .sha256 files relative to the current directory.
echo "Verifying ARM64 assets..."
ls -la "$ASSET_DIR"
(cd "$ASSET_DIR" && sha256sum -c ./*.sha256)

# Set appropriate permissions (files are world-readable for matchbox)
chown -R matchbox:matchbox /opt/matchbox/assets/
chmod 644 "$ASSET_DIR"/*

echo "✅ Custom ARM64 Talos assets ready for netboot"
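The verification step above can be folded into a reusable function that also fails on a missing or empty asset, which a bare ls only displays without failing. A sketch, assuming the .sha256 files use the standard sha256sum format with bare filenames; verify_talos_assets is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: check that DIR holds both boot assets, each non-empty
# and with a .sha256 file, then verify every checksum.
verify_talos_assets() {
  local dir="$1" f
  for f in vmlinuz initramfs.xz; do
    if [ ! -s "$dir/$f" ]; then
      echo "missing or empty: $dir/$f" >&2
      return 1
    fi
    if [ ! -f "$dir/$f.sha256" ]; then
      echo "missing checksum: $dir/$f.sha256" >&2
      return 1
    fi
  done
  # Subshell keeps the cd local; --quiet suppresses the per-file "OK" lines.
  (cd "$dir" && sha256sum --quiet -c vmlinuz.sha256 initramfs.xz.sha256)
}
```

On the bastion this would be called as verify_talos_assets /opt/matchbox/assets/talos/arm64 right after extraction, before matchbox starts serving the files.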
Matchbox Profile Configuration
{
  "id": "cozystack-arm64",
  "name": "CozyStack ARM64 with Spin + Tailscale",
  "boot": {
    "kernel": "/assets/talos/arm64/vmlinuz",
    "initrd": ["/assets/talos/arm64/initramfs.xz"],
    "args": [
      "init_on_alloc=1",
      "init_on_free=1",
      "slub_debug=P",
      "pti=on",
      "console=tty0",
      "console=ttyS0",
      "printk.devkmsg=on",
      "talos.platform=metal",
      "talos.config=http://10.20.13.140:8080/ignition?uuid=${uuid}"
    ]
  }
}
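Matchbox turns a profile's boot section into an iPXE script: a kernel line carrying the args, an initrd line per entry, then boot. The sketch below approximates that output in bash so a profile can be previewed offline; render_ipxe is a hypothetical helper, and matchbox's own template may differ in whitespace.

```shell
#!/usr/bin/env bash
# Hypothetical helper approximating the iPXE script matchbox renders for a
# profile. Usage: render_ipxe KERNEL INITRD [ARG...]
render_ipxe() {
  local kernel="$1" initrd="$2"
  shift 2
  echo '#!ipxe'
  # Kernel path followed by every kernel argument, space-separated.
  echo "kernel $kernel $*"
  echo "initrd $initrd"
  echo 'boot'
}
```

The ${uuid} in talos.config is an iPXE variable expanded on the node at boot, so it should appear literally in the rendered script; pass it in single quotes when trying this helper.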
Machine Group Assignment
{
  "id": "default",
  "name": "Default ARM64 CozyStack Nodes",
  "profile": "cozystack-arm64",
  "selector": {
    "arch": "arm64"
  },
  "metadata": {
    "cozystack_cluster": "demo",
    "extensions_enabled": ["spin", "tailscale"]
  }
}
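Matchbox assigns a machine to a group by comparing the group's selector labels with the query parameters of the boot request: every selector key must be present with an equal value. A simplified model of that rule, assuming a single group and ignoring how matchbox ranks several matching groups (selector_matches is a hypothetical name):

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if every key=value pair in SELECTOR
# (comma-separated) appears in QUERY (an &-separated query string).
# Simplified: no URL decoding and no ranking across multiple groups.
selector_matches() {
  local selector="$1" query="$2" pair
  local IFS=','
  for pair in $selector; do
    # Wrap in '&' so "arch=arm64" cannot match inside "xarch=arm64".
    case "&$query&" in
      *"&$pair&"*) ;;
      *) return 1 ;;
    esac
  done
  # An empty selector matches every request.
  return 0
}
```

Under this model a request carrying arch=arm64 lands in the group above, while an amd64 client does not match it.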
Talos Machine Configuration
# /opt/matchbox/ignition/cozystack-config.yaml
version: v1alpha1
debug: false
persist: true
machine:
  type: worker # or controlplane for first node
  token: <cluster-join-token>
  ca:
    crt: <cluster-ca-certificate>
  certSANs:
    - 10.20.13.140
    - cluster.local
  kubelet:
    image: ghcr.io/siderolabs/kubelet:v1.31.0
    defaultRuntimeSeccompProfileEnabled: true
    extraConfig:
      # Passed through to the kubelet's KubeletConfiguration
      registerWithTaints:
        - key: node.cozystack.io/arm64
          effect: NoSchedule
  network:
    hostname: talos-demo-${uuid}
    interfaces:
      - interface: eth0
        dhcp: true
        vlans:
          - vlanId: 13
            dhcp: true
  install:
    disk: /dev/nvme0n1 # Typical for t4g instances
    image: ghcr.io/urmanac/talos-cozystack-demo:demo-stable
    wipe: false
cluster:
  id: <cluster-id>
  secret: <cluster-secret>
  controlPlane:
    endpoint: https://10.20.13.100:6443 # First node IP
  clusterName: cozystack-demo
  network:
    dnsDomain: cluster.local
    podSubnets:
      - 10.244.0.0/16
    serviceSubnets:
      - 10.96.0.0/12
  token: <bootstrap-token>
  ca:
    crt: <cluster-ca-cert>
    key: "" # Workers omit the CA key; only controlplane configs carry <cluster-ca-key>
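The config carries placeholders such as the join token, CA certificate, and cluster secret. A small guard can refuse to publish the file while any remain; it assumes placeholders always take the lowercase, dash-separated angle-bracket form used here, and assert_no_placeholders is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical guard: fail if FILE still contains <placeholder> tokens such
# as <cluster-join-token>. Offending lines are printed with line numbers.
assert_no_placeholders() {
  local file="$1"
  if grep -nE '<[a-z0-9-]+>' "$file"; then
    echo "unresolved placeholders in $file" >&2
    return 1
  fi
  return 0
}
```

In practice the real values would come from talosctl gen config, which generates matching controlplane and worker configs; the guard only catches a file copied before those values were filled in.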
Docker Compose Integration
The bastion runs matchbox and dnsmasq as Docker Compose services. Both use host networking: dnsmasq must receive DHCP broadcasts on the bastion's VPC interface, and both services then listen directly on the bastion's own address, 10.20.13.140 (a Docker bridge network cannot assign that same address to two containers).
# /opt/bastion-setup/docker-compose.yml (excerpt)
services:
  matchbox:
    image: quay.io/poseidon/matchbox:v0.11.0
    container_name: matchbox
    restart: unless-stopped
    # Host networking: HTTP API on :8080, gRPC API on :8081
    network_mode: host
    volumes:
      # Covers profiles/, groups/, ignition/ and assets/
      - /opt/matchbox:/var/lib/matchbox:Z
    environment:
      MATCHBOX_ADDRESS: "0.0.0.0:8080"
      # The gRPC API requires TLS certificates (read from /etc/matchbox by default)
      MATCHBOX_RPC_ADDRESS: "0.0.0.0:8081"
      MATCHBOX_LOG_LEVEL: "debug"
    depends_on:
      - dnsmasq
  dnsmasq:
    image: dnsmasq/dnsmasq:latest
    container_name: dnsmasq
    restart: unless-stopped
    # Host networking: DHCP on 67/udp, TFTP on 69/udp
    network_mode: host
    volumes:
      - /opt/bastion-setup/dnsmasq.conf:/etc/dnsmasq.conf:ro
      # TFTP root referenced by tftp-root in dnsmasq.conf
      - /opt/matchbox/assets:/var/lib/matchbox/assets:ro
    cap_add:
      - NET_ADMIN
dnsmasq Configuration
# /opt/bastion-setup/dnsmasq.conf
# DHCP Configuration
interface=eth0
bind-interfaces
dhcp-range=10.20.13.150,10.20.13.200,24h
dhcp-option=option:router,10.20.13.1
dhcp-option=option:dns-server,10.20.13.140
# PXE Boot Configuration
enable-tftp
tftp-root=/var/lib/matchbox/assets
dhcp-userclass=set:ipxe,iPXE
dhcp-boot=tag:#ipxe,undionly.kpxe
dhcp-boot=tag:ipxe,http://10.20.13.140:8080/boot.ipxe
# ARM64 specific settings
# Client architecture (DHCP option 93) 11 = ARM64 UEFI; 7 is x86-64 EFI BC
dhcp-match=set:arm64,option:client-arch,11
dhcp-boot=tag:arm64,tag:!ipxe,bootaa64.efi
dhcp-boot=tag:arm64,tag:ipxe,http://10.20.13.140:8080/boot.ipxe
# Logging
log-dhcp
log-queries
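The tag rules above amount to a small decision table keyed on the DHCP user class (iPXE or not) and the client architecture reported in DHCP option 93. As a sketch, assuming the IANA processor-architecture codes (0 = x86 BIOS, 11 = ARM64 UEFI), the hypothetical boot_file_for returns the file a client would be handed:

```shell
#!/usr/bin/env bash
# Hypothetical model of the dnsmasq dhcp-boot rules, assuming IANA
# client-arch codes: 0 = x86 BIOS, 11 = ARM64 UEFI.
# Usage: boot_file_for CLIENT_ARCH USER_CLASS
boot_file_for() {
  local arch="$1" user_class="$2"
  local ipxe_url='http://10.20.13.140:8080/boot.ipxe'
  if [ "$user_class" = "iPXE" ]; then
    # Second stage: iPXE is already running, so hand it the matchbox script.
    echo "$ipxe_url"
  elif [ "$arch" = "11" ]; then
    # First stage on ARM64 UEFI: chainload an ARM64 EFI binary over TFTP.
    echo 'bootaa64.efi'
  else
    # First stage on legacy x86 BIOS firmware.
    echo 'undionly.kpxe'
  fi
}
```

The two-stage split exists because firmware PXE clients generally fetch only over TFTP, whereas iPXE can pull the boot script and the kernel/initramfs from matchbox over HTTP.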
Flux ExternalArtifact Integration (Future)
For advanced GitOps workflows with Flux 2.7+, the custom Talos image can be referenced as an external OCI artifact through Flux's OCIRepository source:
# flux-external-artifacts/talos-custom-images.yaml
apiVersion: source.toolkit.fluxcd.io/v1
kind: OCIRepository
metadata:
  name: talos-cozystack-custom
  namespace: flux-system
spec:
  interval: 1h
  url: oci://ghcr.io/urmanac/talos-cozystack-demo
  ref:
    tag: demo-stable
  # Extract specific assets for different use cases
  layerSelector:
    mediaType: application/vnd.oci.image.layer.v1.tar
    operation: extract
---
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: talos-image-updates
  namespace: flux-system
spec:
  interval: 1h
  sourceRef:
    kind: OCIRepository
    name: talos-cozystack-custom
  path: "./assets"
  prune: true
  targetNamespace: cozy-system
  postBuild:
    substitute:
      TALOS_VERSION: "${TALOS_VERSION}"
      KERNEL_DIGEST: "${KERNEL_DIGEST}"
      INITRAMFS_DIGEST: "${INITRAMFS_DIGEST}"
Validation & Testing
Test 1: Asset Extraction
#!/bin/bash
# tests/matchbox/01-asset-extraction.sh
IMAGE=ghcr.io/urmanac/talos-cozystack-demo:demo-stable

test_ghcr_image_pullable() {
  docker pull "$IMAGE"
}

test_assets_extract_correctly() {
  local dir=/tmp/test-assets/talos/arm64
  mkdir -p /tmp/test-assets
  docker run --rm -v /tmp/test-assets:/output "$IMAGE" /output/talos/arm64/ || return 1
  test -f "$dir/vmlinuz" || return 1
  test -f "$dir/initramfs.xz" || return 1
  # Checksum files name bare files, so verify from inside the directory
  (cd "$dir" && sha256sum -c ./*.sha256)
}
Test 2: Matchbox HTTP API
#!/bin/bash
# tests/matchbox/02-matchbox-api.sh
test_matchbox_serves_assets() {
  # Test that matchbox HTTP API serves ARM64 assets
  curl -fsS http://10.20.13.140:8080/assets/talos/arm64/vmlinuz > /dev/null || return 1
  curl -fsS http://10.20.13.140:8080/assets/talos/arm64/initramfs.xz > /dev/null
}

test_matchbox_boot_profile() {
  # Matchbox's HTTP API does not serve profile JSON (profiles are read from
  # disk or managed over gRPC), so request the iPXE script rendered for an
  # arm64 client and check the kernel and initrd it points to.
  local script
  script=$(curl -fsS "http://10.20.13.140:8080/ipxe?arch=arm64") || return 1
  echo "$script" | grep -q '^kernel /assets/talos/arm64/vmlinuz' || return 1
  echo "$script" | grep -q '^initrd /assets/talos/arm64/initramfs.xz'
}
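Matchbox's /ipxe endpoint returns the iPXE script rendered from the matched profile, so comparing it with the profile on disk comes down to parsing that script. A hypothetical helper that extracts the kernel path from a script read on standard input:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the second field (the kernel path) of the first
# "kernel" line of an iPXE script read from stdin; prints nothing if absent.
ipxe_kernel_path() {
  awk '$1 == "kernel" { print $2; exit }'
}
```

For example, curl -fsS 'http://10.20.13.140:8080/ipxe?arch=arm64' | ipxe_kernel_path would be expected to print /assets/talos/arm64/vmlinuz when the group and profile above are in place.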
Test 3: End-to-End Netboot
#!/bin/bash
# tests/matchbox/03-netboot-e2e.sh
test_talos_node_netboots() {
# Launch new t4g.small instance, wait for netboot
# This requires actual AWS integration
echo "Manual test: Launch t4g.small instance, verify netboot"
echo "Expected: Node appears in 'talosctl get members' within 5 minutes"
}
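The three test scripts only define test_* functions. A minimal runner (run_tests, a hypothetical name) can discover those functions with declare -F once the scripts are sourced, run each in a subshell so one failure cannot abort the rest, and report a summary; a test counts as passing when its function returns zero.

```shell
#!/usr/bin/env bash
# Hypothetical runner: execute every defined function whose name starts with
# "test_", each in a subshell, print PASS/FAIL per test and a summary line.
# Returns non-zero if any test failed.
run_tests() {
  local fn failed=0 total=0
  for fn in $(declare -F | awk '$3 ~ /^test_/ { print $3 }'); do
    total=$((total + 1))
    if ( "$fn" ) >/dev/null 2>&1; then
      echo "PASS $fn"
    else
      echo "FAIL $fn"
      failed=$((failed + 1))
    fi
  done
  echo "$((total - failed))/$total passed"
  [ "$failed" -eq 0 ]
}
```

Usage: for f in tests/matchbox/*.sh; do source "$f"; done; run_tests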
Operational Procedures
Updating Custom Images
- Update patches in the /patches/ directory.
- Trigger a GitHub Actions build via commit or manual dispatch.
- The bastion auto-updates on its next startup (or on a manual trigger):
  ssh ubuntu@10.20.13.140 'sudo /opt/bastion-setup/extract-talos-assets.sh'
- Verify the update:
  curl -s http://10.20.13.140:8080/assets/talos/arm64/vmlinuz.sha256
Troubleshooting
Problem: Talos nodes not netbooting
- Check dnsmasq DHCP logs: docker logs dnsmasq
- Check matchbox asset serving (headers only, to avoid dumping the binary): curl -fsI http://10.20.13.140:8080/assets/talos/arm64/vmlinuz
- Check EC2 instance launch: PXE boot enabled, correct subnet
Problem: Extensions not loading
- Check that the custom image built correctly: GitHub Actions logs
- Check that the patches were applied: talosctl -n <node> get extensions
- Check ARM64 compatibility: extension sources
Problem: High AWS costs
- Monitor: EC2 instances running longer than expected
- Alert: Cost exceeding $0.10/month threshold
- Action: Terminate all non-bastion instances
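To see how quickly a forgotten instance reaches the $0.10 threshold, multiply its hourly price by the hours it has been running. A sketch using awk for the floating-point math; the price is an input rather than a quoted AWS rate (check current on-demand pricing for the region), and over_budget is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the estimated cost and succeed (exit 0) when
# PRICE_PER_HOUR * HOURS exceeds THRESHOLD, all in USD.
over_budget() {
  local price="$1" hours="$2" threshold="$3"
  awk -v p="$price" -v h="$hours" -v t="$threshold" \
    'BEGIN { cost = p * h; printf "estimated cost: $%.4f\n", cost; exit !(cost > t) }'
}
```

With an illustrative price of $0.02/hour, an instance reaches the threshold at five hours and exceeds it after that, so a check like this is worth running well within a single working day.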
Cost Optimization
Free Tier Management
- GHCR pulls: Free for public repos (no egress cost)
- Asset caching: Store assets locally on bastion (EBS cost only)
- Bandwidth: Private VPC networking (no data transfer charges for traffic between instances in the same Availability Zone over private IPs)
- Compute: Netboot process uses minimal bastion CPU
Future Optimizations
- Lambda matchbox: Replace Docker container with serverless
- EFS storage: Share assets across multiple bastions
- CloudFront: Cache assets globally (if needed for multi-region)
Next Documents:
- AWS-INFRASTRUCTURE-HANDOFF.md: Instructions for AWS-capable Claude agent
- DEMO-SCRIPT.md: Step-by-step presentation workflow