nix-ota

Open-source OTA updates for fleets of NixOS devices. A self-hostable control server + lightweight device agent that ship prebuilt system closures from a binary cache to devices that don't have your flake.

Think Cachix Deploy, but you run it.

Architecture

┌─────────┐  1. nix build + nix copy   ┌──────────┐
│  CI /   │ ─────────────────────────► │  Binary  │  (Attic / S3 / nix-serve / Cachix)
│ Builder │                            │   Cache  │
└────┬────┘                            └────▲─────┘
     │ 2. publish signed manifest          │
     ▼                                      │ 4. nix copy --from <cache>
┌─────────────┐  3. GET current   ┌────────┴─────┐
│  Control    │ ◄──────────────── │   Device     │
│ Server + UI │  5. POST checkin  │   Agent      │ ──► switch-to-configuration
└─────────────┘                   └──────────────┘

The control server never holds the signing key. Operators (or CI) sign manifests with an offline ed25519 key and POST them; devices verify against a pinned public key. A server compromise cannot push arbitrary closures.

Components

| Crate | Binary | Role |
| --- | --- | --- |
| crates/server | nix-ota-server | REST API + SQLite + HTMX dashboard |
| crates/agent | nix-ota-agent | Polls, verifies, applies, rolls back |
| crates/publisher | nix-ota | Operator/CI CLI (keygen + publish) |
| crates/common | (lib) | Manifest types + ed25519 |

Quickstart (< 10 minutes)

👉 For a complete copy-pasteable setup with two real NixOS flakes (server host + device host), see examples/.

1. Generate a signing key on your workstation

nix run git+https://linus.dyrehytten.dk/max/nix-ota#nix-ota -- keygen --out ./sign.key
# prints the public key — save it, you'll bake it into every device.

2. Deploy the server

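# configuration.nix
# Assumes the nix-ota flake is passed in as the `nix-ota` module argument
# (for example via specialArgs); see examples/ for complete flakes.
{ nix-ota, ... }:
{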
  imports = [ nix-ota.nixosModules.server ];
  services.nix-ota-server = {
    enable = true;
    openFirewall = true;
    publishTokenFile = "/run/secrets/nix-ota-publish-token";
  };
}

3. Install the agent on a device

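# Assumes the nix-ota flake is passed in as the `nix-ota` module argument
# (for example via specialArgs); see examples/ for complete flakes.
{ nix-ota, ... }:
{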
  imports = [ nix-ota.nixosModules.agent ];
  services.nix-ota-agent = {
    enable = true;
    server         = "https://ota.example.com";
    channel        = "prod";
    deviceId       = "fridge-007";
    publicKey      = "<base64 ed25519 pubkey from step 1>";
    cacheUrl       = "https://cache.example.com";
    cachePublicKey = "cache.example.com:abc...=";
    healthCmd      = "systemctl is-system-running --wait";  # optional
  };
}

4. Publish your first update

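# Build the system closure, then upload it to the cache that devices read from
# (s3://my-cache is assumed here to be the store behind https://cache.example.com).
nix build .#nixosConfigurations.fridge-007.config.system.build.toplevel
nix copy --to s3://my-cache ./result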

nix run git+https://linus.dyrehytten.dk/max/nix-ota#nix-ota -- publish \
  --server https://ota.example.com \
  --token  "$(cat publish-token)" \
  --key    ./sign.key \
  --channel prod \
  --store-path "$(readlink -f result)" \
  --substituter https://cache.example.com

Open https://ota.example.com/ to watch the fleet pick it up.

How updates apply

On each poll the agent:

  1. Fetches /channels/<name>/current.
  2. Verifies the ed25519 signature against the pinned key.
  3. Rejects manifests with a revision ≤ the last one applied (replay defense).
  4. Runs nix copy --from <substituter> <storePath>; Nix verifies cache signatures on every store path.
  5. Runs nix-env -p /nix/var/nix/profiles/system --set <storePath>.
  6. Runs <storePath>/bin/switch-to-configuration switch.
  7. Runs the optional healthCmd. On failure, it switches back to the previous generation and reports rolled_back.
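
The decision points in this loop (steps 2, 3, and 7) can be sketched as pure functions. This is a minimal illustration, not the agent's actual code: the `Manifest` fields, the `Decision` enum, the function names, and the `applied` status string are hypothetical; only `rolled_back` comes from the steps above. Signature verification is reduced to a boolean because real ed25519 checks need an external library.

```rust
/// Hypothetical, simplified view of a manifest fetched from the server.
#[derive(Debug, Clone, PartialEq)]
pub struct Manifest {
    pub revision: u64,
    pub store_path: String,
}

/// What the agent should do with a manifest it just fetched.
#[derive(Debug, PartialEq)]
pub enum Decision {
    /// The signature did not verify against the pinned key (step 2).
    RejectBadSignature,
    /// The revision is not newer than the last applied one (step 3).
    RejectReplay,
    /// Safe to copy the closure, set the profile, and switch (steps 4-6).
    Apply,
}

/// `signature_ok` stands in for the ed25519 check; `last_applied` is the
/// revision the agent persisted after its previous successful update, or
/// `None` on a device that has never applied one.
pub fn decide(manifest: &Manifest, signature_ok: bool, last_applied: Option<u64>) -> Decision {
    if !signature_ok {
        return Decision::RejectBadSignature;
    }
    match last_applied {
        // Equal revisions are rejected too: "revision <= last applied".
        Some(last) if manifest.revision <= last => Decision::RejectReplay,
        _ => Decision::Apply,
    }
}

/// Step 7: the status to report after switching. `health_ok` is `None`
/// when no healthCmd is configured. "applied" is a made-up status name.
pub fn report_status(health_ok: Option<bool>) -> &'static str {
    match health_ok {
        Some(false) => "rolled_back",
        _ => "applied",
    }
}
```

Checking the signature before looking at the revision means an unsigned or forged manifest can never influence the persisted revision counter.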

Threat model

| Threat | Mitigation |
| --- | --- |
| Compromised control server pushes a malicious closure | Manifests must be signed by an offline ed25519 key whose public half is pinned on every device. |
| Compromised cache serves the wrong closure | Nix verifies per-path signatures against trusted-public-keys. |
| Replay of an older (vulnerable) closure | Manifest carries a monotonic revision; the agent persists the last applied revision and rejects anything not newer. |
| Random internet caller publishes | POST /channels/:name/publish requires a bearer token. |
| Random caller reads fleet state | Put the UI/API behind your reverse proxy / SSO (v1 has no built-in auth on reads). |
| Bad closure bricks device | Health check + magic rollback to the previous system generation. |
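
As an illustration of the bearer-token row, the check on the publish endpoint could look roughly like the sketch below. It is not taken from the server crate: `is_authorized` and `constant_time_eq` are hypothetical names, and both the constant-time comparison and the trimming of the token read from publishTokenFile are design assumptions added here. Production code would more likely use a vetted crate such as subtle than a hand-rolled comparison.

```rust
/// Compare two byte strings without returning early on the first mismatch,
/// so response timing does not reveal how many leading bytes matched.
/// (The length is still leaked; that is accepted in this sketch.)
fn constant_time_eq(a: &[u8], b: &[u8]) -> bool {
    if a.len() != b.len() {
        return false;
    }
    let mut diff = 0u8;
    for (x, y) in a.iter().zip(b.iter()) {
        diff |= x ^ y;
    }
    diff == 0
}

/// Returns true only if `header` is exactly `Bearer <token>` and the token
/// matches `expected_token`. The expected token is trimmed because a token
/// file often ends with a newline (an assumption, not documented behavior).
pub fn is_authorized(header: Option<&str>, expected_token: &str) -> bool {
    let expected = expected_token.trim();
    if expected.is_empty() {
        // Never accept requests when no token is configured.
        return false;
    }
    match header.and_then(|h| h.strip_prefix("Bearer ")) {
        Some(presented) => constant_time_eq(presented.as_bytes(), expected.as_bytes()),
        None => false,
    }
}
```

Rejecting an empty expected token avoids the failure mode where a missing or blank token file silently opens the endpoint to everyone.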

Key management: keep sign.key offline (hardware token, ops laptop, or a sealed CI secret). The server never sees it. To rotate: generate a new key, publish a closure (still signed with the old key) that sets publicKey on devices to the new key, wait until devices have applied it, then start signing with the new key.
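
The ordering of a rotation matters, and a toy model makes the reason visible. The sketch below is purely illustrative: `Device`, `Update`, and their fields are hypothetical, keys are plain strings rather than real ed25519 keys, and each device is assumed to pin exactly one key at a time.

```rust
/// A device that trusts exactly one pinned signing key at a time.
pub struct Device {
    pub pinned_key: String,
}

/// Hypothetical update: which key signed the manifest, and which key the
/// closure it points to configures as `publicKey` once applied.
pub struct Update {
    pub signed_by: String,
    pub pins_key: String,
}

impl Device {
    /// Applies the update only if it was signed by the currently pinned
    /// key, then adopts whatever key the new closure pins.
    /// Returns whether the update was accepted.
    pub fn apply(&mut self, update: &Update) -> bool {
        if update.signed_by != self.pinned_key {
            return false;
        }
        self.pinned_key = update.pins_key.clone();
        true
    }
}
```

In this model, an update signed with the new key is rejected by any device that still pins the old one, while an update signed with the old key that pins the new key is accepted and moves the device over. A device that misses the transition update would keep rejecting new-key manifests, so it seems prudent to confirm the whole fleet has checked in before retiring the old key.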

Non-goals (v1)

  • The server does no Nix evaluation or building — CI does that.
  • No replacement for your binary cache — use Attic, Cachix, S3, nix-serve.
  • No per-device secrets (use sops-nix / agenix inside the closure).
  • No web-based config editing — config lives in your flake repo.

Development

nix develop
cargo build --workspace
cargo test --workspace
nix flake check    # runs the full NixOS VM test

License

MIT OR Apache-2.0.