nix-ota
Open-source OTA updates for fleets of NixOS devices. A self-hostable control server + lightweight device agent that ship prebuilt system closures from a binary cache to devices that don't have your flake.
Think Cachix Deploy, but you run it.
Architecture
┌─────────┐  1. nix build + nix copy   ┌──────────┐
│  CI /   │ ─────────────────────────► │  Binary  │  (Attic / S3 / nix-serve / Cachix)
│ Builder │                            │  Cache   │
└────┬────┘                            └────▲─────┘
     │ 2. publish signed manifest           │
     ▼                                      │ 4. nix copy --from <cache>
┌─────────────┐   3. GET current   ┌────────┴─────┐
│   Control   │ ◄────────────────  │    Device    │
│ Server + UI │   5. POST checkin  │    Agent     │ ──► switch-to-configuration
└─────────────┘                    └──────────────┘
The control server never holds the signing key. Operators (or CI) sign manifests with an offline ed25519 key and POST them; devices verify against a pinned public key. A server compromise cannot push arbitrary closures.
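The signing model above only holds if the signature covers every field the agent acts on. A minimal sketch of what a signed payload could look like is below. The `Manifest` struct, its field names, and the length-prefixed encoding are assumptions for illustration; the real types live in crates/common and may differ. The ed25519 signing and verification step itself is left out, since it would be applied to the bytes this function returns.

```rust
/// Hypothetical manifest shape (illustration only, not the crate's real type).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Manifest {
    pub channel: String,
    pub revision: u64,
    pub store_path: String,
    pub substituter: String,
}

/// Append one field as a big-endian u32 length followed by its bytes,
/// so field boundaries are unambiguous in the signed payload.
fn push_field(buf: &mut Vec<u8>, field: &[u8]) {
    buf.extend_from_slice(&(field.len() as u32).to_be_bytes());
    buf.extend_from_slice(field);
}

impl Manifest {
    /// Bytes the operator would sign and the agent would verify.
    /// Every field is included, so altering any of them on the server
    /// would invalidate the signature.
    pub fn signing_payload(&self) -> Vec<u8> {
        let mut buf = Vec::new();
        push_field(&mut buf, self.channel.as_bytes());
        push_field(&mut buf, &self.revision.to_be_bytes());
        push_field(&mut buf, self.store_path.as_bytes());
        push_field(&mut buf, self.substituter.as_bytes());
        buf
    }
}
```

Length-prefixing is one simple way to keep two different manifests from serializing to the same bytes; a canonical JSON or CBOR encoding would serve the same purpose.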
Components
| Crate | Binary | Role |
|---|---|---|
| crates/server | nix-ota-server | REST API + SQLite + HTMX dashboard |
| crates/agent | nix-ota-agent | Polls, verifies, applies, rolls back |
| crates/publisher | nix-ota | Operator/CI CLI (keygen + publish) |
| crates/common | (lib) | Manifest types + ed25519 |
Quickstart (< 10 minutes)
👉 For a complete copy-pasteable setup with two real NixOS flakes (server host + device host), see examples/.
1. Generate a signing key on your workstation
nix run git+https://linus.dyrehytten.dk/max/nix-ota#nix-ota -- keygen --out ./sign.key
# prints the public key — save it, you'll bake it into every device.
2. Deploy the server
# configuration.nix
{
  imports = [ nix-ota.nixosModules.server ];
  services.nix-ota-server = {
    enable = true;
    openFirewall = true;
    publishTokenFile = "/run/secrets/nix-ota-publish-token";
  };
}
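The token in publishTokenFile is what gates publishing. As a rough sketch of how a server might check the `Authorization` header on a publish request, assuming a plain `Bearer <token>` scheme: the function names below are hypothetical and this is not taken from crates/server.

```rust
/// Compare two byte strings without returning early on the first
/// mismatching byte. The length check does return early, so only the
/// token length can leak through timing.
fn constant_time_eq(a: &[u8], b: &[u8]) -> bool {
    if a.len() != b.len() {
        return false;
    }
    let mut diff = 0u8;
    for (x, y) in a.iter().zip(b.iter()) {
        diff |= x ^ y;
    }
    diff == 0
}

/// Hypothetical check for a publish request.
/// `header` is the raw `Authorization` header value, if present.
/// `expected` is the content of the file named by `publishTokenFile`.
pub fn authorize_publish(header: Option<&str>, expected: &str) -> bool {
    // Token files often end with a newline; ignore trailing whitespace.
    let expected = expected.trim_end();
    if expected.is_empty() {
        // An empty token file must never authorize anyone.
        return false;
    }
    match header.and_then(|h| h.strip_prefix("Bearer ")) {
        Some(presented) => constant_time_eq(presented.as_bytes(), expected.as_bytes()),
        None => false,
    }
}
```

The hand-rolled comparison is only meant to show the idea; an optimizing compiler gives no hard timing guarantees, so a production server would more likely rely on a vetted constant-time comparison crate.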
3. Install the agent on a device
{
  imports = [ nix-ota.nixosModules.agent ];
  services.nix-ota-agent = {
    enable = true;
    server = "https://ota.example.com";
    channel = "prod";
    deviceId = "fridge-007";
    publicKey = "<base64 ed25519 pubkey from step 1>";
    cacheUrl = "https://cache.example.com";
    cachePublicKey = "cache.example.com:abc...=";
    healthCmd = "systemctl is-system-running --wait"; # optional
  };
}
4. Publish your first update
nix build .#nixosConfigurations.fridge-007.config.system.build.toplevel
nix copy --to s3://my-cache ./result
nix run git+https://linus.dyrehytten.dk/max/nix-ota#nix-ota -- publish \
  --server https://ota.example.com \
  --token "$(cat publish-token)" \
  --key ./sign.key \
  --channel prod \
  --store-path "$(readlink -f result)" \
  --substituter https://cache.example.com
Open https://ota.example.com/ to watch the fleet pick it up.
How updates apply
On each poll the agent:
- Fetches /channels/<name>/current.
- Verifies the ed25519 signature against the pinned key.
- Rejects manifests with a revision ≤ the last one applied (replay defense).
- nix copy --from <substituter> <storePath> (Nix verifies cache signatures on every store path).
- nix-env -p /nix/var/nix/profiles/system --set <storePath>
- <storePath>/bin/switch-to-configuration switch
- Runs the optional healthCmd. On failure: switches back to the previous generation and reports rolled_back.
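The control flow of these steps can be sketched as follows, assuming the manifest signature has already been verified. The `System` trait, `Update`, `Outcome`, and `DryRun` are hypothetical names introduced here so the sequence can be read and exercised without Nix installed; they are not the agent's actual API, and how the real agent performs the rollback is not shown in this README.

```rust
/// Side effects the agent needs (hypothetical abstraction).
pub trait System {
    fn copy_closure(&mut self, substituter: &str, store_path: &str) -> Result<(), String>;
    fn set_profile(&mut self, store_path: &str) -> Result<(), String>;
    fn switch(&mut self, store_path: &str) -> Result<(), String>;
    /// Runs the optional health command; true means healthy.
    fn healthy(&mut self) -> bool;
    /// Switches back to the previous system generation.
    fn rollback(&mut self) -> Result<(), String>;
}

#[derive(Debug, PartialEq, Eq)]
pub enum Outcome {
    Applied,
    RolledBack,
    /// Revision was not newer than the last one applied.
    Skipped,
}

pub struct Update<'a> {
    pub revision: u64,
    pub store_path: &'a str,
    pub substituter: &'a str,
}

/// One poll iteration after signature verification has succeeded.
pub fn apply<S: System>(sys: &mut S, last_applied: u64, upd: &Update<'_>) -> Result<Outcome, String> {
    if upd.revision <= last_applied {
        return Ok(Outcome::Skipped); // replay defense
    }
    sys.copy_closure(upd.substituter, upd.store_path)?;
    sys.set_profile(upd.store_path)?;
    sys.switch(upd.store_path)?;
    if sys.healthy() {
        return Ok(Outcome::Applied);
    }
    sys.rollback()?;
    Ok(Outcome::RolledBack)
}

/// Dry-run implementation that records the commands it would run.
pub struct DryRun {
    pub log: Vec<String>,
    pub health_ok: bool,
}

impl System for DryRun {
    fn copy_closure(&mut self, substituter: &str, store_path: &str) -> Result<(), String> {
        self.log.push(format!("nix copy --from {} {}", substituter, store_path));
        Ok(())
    }
    fn set_profile(&mut self, store_path: &str) -> Result<(), String> {
        self.log.push(format!("nix-env -p /nix/var/nix/profiles/system --set {}", store_path));
        Ok(())
    }
    fn switch(&mut self, store_path: &str) -> Result<(), String> {
        self.log.push(format!("{}/bin/switch-to-configuration switch", store_path));
        Ok(())
    }
    fn healthy(&mut self) -> bool {
        self.health_ok
    }
    fn rollback(&mut self) -> Result<(), String> {
        self.log.push("rollback to previous generation".to_string());
        Ok(())
    }
}
```

Keeping the side effects behind a trait means the ordering (copy, set profile, switch, health check, rollback) can be unit-tested with `DryRun`, while a real implementation would shell out to the commands listed above.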
Threat model
| Threat | Mitigation |
|---|---|
| Compromised control server pushes a malicious closure | Manifests must be signed by an offline ed25519 key pinned on every device. |
| Compromised cache serves wrong closure | Nix verifies per-path signatures against trusted-public-keys. |
| Replay of an older (vulnerable) closure | Manifest carries a monotonic revision; the agent persists the last applied revision and rejects anything not newer. |
| Random internet caller publishes | POST /channels/:name/publish requires bearer token. |
| Random caller reads fleet state | UI/API should be put behind your reverse proxy / SSO. (v1: no built-in auth on reads.) |
| Bad closure bricks device | Health-check + magic-rollback to previous system generation. |
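The replay row depends on the last applied revision surviving reboots. A minimal sketch of one way to persist it, assuming a small state file holding the revision as decimal text: the file layout and function names are assumptions, not the agent's actual on-disk format.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Read the last applied revision. A missing file means nothing has
/// been applied yet; an unparsable file is an error, so a corrupted
/// state file does not silently reset the replay defense.
pub fn load_last_revision(path: &Path) -> io::Result<Option<u64>> {
    match fs::read_to_string(path) {
        Ok(text) => text
            .trim()
            .parse::<u64>()
            .map(Some)
            .map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e)),
        Err(e) if e.kind() == io::ErrorKind::NotFound => Ok(None),
        Err(e) => Err(e),
    }
}

/// Write to a temporary file and rename it into place, so a crash
/// mid-write leaves either the old or the new value, never a partial one.
pub fn store_last_revision(path: &Path, revision: u64) -> io::Result<()> {
    let tmp = path.with_extension("tmp");
    fs::write(&tmp, revision.to_string())?;
    fs::rename(&tmp, path)
}

/// Accept only strictly newer revisions; with no stored revision,
/// any revision is accepted.
pub fn is_newer(last: Option<u64>, incoming: u64) -> bool {
    match last {
        None => true,
        Some(prev) => incoming > prev,
    }
}
```

This sketch does not fsync the file or its directory; an agent that must survive power loss at any moment would likely add that.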
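Key management: keep sign.key offline (hardware token, ops laptop,
or a sealed CI secret). The server never sees it. To rotate: generate a
new key, publish a closure (signed with the old key) that sets publicKey
to the new key, wait until every device has applied it, then start
signing with the new key. Devices that have not yet applied that closure
will reject manifests signed with the new key.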
Non-goals (v1)
- The server does no Nix evaluation or building — CI does that.
- No replacement for your binary cache — use Attic, Cachix, S3, nix-serve.
- No per-device secrets (use sops-nix / agenix inside the closure).
- No web-based config editing — config lives in your flake repo.
Development
nix develop
cargo build --workspace
cargo test --workspace
nix flake check # runs the full NixOS VM test
License
MIT OR Apache-2.0.