Initial nix-ota implementation
Self-hostable OTA update system for NixOS fleets: a control server,
device agent, publisher CLI, and NixOS modules that ship prebuilt
system closures from a binary cache to devices that don't have the
flake.
- crates/common: signed manifest types (ed25519), store-path validator
- crates/server: axum + sqlite + HTMX dashboard, channel/device API
- crates/agent: poll, verify signature + revision, nix copy, switch,
health check, magic-rollback on failure
- crates/publisher: keygen + sign + publish CLI for operators/CI
- nix/modules: NixOS modules for server and agent
- nix/tests/ota.nix: end-to-end VM test exercising publish A -> B ->
broken C -> rollback to B (passes)
The control server never holds the signing key; manifests are signed
offline and verified against a pinned public key on each device.
2026-05-25 14:58:42 +02:00
# nix-ota

Open-source OTA updates for fleets of NixOS devices. A self-hostable
control server + lightweight device agent that ship prebuilt system
closures from a binary cache to devices that don't have your flake.

Think Cachix Deploy, but you run it.

## Architecture

```
┌─────────┐  1. nix build + nix copy   ┌──────────┐
│ CI /    │ ─────────────────────────► │ Binary   │  (Attic / S3 / nix-serve / Cachix)
│ Builder │                            │ Cache    │
└────┬────┘                            └────▲─────┘
     │ 2. publish signed manifest           │
     ▼                                      │ 4. nix copy --from <cache>
┌─────────────┐   3. GET current   ┌────────┴─────┐
│ Control     │ ◄────────────────  │ Device       │
│ Server + UI │   5. POST checkin  │ Agent        │ ──► switch-to-configuration
└─────────────┘                    └──────────────┘
```

The control server **never holds the signing key**. Operators (or CI)
sign manifests with an offline ed25519 key and POST them; devices
verify against a pinned public key. A server compromise cannot push
arbitrary closures.

## Components

| Crate              | Binary             | Role                                 |
|--------------------|--------------------|--------------------------------------|
| `crates/server`    | `nix-ota-server`   | REST API + SQLite + HTMX dashboard   |
| `crates/agent`     | `nix-ota-agent`    | Polls, verifies, applies, rolls back |
| `crates/publisher` | `nix-ota`          | Operator/CI CLI (keygen + publish)   |
| `crates/common`    | (lib)              | Manifest types + ed25519             |
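
The manifest names a store path that the agent later hands to `nix copy`, so it is worth rejecting anything that is not a plain top-level store path before a command is built from it. The sketch below shows the kind of check a store-path validator might perform. The function name `is_store_path` and the exact rules are assumptions for illustration; the real validator in `crates/common` may be stricter or different.

```sh
# Hypothetical helper, not part of nix-ota: returns 0 if $1 looks like a
# top-level Nix store path of the form /nix/store/<32-char hash>-<name>.
is_store_path() {
  _rest=${1#/nix/store/}
  [ "$_rest" != "$1" ] || return 1        # must start with /nix/store/
  _hash=${_rest%%-*}                      # text before the first "-"
  _name=${_rest#*-}                       # text after the first "-"
  [ "$_name" != "$_rest" ] || return 1    # the "-" separator must exist
  [ "${#_hash}" -eq 32 ] || return 1      # hash part is 32 characters
  # Nix's base-32 alphabet leaves out the letters e, o, t and u.
  case "$_hash" in *[!0-9abcdfghijklmnpqrsvwxyz]*) return 1 ;; esac
  # The name must be non-empty; "/" and whitespace fall outside this set,
  # so subpaths such as .../bin/sh and multi-line values are rejected.
  case "$_name" in ''|*[!A-Za-z0-9+._?=-]*) return 1 ;; esac
}

# Usage:
#   is_store_path "$store_path" || { echo "refusing odd store path" >&2; exit 1; }
```

A check like this does not replace signature verification; it only keeps malformed values away from the command line.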

## Quickstart (< 10 minutes)

> 👉 For a complete copy-pasteable setup with two real NixOS flakes
> (server host + device host), see [`examples/`](./examples/).

### 1. Generate a signing key on your workstation

```sh
nix run git+https://linus.dyrehytten.dk/max/nix-ota#nix-ota -- keygen --out ./sign.key
# prints the public key; save it, you'll bake it into every device.
```

### 2. Deploy the server

```nix
# configuration.nix
{
  imports = [ nix-ota.nixosModules.server ];
  services.nix-ota-server = {
    enable = true;
    openFirewall = true;
    publishTokenFile = "/run/secrets/nix-ota-publish-token";
  };
}
```

### 3. Install the agent on a device

```nix
{
  imports = [ nix-ota.nixosModules.agent ];
  services.nix-ota-agent = {
    enable = true;
    server = "https://ota.example.com";
    channel = "prod";
    deviceId = "fridge-007";
    publicKey = "<base64 ed25519 pubkey from step 1>";
    cacheUrl = "https://cache.example.com";
    cachePublicKey = "cache.example.com:abc...=";
    healthCmd = "systemctl is-system-running --wait"; # optional
  };
}
```

### 4. Publish your first update

```sh
nix build .#nixosConfigurations.fridge-007.config.system.build.toplevel
nix copy --to s3://my-cache ./result

# Quote the command substitutions so the token and path are passed as single arguments.
nix run git+https://linus.dyrehytten.dk/max/nix-ota#nix-ota -- publish \
  --server https://ota.example.com \
  --token "$(cat publish-token)" \
  --key ./sign.key \
  --channel prod \
  --store-path "$(readlink -f result)" \
  --substituter https://cache.example.com
```

Open `https://ota.example.com/` to watch the fleet pick it up.

## How updates apply

On each poll the agent:

1. Fetches `/channels/<name>/current`.
2. Verifies the ed25519 signature against the pinned key.
3. Rejects manifests with a revision ≤ the last one applied (replay defense).
4. Runs `nix copy --from <substituter> <storePath>`; Nix verifies cache
   signatures on every store path.
5. Points the system profile at the new closure with
   `nix-env -p /nix/var/nix/profiles/system --set <storePath>`.
6. Activates it with `<storePath>/bin/switch-to-configuration switch`.
7. Runs the optional `healthCmd`. On failure: switches back to the
   previous generation and reports `rolled_back`.
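
Steps 4 to 7 can be sketched as a small shell function built from the same commands. This is an illustration under stated assumptions, not the agent's code: the names `run`, `apply_update` and `DRY_RUN` are made up for the sketch, the `ok` status string is a placeholder (only `rolled_back` appears above), and rolling back by re-pointing the profile at the previously active path is one plausible way to "switch back to the previous generation". It also treats a failed `switch` like a failed health check, which the list above does not spell out.

```sh
PROFILE=/nix/var/nix/profiles/system

# Print the command instead of running it when DRY_RUN=1
# (a knob for this sketch only, not an agent option).
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# apply_update STORE_PATH SUBSTITUTER [HEALTH_CMD]
apply_update() {
  store_path=$1 substituter=$2 health_cmd=${3:-}
  # Remember what the profile resolves to now, so there is a target to go back to.
  previous=$(readlink -f "$PROFILE" 2>/dev/null || true)

  # Step 4: fetch the closure; Nix checks cache signatures while copying.
  run nix copy --from "$substituter" "$store_path" || return 1
  # Step 5: point the system profile at the new closure.
  run nix-env -p "$PROFILE" --set "$store_path" || return 1

  # Steps 6 and 7: activate, then run the optional health command.
  if run "$store_path/bin/switch-to-configuration" switch &&
     { [ -z "$health_cmd" ] || run sh -c "$health_cmd"; }; then
    echo "status: ok"
    return 0
  fi

  # Activation or health check failed: restore the old profile target and re-activate it.
  if [ -n "$previous" ]; then
    run nix-env -p "$PROFILE" --set "$previous"
    run "$previous/bin/switch-to-configuration" switch
  fi
  echo "status: rolled_back"
  return 1
}
```

Calling `DRY_RUN=1 apply_update <storePath> https://cache.example.com "systemctl is-system-running --wait"` prints the commands in order without touching the system, which is a cheap way to review the sequence.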

## Threat model

| Threat                                  | Mitigation |
|-----------------------------------------|------------|
| Compromised control server pushes evil  | Manifests must be signed by offline ed25519 key pinned on every device. |
| Compromised cache serves wrong closure  | Nix verifies per-path signatures against `trusted-public-keys`. |
| Replay of an older (vulnerable) closure | Manifest carries monotonic `revision`; agent persists it & rejects any revision ≤ the last applied. |
| Random internet caller publishes        | `POST /channels/:name/publish` requires bearer token. |
| Random caller reads fleet state         | UI/API should be put behind your reverse proxy / SSO. (v1: no built-in auth on reads.) |
| Bad closure bricks device               | Health-check + magic-rollback to previous system generation. |
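
The replay row comes down to a strict integer comparison against a value that survives reboots. A minimal sketch of that gate follows, assuming the revision is a plain non-negative integer kept in a one-line state file; the function names and the file layout are hypothetical, and the agent's real storage may differ.

```sh
# accept_revision STATE_FILE NEW_REVISION
# Succeeds only if NEW_REVISION is strictly greater than the stored one.
# Returns 2 (refuse) when either value is not a plain integer.
accept_revision() {
  state_file=$1 new=$2
  case "$new" in ''|*[!0-9]*) return 2 ;; esac
  last=0                                   # no state file yet: first update
  if [ -e "$state_file" ]; then
    read -r last < "$state_file" || true
    # A corrupt or empty state file fails closed instead of resetting to 0.
    case "$last" in ''|*[!0-9]*) return 2 ;; esac
  fi
  [ "$new" -gt "$last" ]
}

# record_revision STATE_FILE REVISION
# Write through a temp file and rename, so a crash cannot leave a truncated value.
record_revision() {
  printf '%s\n' "$2" > "$1.tmp" && mv "$1.tmp" "$1"
}
```

Recording the revision only after an update has been applied matches the "last one applied" wording: a manifest that was fetched but never activated does not raise the floor.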

**Key management:** keep `sign.key` offline (hardware token, ops laptop,
or a sealed CI secret). The server never sees it. To rotate: generate a
new key, update `publicKey` on devices in a closure published with the
old key, then start signing with the new one once devices have applied
that closure.

## Non-goals (v1)

- The server does no Nix evaluation or building; CI does that.
- Not a replacement for your binary cache: use Attic, Cachix, S3, or nix-serve.
- No per-device secrets (use sops-nix / agenix inside the closure).
- No web-based config editing; config lives in your flake repo.

## Development

```sh
nix develop
cargo build --workspace
cargo test --workspace
nix flake check   # runs the full NixOS VM test
```

## License

MIT OR Apache-2.0.