# Worked example: standing up nix-ota
This walks through deploying a real fleet with two boxes:
- **`ota.example.com`**: runs `nix-ota-server` (control plane) and
  `nix-serve` (binary cache). One machine, public DNS.
- **`fridge-007`**: a NixOS device that pulls updates. No flake, no
  Nix evaluation. Could be an RPi, a kiosk, an edge node, anything.

You drive everything from your laptop. The laptop holds the signing
key and runs `nix-ota publish` to ship updates.
```
laptop ──signed manifest──► ota.example.com ◄──poll── fridge-007
  │                         (server + cache)              │
  └──────nix copy closure───────┘     └──nix copy closure─┘
```
---
## 0. One-time: generate the manifest signing key
On your laptop:
```sh
nix run git+https://linus.dyrehytten.dk/max/nix-ota#nix-ota -- keygen --out ~/.config/nix-ota/sign.key
# prints the public key — save it, you'll bake it into every device.
```
Keep `sign.key` somewhere you trust (password manager, hardware token,
or a sealed CI secret). The server **never sees this key**.

---
## 1. The server host
See [`server-host/`](./server-host/) for a complete flake.

What you need on the server:

- `services.nix-ota-server`: the control plane (HTTP API + dashboard)
- `services.nix-serve`: the binary cache (or use Attic / S3 / Cachix)
- A reverse proxy with TLS in front of both (nginx, Caddy, Traefik, ...)
- A bearer token for publishes, stored in `/run/secrets/...` (sops-nix
  or agenix; the example uses a plain file for clarity)

Deploy it however you normally deploy a NixOS box: nixos-rebuild,
deploy-rs, or colmena. Yes, you can use any of those *to deploy the
nix-ota server itself*; nix-ota only replaces them for the fleet.
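
As a rough sketch of how the pieces listed above could fit together, the fragment below puts `nix-serve` behind nginx with TLS, using stock NixOS options only. The loopback ports, the `/cache/` path prefix, the key path, and the ACME email are assumptions made for illustration. The nix-ota server's own options are left as a comment because they are defined by the module in [`server-host/`](./server-host/), which remains the authoritative example.

```nix
{ config, ... }:
{
  # nix-ota control plane: options intentionally omitted here,
  # see server-host/configuration.nix for the real ones.
  # services.nix-ota-server = { ... };

  # Binary cache, bound to loopback so only the proxy can reach it.
  services.nix-serve = {
    enable = true;
    bindAddress = "127.0.0.1";
    port = 5000;
    secretKeyFile = "/run/secrets/cache.key"; # assumed path
  };

  # TLS-terminating reverse proxy in front of both services.
  services.nginx = {
    enable = true;
    recommendedProxySettings = true;
    virtualHosts."ota.example.com" = {
      enableACME = true;
      forceSSL = true;
      # Assumption: the control plane listens on 127.0.0.1:8080.
      locations."/".proxyPass = "http://127.0.0.1:8080";
      # The trailing slashes strip the /cache prefix before nix-serve sees it.
      locations."/cache/".proxyPass =
        "http://127.0.0.1:${toString config.services.nix-serve.port}/";
    };
  };

  security.acme = {
    acceptTerms = true;
    defaults.email = "admin@example.com"; # placeholder
  };

  networking.firewall.allowedTCPPorts = [ 80 443 ];
}
```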
After it's up:
```sh
curl https://ota.example.com/healthz # -> ok
curl https://ota.example.com/ # dashboard
```
---
## 2. The device
See [`device-host/`](./device-host/) for a complete flake.

What goes on each device:

- `services.nix-ota-agent`: the polling agent (a single static binary
  driven by a systemd timer)
- The matching ed25519 **public key** (so the device rejects manifests
  not signed by your key)
- The binary cache's URL **and public key** (so Nix accepts store paths
  fetched from it)

You deploy this flake to the device **once**, manually. From then on
you never touch the device's config: subsequent updates ride on top
through `nix-ota publish`.
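
The cache half of the list above maps onto standard Nix settings. The following is a minimal sketch, assuming the cache is served at `https://ota.example.com/cache`. The key name `ota.example.com-1` and the key body are placeholders. The agent's own options, including where the manifest public key goes, are defined by the module shown in [`device-host/`](./device-host/) and are deliberately not guessed at here.

```nix
{ ... }:
{
  # nix-ota agent: options intentionally omitted here,
  # see device-host/configuration.nix for the real ones.
  # services.nix-ota-agent = { ... };

  nix.settings = {
    # The binary cache that update closures are fetched from.
    substituters = [ "https://ota.example.com/cache" ];
    # Placeholder: paste the public half of the cache signing key.
    # Without it, Nix refuses store paths fetched from the cache.
    trusted-public-keys = [ "ota.example.com-1:<base64 public key>" ];
  };
}
```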
---
## 3. Publishing your first update
From your laptop:
```sh
# 1. Build the device's system closure.
nix build .#nixosConfigurations.fridge-007.config.system.build.toplevel

# 2. Push it to the cache.
nix copy --to 'https://ota.example.com/cache?secret-key=/path/to/cache.key' \
  ./result

# 3. Publish a signed manifest pointing at it.
nix run git+https://linus.dyrehytten.dk/max/nix-ota#nix-ota -- publish \
  --server https://ota.example.com \
  --token "$(cat ~/.config/nix-ota/publish-token)" \
  --key ~/.config/nix-ota/sign.key \
  --channel prod \
  --store-path "$(readlink -f ./result)" \
  --substituter https://ota.example.com/cache
```
Within `interval` seconds (default 60), `fridge-007` polls, verifies
the signature, copies the closure from your cache, switches into it,
runs your health check, and checks in. Open
`https://ota.example.com/` to watch it happen.

Rolling back is just publishing the previous store path again: bump
the revision and ship it.

---
## What you do NOT need
- ❌ SSH access from the server to the devices.
- ❌ The flake on the device. Once the agent + initial config are
installed, you can drop your nix-ota flake reference from the device
entirely (subsequent updates carry it).
- ❌ Per-device builds. Build once, publish, every device on that channel
picks it up.
- ❌ A Nix daemon talking to the control server. Devices talk to the
*cache*; the control server only hands out signed pointers.
---
## File map
```
examples/
├── README.md                 (this file)
├── server-host/
│   ├── flake.nix             full flake for the control-plane host
│   └── configuration.nix
└── device-host/
    ├── flake.nix             full flake for a device
    └── configuration.nix
```