Architecture
Table of contents
- Overview
- Components
- Image Registry
- File Copy (uni cp)
- Networking
- Health Checks
- Restart Policies
- Web Dashboard
- Resource Quotas
- I/O Throttling
- Cluster Membership
- Security Model
Overview
Uni is structured as a client–daemon system, the same model used by Docker:
┌─────────────────────────────────────────────────────────┐
│ uni (CLI — short-lived process) │
│ │
│ build · run · ps · status · logs · stop · rm · inspect · exec · cp │
│ compose up · compose down · compose ps · compose logs │
│ volume create · volume ls · volume rm · volume inspect │
│ network create · network ls · network inspect · network rm │
│ dns resolve · dns list │
│ node ls │
│ sign · verify │
│ pkg list · pkg search · pkg get · pkg remove │
│ kernel check · kernel update · kernel list · kernel use│
│ upgrade · upgrade check · upgrade list │
└──────────────────────────┬──────────────────────────────┘
│
│ JSON-RPC 2.0 over Unix domain socket
│ /var/run/unid.sock
│
┌──────────────────────────▼──────────────────────────────┐
│ unid (daemon — long-running background process) │
│ │
│ ┌──────────────────┐ ┌──────────────────────────────┐ │
│ │ VM Manager │ │ Image Registry (HTTP) │ │
│ │ │ │ │ │
│ │ QEMUManager │ │ GET /v2/images │ │
│ │ ┌────────────┐ │ │ POST /v2/images │ │
│ │ │ VM #1 │ │ │ GET /v2/images/{ref} │ │
│ │ │ qemu-sys.. │ │ │ GET /v2/images/{ref}/disk │ │
│ │ └────────────┘ │ │ DELETE /v2/images/{ref} │ │
│ │ ┌────────────┐ │ └──────────────────────────────┘ │
│ │ │ VM #2 │ │ │
│ │ │ qemu-sys.. │ │ ┌──────────────────────────────┐ │
│ │ └────────────┘ │ │ Image Store │ │
│ └──────────────────┘ │ ~/.uni/images/ │ │
│ │ <sha256>/manifest.json │ │
│ │ <sha256>/disk.img │ │
│ │ refs.json │ │
│ └──────────────────────────────┘ │
└──────────────────────────┬──────────────────────────────┘
│ spawns
┌──────────────────────────▼──────────────────────────────┐
│ QEMU processes (one per running VM) │
│ │
│ qemu-system-x86_64 │
│ -m 256M │
│ -drive file=disk.img,format=raw,if=virtio │
│ -nographic -serial stdio -no-reboot │
└──────────────────────────┬──────────────────────────────┘
│ boots
┌──────────────────────────▼──────────────────────────────┐
│ Nanos Kernel (C + ASM fork) │
│ Loads and runs the static ELF application │
└─────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────┐
│ unireg (standalone registry server) │
│ │
│ Same OCI/legacy HTTP API as embedded registry. │
│ Independently deployable for multi-node or CI use. │
│ Flags: --addr, --token, --jwt-secret, --tls-cert, │
│ --tls-key, --no-auto-tls │
└─────────────────────────────────────────────────────────┘
Components
uni CLI (cmd/uni/)
The command-line interface. It is a thin client — it does no VM management itself. Every command translates directly into a JSON-RPC call to unid.
- One .go file per subcommand (run.go, ps.go, stop.go, …)
- Zero business logic: only argument parsing and output formatting
- Cobra framework for command routing
unid daemon (cmd/unid/)
The long-running background process that owns everything:
- Listens on a Unix domain socket (JSON-RPC 2.0)
- Manages the VM registry (in-memory Store)
- Spawns and monitors QEMU processes
- Optionally serves the HTTP image registry
VM Manager (internal/vm/)
Manages the lifecycle of individual VMs:
State machine:
created → starting → running → stopping → stopped
Every transition is atomic (protected by sync.RWMutex) and logged with slog.
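A minimal sketch of how such a lock-protected state machine could look. The type and function names (Machine, Transition) and the strictly linear transition table are assumptions for illustration; the actual implementation may allow additional transitions (for example a failed start going straight to stopped) and also logs each change with slog.

```go
package main

import (
	"fmt"
	"sync"
)

// State names follow the lifecycle line above.
type State string

const (
	Created  State = "created"
	Starting State = "starting"
	Running  State = "running"
	Stopping State = "stopping"
	Stopped  State = "stopped"
)

// next lists the single forward transition allowed from each state.
// This linear table is an assumption made for the sketch.
var next = map[State]State{
	Created:  Starting,
	Starting: Running,
	Running:  Stopping,
	Stopping: Stopped,
}

// Machine is a hypothetical holder for one VM's lifecycle state.
type Machine struct {
	mu    sync.RWMutex
	state State
}

func NewMachine() *Machine { return &Machine{state: Created} }

// State takes the read lock, so many readers can proceed concurrently.
func (m *Machine) State() State {
	m.mu.RLock()
	defer m.mu.RUnlock()
	return m.state
}

// Transition validates and applies the change under the write lock,
// so the check and the update happen as one atomic step.
func (m *Machine) Transition(to State) error {
	m.mu.Lock()
	defer m.mu.Unlock()
	if next[m.state] != to {
		return fmt.Errorf("invalid transition %s -> %s", m.state, to)
	}
	m.state = to
	return nil
}

func main() {
	m := NewMachine()
	fmt.Println(m.Transition(Starting), m.State())
	fmt.Println(m.Transition(Stopped)) // rejected: starting -> stopped is not in the table
}
```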
Key types:
- VM: represents one virtual machine (ID, config, state, timestamps, log buffer, health status, restart count)
- QEMUManager: implements the Manager interface by spawning qemu-system-x86_64
- Store: thread-safe registry interface for all known VMs; MemoryStore for in-memory, FileStore for JSON persistence
- HealthChecker: manages TCP/HTTP probe goroutines per VM
- RestartConfig / RestartPolicy: controls automatic restart behaviour
- RuntimeStats / StatsCollector: runtime resource usage (CPU%, memory, network I/O) per VM; ProcStatsCollector on Linux reads /proc/[pid]/stat, /proc/[pid]/statm, /proc/[pid]/net/dev; NoopStatsCollector is the fallback on other platforms
QEMU command built per VM:
qemu-system-x86_64 \
-m 256M \
-drive file=/path/to/disk.img,format=raw,if=virtio \
-nographic \
-serial stdio \
-no-reboot \
-net none
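The command line above can be assembled from a per-VM configuration roughly as follows. VMConfig and qemuArgs are hypothetical names used only for this sketch; the flags themselves are the ones listed above.

```go
package main

import (
	"fmt"
	"strings"
)

// VMConfig is a hypothetical subset of the per-VM configuration.
type VMConfig struct {
	Memory   string // e.g. "256M"
	DiskPath string // path to the raw disk image
}

// qemuArgs returns the argument vector for qemu-system-x86_64,
// mirroring the flags shown in the command above.
func qemuArgs(c VMConfig) []string {
	return []string{
		"-m", c.Memory,
		"-drive", fmt.Sprintf("file=%s,format=raw,if=virtio", c.DiskPath),
		"-nographic",
		"-serial", "stdio",
		"-no-reboot",
		"-net", "none",
	}
}

func main() {
	args := qemuArgs(VMConfig{Memory: "256M", DiskPath: "/path/to/disk.img"})
	fmt.Println("qemu-system-x86_64 " + strings.Join(args, " "))
}
```

Keeping the arguments as a slice (rather than one shell string) means they can be passed directly to exec.Command without quoting concerns.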
Serial console output (stdout + stderr from QEMU) is captured into a thread-safe buffer, accessible via uni logs. When a VM is started with --attach, the output is simultaneously streamed through an io.Pipe so the CLI can read it in real-time via the VM.Attach RPC method.
Kernel Tools Cache (internal/tools/)
The kernel artifacts (kernel.img, boot.img, mkfs, dump) are downloaded from GitHub releases and cached in ~/.uni/tools/. They are versioned independently from the CLI using semver (kernel/VERSION in the repo).
Download flow:
- uni build calls tools.ResolveMkfs()
- uni cp calls tools.ResolveDump()
- If tools are absent → DownloadVersion("latest") fetches all artifacts and saves kernel-version.txt
- If tools are present → checks the remote version via the GitHub API; if newer, prompts [y/N] before replacing
Versioned releases: each kernel release is tagged kernel-vX.Y.Z on GitHub and is immutable. A rolling latest release always points to the most recent build. uni kernel use <v> downloads from the specific versioned tag.
Image System (internal/image/)
Content-addressable store — images are stored by their SHA256 digest:
~/.uni/images/
  refs.json              ← maps "name:tag" → "sha256hex"
  abc123def456.../
    manifest.json        ← image metadata
    disk.img             ← raw VM disk
Manifest format (manifest.json):
{
  "schemaVersion": 1,
  "name": "hello",
  "tag": "latest",
  "created": "2026-04-19T10:00:00Z",
  "config": {
    "memory": "256M",
    "cpus": 1
  },
  "diskDigest": "sha256:abc123...",
  "diskSize": 12582912
}
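For readers working with the manifest from Go, the JSON above maps naturally onto a pair of structs. The type names here (Manifest, ManifestConfig) and the parseManifest helper are assumptions for illustration; only the JSON field names come from the format shown above.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// ManifestConfig mirrors the "config" object of manifest.json.
type ManifestConfig struct {
	Memory string `json:"memory"`
	CPUs   int    `json:"cpus"`
}

// Manifest mirrors the top-level fields of manifest.json.
type Manifest struct {
	SchemaVersion int            `json:"schemaVersion"`
	Name          string         `json:"name"`
	Tag           string         `json:"tag"`
	Created       time.Time      `json:"created"` // RFC 3339 timestamp
	Config        ManifestConfig `json:"config"`
	DiskDigest    string         `json:"diskDigest"`
	DiskSize      int64          `json:"diskSize"` // bytes
}

// parseManifest decodes the raw bytes of a manifest.json file.
func parseManifest(data []byte) (Manifest, error) {
	var m Manifest
	err := json.Unmarshal(data, &m)
	return m, err
}

func main() {
	raw := []byte(`{"schemaVersion":1,"name":"hello","tag":"latest",
		"created":"2026-04-19T10:00:00Z","config":{"memory":"256M","cpus":1},
		"diskDigest":"sha256:abc123...","diskSize":12582912}`)
	m, err := parseManifest(raw)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%s:%s disk=%d bytes created=%d\n", m.Name, m.Tag, m.DiskSize, m.Created.Year())
}
```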
Builder pipeline (image.Builder):
- Validate ELF magic bytes on the binary
- Run mkfs (Nanos tool) to create a raw disk image containing the binary
- Compute SHA256 of the disk
- Write manifest + disk to the store
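The first and third steps of the pipeline can be sketched with the standard library alone. The function names validateELF and diskDigest are hypothetical; the sketch works on in-memory byte slices, whereas a real builder would more likely stream the disk file through the hash.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"encoding/hex"
	"errors"
	"fmt"
)

// elfMagic is the 4-byte signature at the start of every ELF file.
var elfMagic = []byte{0x7f, 'E', 'L', 'F'}

// validateELF rejects input that does not start with the ELF magic bytes.
func validateELF(bin []byte) error {
	if !bytes.HasPrefix(bin, elfMagic) {
		return errors.New("not an ELF binary")
	}
	return nil
}

// diskDigest returns the lowercase hex SHA256 of the disk contents,
// which is the directory name used by a content-addressable store.
func diskDigest(disk []byte) string {
	sum := sha256.Sum256(disk)
	return hex.EncodeToString(sum[:])
}

func main() {
	bin := []byte{0x7f, 'E', 'L', 'F', 2, 1, 1, 0}
	if err := validateELF(bin); err != nil {
		panic(err)
	}
	fmt.Println("sha256:" + diskDigest(bin))
}
```

Because the store key is derived from the content, building the same disk twice yields the same directory, and refs.json only has to map a name:tag to that digest.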
API (internal/api/)
JSON-RPC 2.0 over a Unix domain socket.
Methods:
| Method | Description |
|---|---|
| VM.Run | Create + start a VM |
| VM.Stop | Graceful or forced stop |
| VM.Kill | Immediate SIGKILL |
| VM.Signal | Send arbitrary signal |
| VM.Remove | Delete a stopped VM |
| VM.List | List all VMs |
| VM.Get | Get one VM by ID |
| VM.Logs | Get captured serial output (snapshot) |
| VM.Attach | Stream serial console output in real-time |
| VM.Inspect | Full VM details |
| VM.Stats | Runtime resource usage (CPU, memory, network) |
| Network.Create/List/Get/Remove | Manage named networks |
| Network.AllocateIP/ReleaseIP | IPAM allocation lifecycle |
| DNS.Resolve | Resolve service/VM names to IP |
| DNS.List | List active DNS records |
| Node.List | List cluster members (requires --cluster-addr) |
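To make the wire format concrete, here is a sketch of how a client could encode a JSON-RPC 2.0 request for one of the methods above. The envelope fields (jsonrpc, method, params, id) come from the JSON-RPC 2.0 specification; the Request type, the newRequest helper, and the shape of the params object are assumptions, since this document does not specify each method's parameters or the message framing on the socket.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Request is the JSON-RPC 2.0 request envelope.
type Request struct {
	JSONRPC string `json:"jsonrpc"`
	Method  string `json:"method"`
	Params  any    `json:"params,omitempty"` // omitted when nil
	ID      int    `json:"id"`
}

// newRequest encodes a call to the given method.
func newRequest(id int, method string, params any) ([]byte, error) {
	return json.Marshal(Request{JSONRPC: "2.0", Method: method, Params: params, ID: id})
}

func main() {
	list, err := newRequest(1, "VM.List", nil)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(list))

	// The "id" parameter name is hypothetical.
	get, err := newRequest(2, "VM.Get", map[string]string{"id": "abc"})
	if err != nil {
		panic(err)
	}
	fmt.Println(string(get))
}
```

A client would then write these bytes to a connection obtained with net.Dial("unix", "/var/run/unid.sock") and decode the response envelope from the same connection.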
Compose (internal/compose/)
Parses compose YAML files and resolves startup order:
- Parser — validates schema (version, service images, dependency refs, network refs)
- Graph — Kahn’s topological sort algorithm with cycle detection
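A compact illustration of Kahn's algorithm applied to service dependencies. The startupOrder function and its map-based input are assumptions for this sketch, not the package's actual API; ties are broken alphabetically here only to make the output deterministic.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

// startupOrder takes a map of service -> services it depends on and
// returns an order in which every service starts after its dependencies.
func startupOrder(deps map[string][]string) ([]string, error) {
	indegree := make(map[string]int, len(deps)) // unmet dependencies per service
	dependents := make(map[string][]string)     // reverse edges
	for svc, ds := range deps {
		indegree[svc] = len(ds)
		for _, d := range ds {
			if _, ok := deps[d]; !ok {
				return nil, fmt.Errorf("service %q depends on unknown service %q", svc, d)
			}
			dependents[d] = append(dependents[d], svc)
		}
	}

	// Start with every service that has no dependencies.
	var ready []string
	for svc, n := range indegree {
		if n == 0 {
			ready = append(ready, svc)
		}
	}
	sort.Strings(ready)

	var order []string
	for len(ready) > 0 {
		svc := ready[0]
		ready = ready[1:]
		order = append(order, svc)
		var unlocked []string
		for _, dep := range dependents[svc] {
			indegree[dep]--
			if indegree[dep] == 0 {
				unlocked = append(unlocked, dep)
			}
		}
		sort.Strings(unlocked)
		ready = append(ready, unlocked...)
	}

	// Any service left with unmet dependencies is part of a cycle.
	if len(order) != len(deps) {
		return nil, errors.New("dependency cycle detected")
	}
	return order, nil
}

func main() {
	order, err := startupOrder(map[string][]string{
		"web": {"api"},
		"api": {"db"},
		"db":  nil,
	})
	fmt.Println(order, err)
}
```

Cycle detection falls out of the algorithm for free: services on a cycle never reach an in-degree of zero, so the result comes up short.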
Package System (internal/package/)
Manages pre-packaged files that can be included in images at build time:
- Store: local cache at ~/.uni/packages/<name>/<version>/ holding:
  - files.tar.gz: the downloaded package archive
  - files/: extracted contents of the archive
  - meta.json: package metadata (name, version, SHA256, etc.)
- FetchIndex: retrieves the remote package index listing available packages and versions
- Download: fetches the package archive from its URL and stores it locally (with size verification)
- Extract: decompresses files.tar.gz into the files/ subdirectory
- ExtractedFiles: lists all regular files inside the extracted package
- Search: queries the remote index by name, description, or runtime
- Get: downloads a package (optionally a specific version) to the local store
- Remove: deletes a specific version; RemoveAll deletes all versions of a package
- Create: builds a local package archive from a binary and optional additional files, computing SHA256 and writing meta.json
Packages are included at build time via uni build --pkg <name>[:<version>]. The build pipeline:
- resolvePackages() fetches the remote index and resolves each --pkg reference
- Downloads the archive (files.tar.gz) if not already cached
- Extracts the archive into files/ if not already extracted
- Collects all individual file paths from files/ via ExtractedFiles()
- Passes the file list to buildManifest(), which includes each file in the Nanos manifest
Environment Variable Injection
Environment variables passed via uni run -e KEY=VALUE reach the guest through QEMU’s fw_cfg device — no disk rebuild required.
Flow:
- uni run -e KEY=VAL → daemon builds -fw_cfg name=opt/uni/env,string=KEY=VAL\n
- QEMU exposes this as a named file on the fw_cfg device (I/O ports 0x510/0x511)
- At boot, env_inject_from_fw_cfg() in the kernel reads opt/uni/env and merges entries into the process environment tuple before exec_elf builds the user-space stack
This is x86-64 only; the function compiles to a no-op stub on aarch64.
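The daemon-side step of this flow might look like the sketch below. The envFwCfgArgs function is hypothetical, and joining several variables with newlines is an assumption based on the single-variable example above. QEMU's option syntax treats a comma as a separator and expects a literal comma to be doubled, so the sketch escapes values that way; whether unid does the same is not stated in this document.

```go
package main

import (
	"fmt"
	"strings"
)

// envFwCfgArgs turns KEY=VALUE entries into a -fw_cfg argument pair
// carrying a newline-terminated list under the name opt/uni/env.
func envFwCfgArgs(env []string) ([]string, error) {
	if len(env) == 0 {
		return nil, nil // nothing to inject
	}
	var sb strings.Builder
	for _, kv := range env {
		key, _, ok := strings.Cut(kv, "=")
		if !ok || key == "" {
			return nil, fmt.Errorf("invalid env entry %q, want KEY=VALUE", kv)
		}
		sb.WriteString(kv)
		sb.WriteByte('\n')
	}
	// Double any commas so QEMU does not read them as option separators.
	payload := strings.ReplaceAll(sb.String(), ",", ",,")
	return []string{"-fw_cfg", "name=opt/uni/env,string=" + payload}, nil
}

func main() {
	args, err := envFwCfgArgs([]string{"KEY=VAL", "MODE=prod"})
	if err != nil {
		panic(err)
	}
	fmt.Printf("%q\n", args)
}
```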
Network Configuration Injection
Static IP configuration passed via uni run --ip reaches the guest through QEMU’s fw_cfg device, the same mechanism used for environment variables.
Flow:
- uni run --ip 10.0.0.2 --network tap0 → daemon builds -fw_cfg name=opt/uni/network,string=10.0.0.2/24,10.0.0.1
- QEMU exposes this as a named file on the fw_cfg device (I/O ports 0x510/0x511)
- At boot, net_inject_from_fw_cfg() in the kernel reads opt/uni/network, parses the IP/CIDR and gateway, and injects them into the root tuple
- init_network_iface() picks up the injected values to configure the first ethernet interface with a static IP instead of DHCP
The format is IP/CIDR,GATEWAY (e.g. 10.0.0.2/24,10.0.0.1). This is x86-64 only.
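The payload format is simple enough to validate before it is handed to QEMU. The following is an illustrative sketch only: NetConfig and parseNetConfig are hypothetical names, the actual guest-side parsing happens in the kernel's C code, and the check that the gateway lies inside the subnet is an added assumption rather than documented behaviour.

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// NetConfig is the decoded form of an IP/CIDR,GATEWAY string.
type NetConfig struct {
	IP      net.IP
	Prefix  int // prefix length, e.g. 24
	Gateway net.IP
}

// parseNetConfig splits and validates a string such as "10.0.0.2/24,10.0.0.1".
func parseNetConfig(s string) (NetConfig, error) {
	cidr, gw, ok := strings.Cut(s, ",")
	if !ok {
		return NetConfig{}, fmt.Errorf("want IP/CIDR,GATEWAY, got %q", s)
	}
	ip, ipnet, err := net.ParseCIDR(cidr)
	if err != nil {
		return NetConfig{}, err
	}
	gateway := net.ParseIP(gw)
	if gateway == nil {
		return NetConfig{}, fmt.Errorf("invalid gateway %q", gw)
	}
	// Assumption: the gateway must be reachable on the configured subnet.
	if !ipnet.Contains(gateway) {
		return NetConfig{}, fmt.Errorf("gateway %s is outside %s", gateway, ipnet)
	}
	ones, _ := ipnet.Mask.Size()
	return NetConfig{IP: ip, Prefix: ones, Gateway: gateway}, nil
}

func main() {
	cfg, err := parseNetConfig("10.0.0.2/24,10.0.0.1")
	if err != nil {
		panic(err)
	}
	fmt.Printf("ip=%s prefix=%d gateway=%s\n", cfg.IP, cfg.Prefix, cfg.Gateway)
}
```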
Image Registry
When started with --registry-addr :5000, unid serves an HTTP registry.
Current behavior is hybrid:
- Legacy API under /v2/images is still available for backward compatibility.
- OCI v2 foundations are available under /v2/... for blob upload/download and manifest put/get/delete.
- OCI blobs are persisted in ~/.uni/blobs.
- OCI manifest refs/bodies are persisted in ~/.uni/oci and survive daemon restarts.
GET /v2/images list all images (legacy)
GET /v2/images/{ref} get manifest (legacy)
GET /v2/images/{ref}/disk download raw disk image (legacy)
POST /v2/images push image multipart (legacy)
DELETE /v2/images/{ref} remove image (legacy)
GET /v2/ OCI API base (200 when available)
GET /v2/_catalog list OCI repositories
POST /v2/{name}/blobs/uploads/ start OCI blob upload
PATCH /v2/{name}/blobs/uploads/{uuid} append OCI blob upload chunk
PUT /v2/{name}/blobs/uploads/{uuid} complete OCI blob upload
GET /v2/{name}/blobs/{digest} download OCI blob
HEAD /v2/{name}/blobs/{digest} check OCI blob existence + digest
DELETE /v2/{name}/blobs/{digest} delete OCI blob
PUT /v2/{name}/manifests/{ref} store OCI manifest ref
GET /v2/{name}/manifests/{ref} read OCI manifest ref
HEAD /v2/{name}/manifests/{ref} check OCI manifest existence + digest
DELETE /v2/{name}/manifests/{ref} delete OCI manifest ref
{name} supports nested repository names (e.g. team/api, org/project/service) for OCI blob and manifest routes.
Full OCI compliance/auth/signing is tracked in Phase 8, but the migration path is active: uni push/pull use OCI first and fall back to legacy endpoints if needed.
Registry auth is now available as an optional static bearer token gate:
- Start daemon with --registry-token <token> (or UNI_REGISTRY_TOKEN=<token>)
- When enabled, registry endpoints require Authorization: Bearer <token>
- Unauthorized requests return 401 with Docker-style challenge headers including realm, service, and scoped repository:<name>:pull|push when applicable
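A static bearer gate of this kind fits in a small piece of net/http middleware. This is a sketch under assumptions: bearerGate, statusFor, and okHandler are hypothetical names, the realm and service values are placeholders, and the scope parameter of the challenge is left out for brevity.

```go
package main

import (
	"crypto/subtle"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// bearerGate rejects requests whose Authorization header is not
// exactly "Bearer <token>" and answers with a 401 challenge.
func bearerGate(token string, next http.Handler) http.Handler {
	want := []byte("Bearer " + token)
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		got := []byte(r.Header.Get("Authorization"))
		// Constant-time comparison avoids leaking the token through timing.
		if subtle.ConstantTimeCompare(got, want) != 1 {
			// Placeholder realm/service values; the real ones are not documented here.
			w.Header().Set("WWW-Authenticate", `Bearer realm="uni-registry",service="uni-registry"`)
			http.Error(w, "unauthorized", http.StatusUnauthorized)
			return
		}
		next.ServeHTTP(w, r)
	})
}

// okHandler stands in for the registry's real handlers.
var okHandler = http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
	w.WriteHeader(http.StatusOK)
})

// statusFor sends one in-memory request through h and returns the status code.
func statusFor(h http.Handler, authHeader string) int {
	req := httptest.NewRequest(http.MethodGet, "/v2/", nil)
	if authHeader != "" {
		req.Header.Set("Authorization", authHeader)
	}
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	h := bearerGate("s3cret", okHandler)
	fmt.Println(statusFor(h, ""), statusFor(h, "Bearer wrong"), statusFor(h, "Bearer s3cret"))
}
```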
Scoped JWT auth is also available for registry endpoints:
- Start daemon with --registry-jwt-secret <secret> (or UNI_REGISTRY_JWT_SECRET=<secret>)
- Optional claim checks can be configured with --registry-jwt-issuer / UNI_REGISTRY_JWT_ISSUER and --registry-jwt-audience / UNI_REGISTRY_JWT_AUDIENCE
- Tokens are validated as HMAC JWTs and must include a scope claim
- Supported scope format is Docker-style: repository:<name>:pull,push (supports * repo wildcard)
- Missing/invalid tokens return 401; valid tokens without required action scope return 403
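The scope check and the 401/403 split can be sketched independently of JWT signature verification, which is omitted here. scopeAllows and authStatus are hypothetical names; treating the wildcard as a bare * that matches any repository, and accepting several space-separated scopes in one claim, are assumptions.

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

// scopeAllows reports whether a Docker-style scope claim grants the
// action ("pull" or "push") on repo.
func scopeAllows(claim, repo, action string) bool {
	for _, scope := range strings.Fields(claim) {
		// repository:<name>:<actions>; <name> may contain slashes.
		parts := strings.SplitN(scope, ":", 3)
		if len(parts) != 3 || parts[0] != "repository" {
			continue
		}
		if parts[1] != "*" && parts[1] != repo {
			continue
		}
		for _, a := range strings.Split(parts[2], ",") {
			if a == action {
				return true
			}
		}
	}
	return false
}

// authStatus maps the outcome to the status codes listed above.
// tokenValid stands for the result of HMAC signature and claim checks.
func authStatus(tokenValid bool, claim, repo, action string) int {
	switch {
	case !tokenValid:
		return http.StatusUnauthorized
	case !scopeAllows(claim, repo, action):
		return http.StatusForbidden
	default:
		return http.StatusOK
	}
}

func main() {
	fmt.Println(authStatus(true, "repository:team/api:pull", "team/api", "pull"))
	fmt.Println(authStatus(true, "repository:team/api:pull", "team/api", "push"))
	fmt.Println(authStatus(false, "", "team/api", "pull"))
}
```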
Registry HTTPS can be enabled with custom certificate files:
- Start daemon with --registry-tls-cert <path> and --registry-tls-key <path>
- Environment alternatives: UNI_REGISTRY_TLS_CERT and UNI_REGISTRY_TLS_KEY
- Both cert and key are required together; partial TLS config is rejected at startup
Registry blob garbage collection is available via daemon command:
- Run unid gc to remove OCI blobs in ~/.uni/blobs that are not referenced by any manifest in ~/.uni/oci
- Referenced blobs (manifest config + layer digests) are preserved
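Conceptually this is a mark-and-sweep pass over digests. The sketch below works on in-memory values rather than the on-disk layout, and the Descriptor and Manifest types and the unreferenced function are hypothetical simplifications of the OCI manifest structure (a config descriptor plus a list of layer descriptors).

```go
package main

import (
	"fmt"
	"sort"
)

// Descriptor is a minimal stand-in for an OCI content descriptor.
type Descriptor struct{ Digest string }

// Manifest keeps only the fields that reference blobs.
type Manifest struct {
	Config Descriptor
	Layers []Descriptor
}

// unreferenced returns the blobs that no manifest points at, sorted.
func unreferenced(blobs []string, manifests []Manifest) []string {
	// Mark: collect every digest reachable from a manifest.
	live := make(map[string]bool)
	for _, m := range manifests {
		live[m.Config.Digest] = true
		for _, l := range m.Layers {
			live[l.Digest] = true
		}
	}
	// Sweep: anything not marked is garbage.
	var garbage []string
	for _, b := range blobs {
		if !live[b] {
			garbage = append(garbage, b)
		}
	}
	sort.Strings(garbage)
	return garbage
}

func main() {
	blobs := []string{"sha256:a", "sha256:b", "sha256:c", "sha256:d"}
	manifests := []Manifest{{
		Config: Descriptor{Digest: "sha256:a"},
		Layers: []Descriptor{{Digest: "sha256:b"}},
	}}
	fmt.Println(unreferenced(blobs, manifests))
}
```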
File Copy (uni cp)
uni cp copies files to and from stopped VM disk images using the dump and mkfs tools from the Nanos kernel toolchain. The tools read and write the TFS (Tiny File System) filesystem directly on the raw disk image.
Copy FROM a VM — the dump tool extracts the entire filesystem to a temporary directory, then the requested file is copied to the destination.
Copy TO a VM — the dump tool extracts the filesystem, the new file is injected, then mkfs rebuilds the disk image with the updated content.
Download flow:
- uni cp calls tools.ResolveDump() (and tools.ResolveMkfs() for copy-to-VM)
- If tools are absent from ~/.uni/tools/ → downloadArtifact() fetches them from the latest kernel release
- For copy-from: extract filesystem, copy file to host
- For copy-to: extract filesystem, copy file in, rebuild disk with mkfs
This requires the VM to be in stopped state because the disk image must not be in use by a running QEMU process.
Networking
Each VM can use one of two networking modes:
SLIRP user-mode (default for -p): QEMU’s built-in user-mode networking with port forwarding via hostfwd rules. Works on any platform without root access. Does not support inbound ICMP (ping).
TAP + bridge: A TAP interface is created and bridged on the Linux host, giving the VM full network access including its own IP address. The bridge is created via internal/network/bridge_linux.go and the TAP is attached to it. When port mappings (-p) are used together with --network, iptables DNAT rules (with interface filtering via -i tapName) are configured automatically so that traffic arriving at the host is forwarded to the guest’s static IP. When --ip is specified, the guest-side static IP is configured via fw_cfg (opt/uni/network), so no DHCP is required.
TAP networking requires Linux and elevated permissions and is not available on Windows. See internal/network/tap.go (Linux-only build tag).
Health Checks
VMs can be configured with liveness probes that run periodically after startup:
- TCP probe — succeeds if a TCP connection can be established to the guest port
- HTTP probe — succeeds if an HTTP GET to the guest port/path returns a 2xx status code
Configuration (via --health-check flag or API):
| Parameter | Default | Description |
|---|---|---|
| Type | — | tcp or http |
| Port | — | Guest port to probe (maps to host port via PortMaps if set) |
| Path | / | HTTP path (only for http type) |
| Interval | 10s | Time between probes |
| Timeout | 3s | Per-probe timeout |
| Retries | 3 | Consecutive failures before marking unhealthy |
Probe target resolution: when PortMaps are configured, the probe targets the host-side port. Otherwise it targets the guest port directly on 127.0.0.1.
Health States:
| State | Meaning |
|---|---|
| starting | Probe period not yet elapsed |
| healthy | Last probe succeeded |
| unhealthy | Consecutive failures exceeded Retries |
| unknown | No health check configured |
Restart Policies
When a VM exits (crashes or terminates), the daemon can automatically restart it:
| Policy | Behavior |
|---|---|
| never | Never restart (default) |
| on-failure | Restart only on non-zero exit code |
| always | Always restart, even on clean exit (unless explicitly stopped) |
Configuration (via --restart flag or API):
--restart never # never restart (default)
--restart on-failure # restart on crash (unlimited retries)
--restart on-failure:5 # restart on crash, max 5 retries
--restart always # always restart (unlimited retries)
--restart always:3 # always restart, max 3 retries
Exponential backoff between restarts: 1s, 2s, 4s, 8s, 16s, capped at 30s.
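The delay sequence can be reproduced with a short doubling loop. The backoff function and its zero-based attempt parameter are assumptions made for this sketch.

```go
package main

import (
	"fmt"
	"time"
)

// backoff returns the delay before restart number attempt (0-based):
// it doubles from 1s and is capped at 30s.
func backoff(attempt int) time.Duration {
	const (
		base  = time.Second
		limit = 30 * time.Second
	)
	d := base
	for i := 0; i < attempt; i++ {
		d *= 2
		if d >= limit {
			return limit
		}
	}
	return d
}

func main() {
	for attempt := 0; attempt < 7; attempt++ {
		fmt.Println(attempt, backoff(attempt))
	}
}
```

Returning as soon as the cap is reached keeps the loop from overflowing the duration for very large attempt counts.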
Important: StateStopped is terminal — the restart creates a new VM with the same Config. The old VM is removed from the store and the new VM gets a fresh ID and incremented RestartCount.
Explicit stop operations (uni stop or uni kill) set an explicitStop flag that prevents restart regardless of policy.
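Putting the flag syntax and the rules above together, the restart decision might be expressed as below. RestartPolicy, parseRestart, and shouldRestart are hypothetical names; rejecting a retry limit on the never policy is an assumption.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// RestartPolicy is the parsed form of a --restart value.
type RestartPolicy struct {
	Mode       string // "never", "on-failure", or "always"
	MaxRetries int    // 0 means unlimited
}

// parseRestart accepts values such as "always" or "on-failure:5".
func parseRestart(s string) (RestartPolicy, error) {
	mode, limit, hasLimit := strings.Cut(s, ":")
	switch mode {
	case "never", "on-failure", "always":
	default:
		return RestartPolicy{}, fmt.Errorf("unknown restart policy %q", mode)
	}
	p := RestartPolicy{Mode: mode}
	if hasLimit {
		n, err := strconv.Atoi(limit)
		if err != nil || n < 1 || mode == "never" {
			return RestartPolicy{}, fmt.Errorf("invalid retry limit in %q", s)
		}
		p.MaxRetries = n
	}
	return p, nil
}

// shouldRestart decides whether an exited VM is restarted.
func shouldRestart(p RestartPolicy, exitCode, restartCount int, explicitStop bool) bool {
	if explicitStop || p.Mode == "never" {
		return false // uni stop / uni kill always win
	}
	if p.MaxRetries > 0 && restartCount >= p.MaxRetries {
		return false // retry budget exhausted
	}
	if p.Mode == "on-failure" {
		return exitCode != 0
	}
	return true // always
}

func main() {
	p, err := parseRestart("on-failure:5")
	if err != nil {
		panic(err)
	}
	fmt.Println(p, shouldRestart(p, 1, 0, false), shouldRestart(p, 0, 0, false))
}
```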
Web Dashboard
The daemon serves a read-only web dashboard when --ui-addr is configured (e.g. --ui-addr :8080).
Pages
| Route | Description |
|---|---|
| /ui | VM list with state, health, image |
| /ui/vm/{id} | VM detail: config, health, restart info, port mappings, env vars, serial console log tail |
JSON API Endpoints
| Endpoint | Description |
|---|---|
| /ui/api/vms | List all VMs (id, name, state, image, health) |
| /ui/api/vm/{id} | Full VM detail as JSON |
| /ui/api/vm/{id}/logs | Serial console output for a VM |
| /ui/api/vm/{id}/stats | Live runtime stats (CPU%, memory, network I/O) |
The dashboard uses Go HTML templates with a dark theme. No JavaScript framework is required. VM IDs in the list are clickable links to the detail page. The detail page polls stats every 3 seconds via the /ui/api/vm/{id}/stats endpoint.
Resource Quotas
VMs can have CPU and memory limits enforced via Linux cgroup v2 when available.
CPU Shares
The --cpu-shares flag sets the cgroup v2 CPU weight for the QEMU process:
uni run myapp:latest --cpu-shares 512
CPU weight ranges from 1 to 10000 (default 100). This controls relative CPU allocation among competing VMs, not an absolute limit.
Memory Hard Limit
The --memory-max flag sets a cgroup v2 memory hard limit:
uni run myapp:latest --memory-max 512M
When the QEMU process exceeds this limit, the kernel OOM killer will terminate it. Supported suffixes: K, M, G.
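Turning a value such as 512M into the byte count that cgroup v2 expects could look like this. parseSize is a hypothetical helper, and interpreting the suffixes as binary multiples (1K = 1024 bytes) is an assumption; overflow checking is left out.

```go
package main

import (
	"fmt"
	"strconv"
)

// parseSize converts strings like "512M" or "1G" into bytes.
// A value without a suffix is taken as a plain byte count.
func parseSize(s string) (int64, error) {
	if s == "" {
		return 0, fmt.Errorf("empty size")
	}
	mult := int64(1)
	switch s[len(s)-1] {
	case 'K':
		mult = 1 << 10
	case 'M':
		mult = 1 << 20
	case 'G':
		mult = 1 << 30
	}
	digits := s
	if mult != 1 {
		digits = s[:len(s)-1] // strip the suffix
	}
	n, err := strconv.ParseInt(digits, 10, 64)
	if err != nil || n < 0 {
		return 0, fmt.Errorf("invalid size %q", s)
	}
	return n * mult, nil
}

func main() {
	for _, s := range []string{"512M", "1G", "4096"} {
		n, err := parseSize(s)
		fmt.Println(s, n, err)
	}
}
```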
Platform Requirements
Both features require Linux with cgroup v2 (/sys/fs/cgroup/cgroup.controllers must exist). On non-Linux platforms, the flags are accepted but no limits are enforced and a warning is logged.
The daemon creates a cgroup at /sys/fs/cgroup/uni/<vm-id>/ for each VM with resource limits, moves the QEMU PID into it on start, and removes the cgroup on VM exit.
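In cgroup v2, limits are applied by writing plain text to control files: cpu.weight, memory.max, and cgroup.procs are the kernel's file names. The sketch below shows that sequence with a configurable root so it can be tried against a temporary directory; applyLimits, readLimit, and tempRoot are hypothetical helpers. On a real system the root would be /sys/fs/cgroup, the cpu and memory controllers would have to be enabled in the parent's cgroup.subtree_control, and root privileges would be needed.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strconv"
)

// applyLimits creates <root>/uni/<vmID>/ and writes the limit files,
// then adds the PID to the group.
func applyLimits(root, vmID string, pid, cpuWeight int, memMax int64) error {
	if cpuWeight < 1 || cpuWeight > 10000 {
		return fmt.Errorf("cpu weight %d out of range 1..10000", cpuWeight)
	}
	dir := filepath.Join(root, "uni", vmID)
	if err := os.MkdirAll(dir, 0o755); err != nil {
		return err
	}
	files := []struct{ name, value string }{
		{"cpu.weight", strconv.Itoa(cpuWeight)},
		{"memory.max", strconv.FormatInt(memMax, 10)},
		{"cgroup.procs", strconv.Itoa(pid)}, // moves the process into the group
	}
	for _, f := range files {
		path := filepath.Join(dir, f.name)
		if err := os.WriteFile(path, []byte(f.value), 0o644); err != nil {
			return fmt.Errorf("write %s: %w", f.name, err)
		}
	}
	return nil
}

// readLimit returns the content of one control file, or "" on error.
func readLimit(root, vmID, name string) string {
	b, err := os.ReadFile(filepath.Join(root, "uni", vmID, name))
	if err != nil {
		return ""
	}
	return string(b)
}

// tempRoot creates a scratch directory that stands in for /sys/fs/cgroup.
func tempRoot() string {
	dir, err := os.MkdirTemp("", "cgroup-sketch")
	if err != nil {
		panic(err)
	}
	return dir
}

func main() {
	root := tempRoot()
	defer os.RemoveAll(root)
	if err := applyLimits(root, "vm1", os.Getpid(), 512, 512<<20); err != nil {
		panic(err)
	}
	fmt.Println(readLimit(root, "vm1", "cpu.weight"), readLimit(root, "vm1", "memory.max"))
}
```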
I/O Throttling
Disk I/O for the boot disk can be limited using QEMU’s native drive throttle:
# Limit to 1000 IOPS
uni run myapp:latest --disk-iops 1000
# Limit to 10MB/s throughput
uni run myapp:latest --disk-bps 10M
# Both limits
uni run myapp:latest --disk-iops 500 --disk-bps 5M
| Flag | Unit | Description |
|---|---|---|
| --disk-iops | IOPS | Maximum I/O operations per second (0 = no limit) |
| --disk-bps | bytes/sec | Maximum throughput (e.g. 10M, 1G; 0 = no limit) |
These limits apply to the boot disk only. Volume disks are not throttled.
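QEMU's -drive option accepts throttling.iops-total and throttling.bps-total properties, so one plausible way to translate the flags is to append them to the boot drive definition. The driveArg function is hypothetical, and this document does not state which QEMU option spelling unid actually emits.

```go
package main

import "fmt"

// driveArg builds the -drive value for the boot disk, adding throttle
// properties only for limits that are set (0 means no limit).
func driveArg(path string, iops, bps int64) string {
	arg := fmt.Sprintf("file=%s,format=raw,if=virtio", path)
	if iops > 0 {
		arg += fmt.Sprintf(",throttling.iops-total=%d", iops)
	}
	if bps > 0 {
		arg += fmt.Sprintf(",throttling.bps-total=%d", bps)
	}
	return arg
}

func main() {
	// Equivalent of --disk-iops 500 --disk-bps 5M, assuming 5M means 5*1024*1024 bytes.
	fmt.Println(driveArg("disk.img", 500, 5<<20))
	fmt.Println(driveArg("disk.img", 0, 0))
}
```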
Cluster Membership
When started with --cluster-addr, unid joins a SWIM-style gossip cluster for node discovery and health monitoring.
How it works:
Each daemon runs a lightweight gossip protocol over HTTP:
- Join: on startup, contacts seed nodes listed in --join and exchanges membership tables
- Gossip: every 5 seconds, picks a random peer and exchanges membership state via POST /cluster/gossip
- Suspicion: if a member is not heard from for 15 seconds, it is marked suspect
- Dead: if a suspect is not heard from for 30 seconds, it is marked dead
- Leave: on graceful shutdown, the local node broadcasts its left status
Member states:
| State | Meaning |
|---|---|
| alive | Active and responding to gossip |
| suspect | Not heard from recently (may be network issue) |
| dead | Not heard from for an extended period |
| left | Gracefully shut down |
Dead and Left statuses are always propagated regardless of timestamp, ensuring cluster-wide consistency.
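The timeout and merge rules can be summarized in two small functions. This is a sketch with several assumptions: the Member type and the observe, merge, and at names are hypothetical; both timeouts are measured from the last time the member was heard from; and for non-terminal statuses the entry with the newer timestamp wins.

```go
package main

import (
	"fmt"
	"time"
)

type Status string

const (
	Alive   Status = "alive"
	Suspect Status = "suspect"
	Dead    Status = "dead"
	Left    Status = "left"
)

const (
	suspectAfter = 15 * time.Second
	deadAfter    = 30 * time.Second
)

// Member is one row of the membership table.
type Member struct {
	Addr     string
	Status   Status
	LastSeen time.Time
}

func terminal(s Status) bool { return s == Dead || s == Left }

// observe derives the local view of a member from how long it has been silent.
func observe(m Member, now time.Time) Status {
	if terminal(m.Status) {
		return m.Status
	}
	silent := now.Sub(m.LastSeen)
	switch {
	case silent >= deadAfter:
		return Dead
	case silent >= suspectAfter:
		return Suspect
	default:
		return Alive
	}
}

// merge combines the local entry with one received via gossip.
// Dead and left always propagate; otherwise the newer timestamp wins.
func merge(local, remote Member) Member {
	if terminal(remote.Status) || remote.LastSeen.After(local.LastSeen) {
		return remote
	}
	return local
}

// at builds a timestamp a fixed number of seconds after the Unix epoch.
func at(sec int64) time.Time { return time.Unix(sec, 0) }

func main() {
	m := Member{Addr: "10.0.0.2:7946", Status: Alive, LastSeen: at(0)}
	for _, t := range []int64{5, 15, 30} {
		fmt.Println(t, observe(m, at(t)))
	}
	stale := Member{Addr: m.Addr, Status: Left, LastSeen: at(-10)}
	fmt.Println(merge(m, stale).Status)
}
```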
Usage:
# Start first node
unid --cluster-addr :7946
# Start second node, joining the first
unid --cluster-addr :7946 --join 10.0.0.1:7946
# Start with multiple seeds
unid --cluster-addr :7946 --join 10.0.0.1:7946,10.0.0.2:7946
# List cluster members
uni node ls
0.0.0.0 bind addresses are normalized to 127.0.0.1 for inter-node communication.
Security Model
- unid runs as root (or a privileged user) to spawn QEMU and manage TAP interfaces
- The Unix socket is the trust boundary: only processes that can access the socket file can manage VMs
- Each VM runs in full KVM hardware isolation — a compromised unikernel cannot escape to the host or other VMs
- No shell, no SSH, no dynamic linking inside the unikernel — attack surface is minimal by design