Observability

Table of contents

  1. Prometheus Metrics
    1. Available Metrics
    2. Prometheus Scrape Config
  2. OpenTelemetry Tracing
    1. VM Lifecycle Spans
    2. Collector Setup
  3. Structured JSON Logging
    1. Output Format
  4. Live VM Stats
    1. Available Metrics
    2. JSON Output
  5. Web Dashboard
    1. Pages
    2. JSON API
  6. VM Persistence (SQLite)
    1. Store Backends
    2. Health Status Persistence
  7. Nightly Security Scans
  8. Resource Quotas
  9. I/O Throttling

Uni provides built-in observability for production unikernel workloads: Prometheus metrics, OpenTelemetry distributed tracing, structured JSON logging, live VM stats, and a web dashboard.

Prometheus Metrics

Enable the metrics endpoint with --metrics-addr on the daemon:

unid --metrics-addr :9090

This starts an HTTP server with:

Endpoint   Description
/metrics   Prometheus-formatted metrics
/health    Health check (returns 200 OK)

Available Metrics

Metric                    Type     Description
uni_vms_total             gauge    Total VMs registered
uni_vms_running           gauge    VMs currently in running state
uni_vms_stopped           gauge    VMs currently in stopped state
uni_vm_lifecycle_total    counter  VM lifecycle transitions (create, start, stop, kill, remove)
uni_registry_push_total   counter  Image push operations
uni_registry_pull_total   counter  Image pull operations
uni_build_info            gauge    Build info (version label)

Prometheus Scrape Config

scrape_configs:
  - job_name: 'unid'
    static_configs:
      - targets: ['localhost:9090']

The daemon refreshes the VM state gauges every 5 seconds via an internal poller, so scraped values can lag the actual VM state by up to 5 seconds.
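
For quick checks without a full Prometheus setup, the gauges above can be read straight from the /metrics text output. The following is a minimal sketch, not part of Uni: the SAMPLE text is hypothetical output written for illustration, and the parser only handles samples without labels (so a labeled metric such as uni_build_info is skipped).

```python
# Sketch: read unlabeled gauges from Prometheus text exposition output.
# SAMPLE is hypothetical, hand-written output; it was not captured from a real daemon.

SAMPLE = """\
# HELP uni_vms_total Total VMs registered
# TYPE uni_vms_total gauge
uni_vms_total 3
# TYPE uni_vms_running gauge
uni_vms_running 2
# TYPE uni_vms_stopped gauge
uni_vms_stopped 1
"""


def parse_unlabeled_samples(text: str) -> dict[str, float]:
    """Return {metric_name: value} for samples that carry no labels."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        parts = line.split()
        name = parts[0]
        if "{" in name or len(parts) < 2:
            continue  # labeled samples need a real parser; skipped in this sketch
        samples[name] = float(parts[1])
    return samples


if __name__ == "__main__":
    metrics = parse_unlabeled_samples(SAMPLE)
    print(metrics["uni_vms_running"])
```

To run this against a live daemon, the SAMPLE text could be replaced with the body fetched from the /metrics endpoint (for example http://localhost:9090/metrics when the daemon was started with --metrics-addr :9090).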


OpenTelemetry Tracing

Enable distributed tracing with --trace-addr:

unid --trace-addr localhost:4317

This configures an OTLP gRPC exporter. When the flag is empty (the default), tracing is completely disabled and a no-op provider is used.

VM Lifecycle Spans

The daemon creates spans for these VM lifecycle events:

Span Name      Description
vm.lifecycle   Parent span for a VM operation
vm.create      VM registration
vm.start       QEMU process launch
vm.stop        Graceful shutdown
vm.kill        Immediate kill
vm.remove      VM removal from store

Collector Setup

Point --trace-addr at your OTLP collector (Jaeger, Tempo, etc.):

# Using Jaeger with OTLP collector
unid --trace-addr localhost:4317

Structured JSON Logging

Switch from the default text format to JSON with --log-format:

unid --log-format json

Output Format

JSON log lines include these fields:

{
  "ts": "2026-05-15T12:00:00.000Z",
  "level": "INFO",
  "msg": "vm state transition",
  "vm_id": "abc123",
  "from": "created",
  "to": "starting"
}

JSON logs can be shipped to Loki, Splunk, Datadog, or any other log aggregation system without writing a custom text parser.
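
As an illustration of consuming these logs, the sketch below extracts state transitions for one VM. It uses only the field names shown in the example above; the sample lines are hypothetical, and the sketch assumes the daemon writes each record on a single line.

```python
import json

# Hypothetical log lines built from the field names documented above.
# Assumption: each record is emitted as one JSON object per line.
LOG_LINES = [
    '{"ts": "2026-05-15T12:00:00.000Z", "level": "INFO", "msg": "vm state transition", '
    '"vm_id": "abc123", "from": "created", "to": "starting"}',
    "this line is not JSON",
    '{"ts": "2026-05-15T12:00:01.000Z", "level": "INFO", "msg": "vm state transition", '
    '"vm_id": "abc123", "from": "starting", "to": "running"}',
]


def state_transitions(lines, vm_id):
    """Yield (from, to) pairs for one VM, ignoring lines that are not JSON objects."""
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # tolerate stray non-JSON output
        if not isinstance(record, dict):
            continue
        if record.get("msg") == "vm state transition" and record.get("vm_id") == vm_id:
            yield record.get("from"), record.get("to")


if __name__ == "__main__":
    for old, new in state_transitions(LOG_LINES, "abc123"):
        print(f"{old} -> {new}")
```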


Live VM Stats

The uni stats command shows real-time resource usage per VM:

# One-time snapshot
uni stats <vm-id>

# Continuous watch (3s interval by default)
uni stats <vm-id> --watch

# Custom interval
uni stats <vm-id> --watch --interval 5s

Available Metrics

Metric   Description
CPU %    Percentage of CPU used by the QEMU process
Memory   Resident memory in bytes
Net RX   Total network bytes received
Net TX   Total network bytes transmitted
Source   proc (Linux /proc) or fallback (non-Linux)

JSON Output

uni stats <vm-id> --output json

Web Dashboard

Enable the read-only web dashboard with --ui-addr:

unid --ui-addr :8080

Pages

Route         Description
/ui           VM list with state, health, and image
/ui/vm/{id}   VM detail page: config, health, restart info, ports, env vars, serial console log tail, live stats

JSON API

Endpoint                Description
/ui/api/vms             List all VMs
/ui/api/vm/{id}         Full VM detail
/ui/api/vm/{id}/logs    Serial console output
/ui/api/vm/{id}/stats   Live runtime stats (CPU%, memory, network I/O)

The VM detail page polls stats every 3 seconds and renders CPU%, memory, and network I/O inline. No JavaScript framework is required.
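
The same polling can be done from a script. The sketch below is an illustration only: the function names are hypothetical, the VM ID abc123 is a placeholder, and because the response schema is not documented here, it prints the decoded JSON as-is rather than assuming field names.

```python
import json
import time
import urllib.parse
import urllib.request


def stats_url(base_url: str, vm_id: str) -> str:
    """Build the stats endpoint URL for one VM, percent-escaping the ID."""
    escaped = urllib.parse.quote(vm_id, safe="")
    return f"{base_url.rstrip('/')}/ui/api/vm/{escaped}/stats"


def fetch_json(url: str, timeout: float = 2.0):
    """GET a URL and decode the response body as JSON."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def poll_stats(base_url: str, vm_id: str, interval: float = 3.0, count: int = 3) -> None:
    """Print the raw stats document `count` times, `interval` seconds apart."""
    url = stats_url(base_url, vm_id)
    for i in range(count):
        print(fetch_json(url))
        if i < count - 1:
            time.sleep(interval)


if __name__ == "__main__":
    # Assumes the daemon was started with --ui-addr :8080 and a VM "abc123" exists.
    poll_stats("http://localhost:8080", "abc123")
```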


VM Persistence (SQLite)

By default, VM state is persisted as per-VM JSON files. For improved reliability, switch to SQLite:

unid --vm-store sqlite

The SQLite store automatically migrates any existing VMs stored as state.json files on first use. Migration is idempotent: re-running it does not create duplicates.
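
To make the idempotency property concrete, here is one common way such a migration can be written. This is a sketch of the general technique (a primary key plus INSERT OR IGNORE), not Uni's actual implementation; the table name, column names, and the migrate function are hypothetical.

```python
import json
import sqlite3
import tempfile
from pathlib import Path


def migrate(vms_dir: Path, conn: sqlite3.Connection) -> int:
    """Copy every <id>/state.json under vms_dir into SQLite; return rows added."""
    # Hypothetical schema: the VM ID is the primary key, the state is stored as JSON text.
    conn.execute("CREATE TABLE IF NOT EXISTS vms (id TEXT PRIMARY KEY, state TEXT NOT NULL)")
    before = conn.execute("SELECT COUNT(*) FROM vms").fetchone()[0]
    for state_file in sorted(vms_dir.glob("*/state.json")):
        vm_id = state_file.parent.name
        state = json.dumps(json.loads(state_file.read_text()))
        # INSERT OR IGNORE skips IDs that already exist, so a second run adds nothing.
        conn.execute("INSERT OR IGNORE INTO vms (id, state) VALUES (?, ?)", (vm_id, state))
    conn.commit()
    after = conn.execute("SELECT COUNT(*) FROM vms").fetchone()[0]
    return after - before


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        vms = Path(tmp)
        (vms / "abc123").mkdir()
        (vms / "abc123" / "state.json").write_text('{"state": "stopped"}')
        db = sqlite3.connect(":memory:")
        print("first run added:", migrate(vms, db))
        print("second run added:", migrate(vms, db))
```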

Store Backends

Backend   Flag                        Storage                      Best for
file      --vm-store file (default)   ~/.uni/vms/<id>/state.json   Simple setups
sqlite    --vm-store sqlite           ~/.uni/vms/vms.db            Production reliability

Health Status Persistence

Both backends persist VM health status, restart counts, and timestamps across daemon restarts. VMs that were running when the daemon exited are automatically marked as stopped with daemon_recovered=true.


Nightly Security Scans

The nightly CI pipeline runs automated security checks:

Tool          Check                       Failure condition
govulncheck   Known Go vulnerabilities    Any vulnerability in standard library or dependencies
trivy         Filesystem CVE scan         HIGH or CRITICAL severity findings

These run in .github/workflows/nightly.yml at 02:00 UTC daily.


Resource Quotas

When running on Linux with cgroup v2, you can enforce CPU and memory limits per VM:

# Limit CPU shares (1-10000, cgroup v2 weight)
uni run myapp:latest --cpu-shares 512

# Set memory hard limit
uni run myapp:latest --memory-max 1G

If cgroup v2 is not available, the flags are accepted but no limits are enforced (a warning is logged).
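
Because the flags are silently accepted without cgroup v2, it can be useful to check the host before relying on them. The sketch below shows one way to do that: on a host using the unified cgroup v2 hierarchy, the mount point (usually /sys/fs/cgroup) contains a cgroup.controllers file. How Uni itself performs this detection is not documented here, so treat both helper functions as hypothetical.

```python
from pathlib import Path

# Usual mount point of the unified cgroup v2 hierarchy on Linux.
CGROUP_ROOT = Path("/sys/fs/cgroup")


def cgroup_v2_available(root: Path = CGROUP_ROOT) -> bool:
    """True when `root` looks like a cgroup v2 mount (it exposes cgroup.controllers)."""
    return (root / "cgroup.controllers").is_file()


def valid_cpu_weight(value: int) -> bool:
    """Check a value against the 1-10000 range accepted by --cpu-shares."""
    return 1 <= value <= 10000


if __name__ == "__main__":
    if cgroup_v2_available():
        print("cgroup v2 detected: CPU and memory limits can be enforced")
    else:
        print("cgroup v2 not detected: limits would be accepted but not enforced")
```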

I/O Throttling

Disk I/O for the boot disk can be limited using QEMU's native I/O throttling:

# Limit to 1000 IOPS
uni run myapp:latest --disk-iops 1000

# Limit throughput to 10 MB/s
uni run myapp:latest --disk-bps 10M
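
For readers curious how such limits look at the QEMU level, the sketch below builds the throttling portion of a -drive option string using QEMU's throttling.iops-total and throttling.bps-total drive properties. Whether Uni generates exactly these options is an assumption; the function name is hypothetical, and the values are taken as plain integers (bytes per second for bps), so how a suffix such as 10M is expanded is left out.

```python
from typing import Optional


def drive_throttle_opts(iops: Optional[int] = None, bps: Optional[int] = None) -> str:
    """Return QEMU -drive throttling properties as a comma-separated string.

    iops: total I/O operations per second; bps: total bytes per second.
    An empty string means no throttling was requested.
    """
    opts = []
    if iops is not None:
        if iops <= 0:
            raise ValueError("iops must be positive")
        opts.append(f"throttling.iops-total={iops}")
    if bps is not None:
        if bps <= 0:
            raise ValueError("bps must be positive")
        opts.append(f"throttling.bps-total={bps}")
    return ",".join(opts)


if __name__ == "__main__":
    # Hypothetical disk path; the result would be appended to a -drive argument.
    base = "file=boot.img,if=virtio"
    print(f"-drive {base},{drive_throttle_opts(iops=1000)}")
```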

Released under the MIT License.