What it does today
machines-server is a Flask service on port 3038 that
ships with Consciousness Server. It is not just a YAML
reader — it actively probes the host, queries connected
services, and exposes one aggregate view of the entire ecosystem
for any AI agent that asks.
- Real-time GPU telemetry — calls `nvidia-smi` on the host and parses utilization percent, VRAM used / total in MB, and GPU temperature. Returns `null` cleanly on hosts without an NVIDIA card, so the same code path works on any machine.
- Real-time system stats via `psutil` — CPU percent (across all cores), core count, load averages, RAM total / used / available in GB, disk usage. Cross-platform (Linux / macOS / Windows).
- Dynamic service health monitoring — reads `services.yaml` at request time, opens HTTP connections to every declared port, and classifies each as `active` or `inactive` based on response code (anything < 500 counts as alive — handles WebSocket upgrades and redirects correctly).
- Static machine registry — every machine declared in `machines/<name>.yaml` with hardware spec, network address, role, list of agents that live there, and which Ollama models it has loaded.
- Aggregate `/api/infrastructure` endpoint — a single call returns local system stats plus machine configs plus runtime machines from CS plus services with live status plus registered agents. One response to answer "what is the state of this entire ecosystem right now?".
- Native MCP protocol support — exposes `/mcp/tools` and `/mcp/call`. Claude Code or any other MCP-aware client can use machines-server directly as a tool provider, calling `get_system_resources`, `get_infrastructure`, or `list_machines` without writing custom HTTP code.
- ed25519 auth via `AUTH_MODE` — the same three-state signed-request middleware as Consciousness Server (off / observe / enforce). Telemetry calls in `enforce` mode require a signed request; agents talking to it have key-server-issued keys.
API surface
Six endpoints. The whole machines-server contract:
| Method | Path | Purpose |
|---|---|---|
| GET | /health | Health check + uptime |
| GET | /api/system | Live CPU / RAM / disk / GPU stats for this host |
| GET | /api/machines | Static machine definitions from machines/*.yaml |
| GET | /api/services | All services from services.yaml with current HTTP status |
| GET | /api/infrastructure | Aggregate: local system + machines (config + runtime) + services + agents + summary counts |
| GET | /mcp/tools | MCP tool definitions for Claude Code or other MCP clients |
| POST | /mcp/call | Execute an MCP tool by name with optional args |
That is the full API. Adding a new capability means a new tool in `/mcp/tools` + a handler in `/mcp/call`; there is no hidden internal API.
Why this matters
Every multi-agent platform talks about agents. Almost none talk about machines. Yet a local agent on a GPU workstation has fundamentally different capabilities than a worker on a Raspberry Pi: different memory, different latency, different model weights loaded, different power budget. Modelling that explicitly is the difference between "I have agents" and "I have a fleet".
Once a machine is a first-class entity with a known hardware profile, a lot of things become easy:
- Heavy reasoning routes to the GPU box; classification routes to the cheap CPU host.
- Agents on different machines see each other through Consciousness Server and coordinate by name.
- A machine going offline triggers automatic re-routing of pending tasks.
- You can rent compute by adding a node — the rest of the fleet adopts it via its YAML descriptor.
A real example
The author's production setup, modelled in YAML:
- Workstation with GPU — Cortex with `gemma4:26b`, takes heavy reasoning and code-review tasks.
- Application host (CPU only) — Cortex with `gemma4:e4b`, handles classification, summarisation, fast small-context work.
- Laptop — interactive agents, orchestration, the developer's primary workspace.
All three see the same shared memory in Consciousness Server.
machines-server tells any agent which of them has
headroom right now. Tasks flow to whichever node makes sense.
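A descriptor for the GPU workstation might look roughly like the fragment below. This is illustrative only: the field names are assumptions, since the exact `machines/<name>.yaml` schema is not reproduced in this page.

```yaml
# machines/gpu-workstation.yaml : illustrative sketch, field names assumed
name: gpu-workstation
role: heavy-reasoning
address: 192.168.1.20
hardware:
  cpu_cores: 16
  ram_gb: 64
  gpu: RTX 4090
agents:
  - cortex
ollama_models:
  - gemma4:26b
```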
Where this goes — exploratory directions
The paradigm generalises: anything with a controller, a state, and telemetry is a machine. None of the scenarios below ship today; they describe what the architecture enables once the right adapters exist. Each is honestly marked Planned or Exploratory, so contributors and customers can shape the roadmap together.
Prototype shop / small factory
3D printers, plotters, CNC mills, laser cutters. An agent picks up a print queue, dispatches files to printers with the right material loaded, monitors job state, reports failures. Mixed deterministic execution (g-code goes to the printer verbatim) with non-deterministic decisions (which file first, which printer, what to do on a jam). Planned.
Biology / clinical lab
Incubators, cytometers, sequencers. An agent reads experiment metadata, schedules samples, cross-references results against shared memory in Consciousness Server ("how did the same condition behave in last week's run?"). Lab notebooks become queryable corpus. Planned.
Energy infrastructure
PV panels, battery storage, EV chargers, server-room load. An agent balances when to charge the car, when to power servers, when to sell surplus to the grid. All locally, with no third-party cloud knowing your home-energy schedule. Exploratory.
Building / smart-home enterprise
HVAC, climate sensors, blinds, lighting. Agent learns occupancy patterns, suggests optimisations, never phones home to a vendor cloud. Exploratory.
Drones — non-kinetic uses
Light-show choreography, infrastructure inspection (your own pipelines, towers, roofs), precision-agriculture monitoring. Each drone is a machine with telemetry; agents plan routes, react to wind, optimise battery rotation. Kinetic / weapons applications are out of scope by licence — see ethics. Exploratory.
Private fleet / heavy equipment
Trucks, combines, excavators on a single owner's site. Telemetry, diagnostics, predictive maintenance — keyed to the owner's infrastructure rather than a manufacturer's cloud. Exploratory.
Important framing: all of the above are what the architecture enables, not what ships today. We never claim "we already control drones" or "we run a 3D-printing farm". We claim "an agent can know that machine A has GPU capacity and machine B is offline" — that's Ships. The rest is the design space opened by treating machines as first-class.
Red lines — what we won't enable
Written into our commercial licence, not just marketing:
- Weapon systems — kinetic, autonomous targeting, AI-guided ordnance. No.
- Mass surveillance — city-scale watching, face recognition at scale, social scoring, behavioural prediction for political repression. No.
- Autonomous critical infrastructure without humans in the loop — power grids, traffic control, clinical decisions taken without medical oversight. No.
If your use case conflicts with these lines, BuildOnAI is not the right tool. The licence is enforced as a contractual matter, not as a marketing slogan.
Next steps
- Consciousness Server (where machines-server lives) →
- Cortex (the agent that runs on each machine) →
- View Consciousness Server on GitHub