machines-server (port 3038), part of consciousness-server v0.1.5

Machines aren't just servers.

Most agent platforms model agents but ignore the hardware they run on. BuildOnAI does the opposite: every machine in your fleet is a first-class entity, with hardware profile, available models, role, and live telemetry — so an agent can ask "which box has free VRAM and a tool-calling model loaded?" and get a real answer.

What it does today

machines-server is a Flask service on port 3038 that ships with Consciousness Server. It is not just a YAML reader — it actively probes the host, queries connected services, and exposes one aggregate view of the entire ecosystem for any AI agent that asks.

  • Real-time GPU telemetry — calls nvidia-smi on the host and parses utilization percent, VRAM used / total in MB, and GPU temperature. Returns null cleanly on hosts without an NVIDIA card, so the same code path works on any machine.
  • Real-time system stats via psutil — CPU percent (across all cores), core count, load averages, RAM total / used / available in GB, disk usage. Cross-platform (Linux / macOS / Windows).
  • Dynamic service health monitoring — reads services.yaml at request time, opens HTTP connections to every declared port, classifies each as active or inactive based on response code (anything < 500 counts as alive — handles WebSocket upgrades and redirects correctly).
  • Static machine registry — every machine declared in machines/<name>.yaml with hardware spec, network address, role, list of agents that live there, and which Ollama models it has loaded.
  • Aggregate /api/infrastructure endpoint — a single call returns local system stats plus machine configs plus runtime machines from CS plus services with live status plus registered agents. One response to answer "what is the state of this entire ecosystem right now?".
  • Native MCP protocol support — exposes /mcp/tools and /mcp/call. Claude Code or any other MCP-aware client can use machines-server directly as a tool provider, calling get_system_resources, get_infrastructure, or list_machines without writing custom HTTP code.
  • ed25519 auth via AUTH_MODE — the same three-state signed-request middleware as Consciousness Server (off / observe / enforce). Telemetry calls in enforce mode require a signed request; agents talking to it have key-server-issued keys.
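The probing logic in the first bullets above can be sketched roughly as follows. The `nvidia-smi` query flags are real CLI options; the function names and the shape of the returned dict are illustrative assumptions, not the service's actual code.

```python
# Sketch of host probing and service classification, assuming the
# behaviour described above (null GPU on non-NVIDIA hosts, < 500 = alive).
import shutil
import subprocess

def parse_gpu_csv(line):
    """Parse one line of `nvidia-smi --format=csv,noheader,nounits` output."""
    util, used, total, temp = [int(x.strip()) for x in line.split(",")]
    return {"utilization_pct": util, "vram_used_mb": used,
            "vram_total_mb": total, "temperature_c": temp}

def probe_gpu():
    """Return GPU telemetry, or None on hosts without an NVIDIA card."""
    if shutil.which("nvidia-smi") is None:
        return None  # clean null: same code path works on any machine
    out = subprocess.run(
        ["nvidia-smi",
         "--query-gpu=utilization.gpu,memory.used,memory.total,temperature.gpu",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True).stdout
    return [parse_gpu_csv(line) for line in out.strip().splitlines()]

def classify_service(status_code):
    """Anything below 500 counts as alive (redirects and WS upgrades included)."""
    return "active" if status_code < 500 else "inactive"
```

The `< 500` threshold is why a `301` redirect or a `101` WebSocket upgrade still registers as an active service.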

API surface

Seven endpoints. The whole machines-server contract:

Method  Path                 Purpose
GET     /health              Health check + uptime
GET     /api/system          Live CPU / RAM / disk / GPU stats for this host
GET     /api/machines        Static machine definitions from machines/*.yaml
GET     /api/services        All services from services.yaml with current HTTP status
GET     /api/infrastructure  Aggregate: local system + machines (config + runtime) + services + agents + summary counts
GET     /mcp/tools           MCP tool definitions for Claude Code or other MCP clients
POST    /mcp/call            Execute an MCP tool by name with optional args

That is the full API. Adding a new capability means a new tool in /mcp/tools + a handler in /mcp/call; there is no hidden internal API.
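The tool-plus-handler pattern can be sketched as below. The tool names match the ones listed earlier; the handler bodies, the registry dict, and the response shape are illustrative assumptions, not the real implementation.

```python
# Minimal sketch of the /mcp/call dispatch pattern: one registry maps
# tool names to handlers, so adding a capability means adding one entry
# here plus its definition in /mcp/tools. Payloads below are placeholders.
def get_system_resources(args):
    return {"cpu_percent": 12.5}  # placeholder telemetry payload

def list_machines(args):
    return {"machines": ["workstation", "apphost", "laptop"]}

MCP_TOOLS = {
    "get_system_resources": get_system_resources,
    "list_machines": list_machines,
}

def mcp_call(tool, args=None):
    """Execute an MCP tool by name with optional args."""
    handler = MCP_TOOLS.get(tool)
    if handler is None:
        return {"error": f"unknown tool: {tool}"}
    return {"result": handler(args or {})}
```

Because the registry is the whole surface, an MCP client enumerating /mcp/tools sees exactly what /mcp/call can execute; nothing is hidden.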

Why this matters

Every multi-agent platform talks about agents. Almost none talk about machines. Yet a local agent on a GPU workstation has fundamentally different capabilities than a worker on a Raspberry Pi: different memory, different latency, different model weights loaded, different power budget. Modelling that explicitly is the difference between "I have agents" and "I have a fleet".

Once a machine is a first-class entity with a known hardware profile, a lot of things become easy:

  • Heavy reasoning routes to the GPU box; classification routes to the cheap CPU host.
  • Agents on different machines see each other through Consciousness Server and coordinate by name.
  • A machine going offline triggers automatic re-routing of pending tasks.
  • You can rent compute by adding a node — the rest of the fleet adopts it via its YAML descriptor.
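The first routing bullet above can be sketched as a pure function over machine profiles. The field names mirror the /api/infrastructure data only loosely and are assumptions for illustration:

```python
# Headroom-based routing sketch: heavy tasks go to the online box with
# the most free VRAM; light tasks go to the cheapest online host.
# Machine dicts and field names here are illustrative assumptions.
def free_vram_mb(machine):
    gpu = machine.get("gpu")
    if not gpu:
        return 0
    return gpu["vram_total_mb"] - gpu["vram_used_mb"]

def route(task_kind, machines):
    online = [m for m in machines if m["status"] == "online"]
    if task_kind == "heavy":
        candidates = [m for m in online if free_vram_mb(m) > 0]
        return max(candidates, key=free_vram_mb)["name"] if candidates else None
    return min(online, key=free_vram_mb)["name"] if online else None
```

The offline filter is also where automatic re-routing falls out for free: a machine that drops out of the online set simply stops receiving tasks.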

A real example

The author's production setup, modelled in YAML:

  • Workstation with GPU — Cortex with gemma4:26b, takes heavy reasoning and code-review tasks.
  • Application host (CPU only) — Cortex with gemma4:e4b, handles classification, summarisation, fast small-context work.
  • Laptop — interactive agents, orchestration, the developer's primary workspace.

All three see the same shared memory in Consciousness Server. machines-server tells any agent which of them has headroom right now. Tasks flow to whichever node makes sense.
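A descriptor for the GPU workstation above might look something like this. The field names are illustrative assumptions; the actual schema is whatever machines/<name>.yaml defines.

```yaml
# machines/workstation.yaml -- illustrative sketch, not the real schema
name: workstation
role: heavy-reasoning
address: 192.0.2.10   # placeholder documentation address
hardware:
  gpu: true
  vram_gb: 24
agents:
  - cortex
ollama_models:
  - gemma4:26b
```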

Where this goes — exploratory directions

The paradigm generalises: anything with a controller, a state, and telemetry is a machine. None of the scenarios below ship today; they describe what the architecture enables once the right adapters exist. Each is marked Planned or Exploratory, flagged here so contributors and customers can shape the roadmap together.

Prototype shop / small factory

3D printers, plotters, CNC mills, laser cutters. An agent picks up a print queue, dispatches files to printers with the right material loaded, monitors job state, reports failures. Mixed deterministic execution (g-code goes to the printer verbatim) with non-deterministic decisions (which file first, which printer, what to do on a jam). Planned.

Biology / clinical lab

Incubators, cytometers, sequencers. An agent reads experiment metadata, schedules samples, cross-references results against shared memory in Consciousness Server ("how did the same condition behave in last week's run?"). Lab notebooks become queryable corpus. Planned.

Energy infrastructure

PV panels, battery storage, EV chargers, server-room load. An agent balances when to charge the car, when to power servers, when to sell surplus to the grid. All locally, with no third-party cloud knowing your home-energy schedule. Exploratory.

Building / smart-home enterprise

HVAC, climate sensors, blinds, lighting. Agent learns occupancy patterns, suggests optimisations, never phones home to a vendor cloud. Exploratory.

Drones — non-kinetic uses

Light-show choreography, infrastructure inspection (your own pipelines, towers, roofs), precision-agriculture monitoring. Each drone is a machine with telemetry; agents plan routes, react to wind, optimise battery rotation. Kinetic / weapons applications are out of scope by licence — see ethics. Exploratory.

Private fleet / heavy equipment

Trucks, combines, excavators on a single owner's site. Telemetry, diagnostics, predictive maintenance — keyed to the owner's infrastructure rather than a manufacturer's cloud. Exploratory.

Important framing: all of the above are what the architecture enables, not what ships today. We never claim "we already control drones" or "we run a 3D-printing farm". We claim "an agent can know that machine A has GPU capacity and machine B is offline" — that part ships today. The rest is the design space opened by treating machines as first-class.

Red lines — what we won't enable

Written into our commercial licence, not just marketing:

  • Weapon systems — kinetic, autonomous targeting, AI-guided ordnance. No.
  • Mass surveillance — city-scale watching, face recognition at scale, social scoring, behavioural prediction for political repression. No.
  • Autonomous critical infrastructure without humans in the loop — power grids, traffic control, clinical decisions taken without medical oversight. No.

If your use case conflicts with these lines, BuildOnAI is not the right tool. The licence is enforced as a contractual matter, not as a marketing slogan.

Next steps