

Proving Industrial Agentic AI Teams — Live From the Codebase


Most vendors claim Agentic AI. Few will let you interrogate it in public. In this recorded session, XMPro CEO Pieter van Schalkwyk and VP Strategic Solutions Gavin Green open our production Multi-Agent Generative System (MAGS) to direct questioning: no slides, no scripts, just answers straight from the codebase.

Full FAQ with detailed Q&A: https://github.com/XMPro/MAGS-Webinar-FAQ


What MAGS Is (and Why It’s Built This Way)

Primary purpose. MAGS is a multi-agent system for industrial and operational intelligence. Agents observe live signals, reason over context, plan, and collaborate—continuously. It integrates natively with XMPro DataStreams for real-time decisions at the edge or on-prem.

Core architecture. We follow a single-agent-per-instance pattern: each agent runs as its own self-contained process with its own memory, planning, and tools. Agents don’t “coordinate in code”; they coordinate via message brokers (MQTT/AMQP/Kafka), which keeps deployments modular, resilient, and easy to scale.
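The broker-mediated pattern can be sketched in miniature with an in-process queue standing in for MQTT/AMQP/Kafka. The `Broker` and `Agent` classes below are illustrative only, not XMPro's API: each agent owns its inbox and never calls another agent directly.

```python
import queue

# Minimal in-process stand-in for a message broker (MQTT/AMQP/Kafka).
# Class and topic names are illustrative, not XMPro's API.
class Broker:
    def __init__(self):
        self.subscriptions = {}  # topic -> list of subscriber queues

    def subscribe(self, topic):
        q = queue.Queue()
        self.subscriptions.setdefault(topic, []).append(q)
        return q

    def publish(self, topic, message):
        for q in self.subscriptions.get(topic, []):
            q.put(message)

class Agent:
    """Self-contained agent: own inbox, own state, no direct agent-to-agent calls."""
    def __init__(self, agent_id, broker, topic):
        self.agent_id = agent_id
        self.inbox = broker.subscribe(topic)

    def drain(self):
        msgs = []
        while not self.inbox.empty():
            msgs.append(self.inbox.get())
        return msgs

broker = Broker()
a = Agent("predictive-01", broker, "team/alpha")
b = Agent("throughput-01", broker, "team/alpha")
broker.publish("team/alpha", {"from": "predictive-01", "event": "anomaly"})
print(a.drain(), b.drain())
```

Because coordination lives in the broker, adding or removing an agent never touches another agent's code, which is what keeps deployments modular and easy to scale.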

Main components (at a glance):

  • Agent Core (AgentInstance) – profile, tools, planning cadence, team membership.

  • Memory Cycle – Observe → Reflect → Plan → Communicate/Consensus.

  • Planning System – PDDL-backed plans, confidence scoring, objective alignment.

  • Communication – broker abstraction for registration, routing, and status.

  • Consensus – multi-round collaborative iteration with conflict detection and human escalation.

  • Tooling – DataStreams, SQL, vector/graph ops, MCP client, etc.

  • Data Layer – vector + graph + relational stores; significance and decay; embedding cache.

  • Observability – OpenTelemetry for traces/metrics/logs.


The Cognitive Loop: Observe, Reflect, Plan, Act

  • Observe – Agents ingest events from DataStreams, systems, and peers; normalize to memories with importance, surprise, confidence, and trust.

  • Reflect – When significance crosses a threshold, agents retrieve related memories (vector + graph), look for patterns, and record higher-level insights.

  • Plan – Triggered by cadence or context shifts; generates PDDL plans with tasks, resources, dependencies, confidence, and expected KPI impact.

  • Act – Execute via tools (e.g., DataStreams, SQL, Vector/Graph). Track progress, adapt plans, or trigger consensus if coordination is required.

This loop runs continuously and asynchronously; critical observations can interrupt planning, and older memories decay so the system stays current.
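A minimal sketch of that loop, with the reflection threshold, the equal weighting of the four signals, and all field values as illustrative assumptions:

```python
# Sketch of the Observe -> Reflect -> Plan -> Act cycle with a significance
# threshold. The signal names (importance, surprise, confidence, trust) come
# from the article; the averaging rule and threshold value are assumptions.
REFLECT_THRESHOLD = 0.6

def significance(memory):
    keys = ("importance", "surprise", "confidence", "trust")
    return sum(memory[k] for k in keys) / len(keys)

def observe(event):
    # Normalize a raw event into a memory record (values illustrative).
    return {"content": event, "importance": 0.8, "surprise": 0.7,
            "confidence": 0.9, "trust": 0.8}

def cycle(event, memories):
    m = observe(event)
    memories.append(m)
    insights = []
    if significance(m) >= REFLECT_THRESHOLD:  # Reflect only when warranted
        insights.append(f"pattern check over {len(memories)} memories")
    plan = {"tasks": ["inspect asset"], "confidence": m["confidence"]}
    return insights, plan

insights, plan = cycle("vibration spike on pump P-101", [])
print(insights, plan["confidence"])
```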


Reliability: Failure Handling, QA, and Audit

Failure handling. A global exception handler standardizes error events (e.g., XMAGS/EVT/ERROR/{teamId}/{agentId}), while AgentStatus emits health and resource metrics. The broker layer validates messages, de-dupes, and reconnects. Agents dispose resources cleanly, publish startup/shutdown, and resume. Teams degrade gracefully.
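The standardized error event can be illustrated like this. Only the topic pattern comes from the article; the payload fields are assumptions:

```python
# Formats an error event on the topic pattern shown in the article
# (XMAGS/EVT/ERROR/{teamId}/{agentId}). Payload schema is an assumption.
import json
import datetime

def error_event(team_id, agent_id, exc):
    topic = f"XMAGS/EVT/ERROR/{team_id}/{agent_id}"
    payload = {
        "type": type(exc).__name__,
        "message": str(exc),
        "timestamp": datetime.datetime.now(datetime.timezone.utc).isoformat(),
    }
    return topic, json.dumps(payload)

try:
    raise TimeoutError("DataStream read exceeded 5s")
except TimeoutError as e:
    topic, body = error_event("team-alpha", "agent-07", e)
print(topic)  # XMAGS/EVT/ERROR/team-alpha/agent-07
```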

Quality assurance. Every decision carries a confidence score (reasoning, evidence, consistency, stability, objective alignment). Confidence gates drive autonomy vs escalation; safety-critical actions demand higher bars and/or consensus. Actions define preconditions, reversibility, and rollback. Human-in-the-loop is configurable from “monitor” to “approve.”
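A sketch of such a confidence gate, with the weights and thresholds as illustrative placeholders rather than XMPro's tuned values:

```python
# Confidence gate: a weighted composite of the five components the article
# names decides autonomy vs. review vs. escalation. Weights and thresholds
# here are illustrative assumptions.
WEIGHTS = {"reasoning": 0.25, "evidence": 0.25, "consistency": 0.2,
           "stability": 0.15, "objective_alignment": 0.15}

def composite(scores):
    return sum(WEIGHTS[k] * scores[k] for k in WEIGHTS)

def gate(scores, safety_critical=False):
    c = composite(scores)
    auto_bar = 0.9 if safety_critical else 0.75  # higher bar when safety-critical
    if c >= auto_bar:
        return "auto-apply"
    if c >= 0.5:
        return "queue-for-review"
    return "escalate-to-human"

scores = {"reasoning": 0.9, "evidence": 0.8, "consistency": 0.85,
          "stability": 0.7, "objective_alignment": 0.8}
print(gate(scores), gate(scores, safety_critical=True))
```

Note how the same decision can auto-apply in a routine context but queue for review when flagged safety-critical, which is the "higher bars" behavior described above.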

Consensus auditing. Consensus processes are fully traced with OpenTelemetry (process/round IDs, agent IDs), structured logs, MQTT progress messages, and a graph database audit trail (rounds, conflicts, plan versions, votes/status). You can reconstruct any decision path for root-cause analysis.


Explicit Values, Explicit Limits

Value system. Agents optimize objective functions (weighted, min/max components like safety, efficiency, quality) and honor responsibilities and constraints. Confidence scoring acts as an ethical “are we sure?” layer; team objectives outrank individual ones.
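A weighted objective function of this shape might look like the following sketch; the component names, weights, and [0, 1] normalization are illustrative:

```python
# Weighted objective function: maximize some components (safety, efficiency),
# minimize others (risk). Names, weights, and normalization are assumptions.
def objective_score(metrics, components):
    score = 0.0
    for name, (weight, direction) in components.items():
        value = metrics[name]  # assumed normalized to [0, 1]
        score += weight * (value if direction == "max" else 1.0 - value)
    return score

components = {
    "safety":     (0.5, "max"),
    "efficiency": (0.3, "max"),
    "risk":       (0.2, "min"),
}
metrics = {"safety": 0.95, "efficiency": 0.7, "risk": 0.1}
print(round(objective_score(metrics, components), 3))
```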

Deontic governance. Formal permission tokens (must, may, must-not) enforce what agents are allowed to do. Deontic = hard guardrails; objectives = what’s good; confidence = certainty. Together they create safe autonomy.
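A minimal sketch of a deontic guard, assuming a simple token table; the action names are hypothetical:

```python
# Deontic guard: 'must' / 'may' / 'must-not' tokens checked before any
# action executes. Action names and the table itself are illustrative.
PERMISSIONS = {
    "report_anomaly":   "must",      # obligatory
    "adjust_setpoint":  "may",       # permitted, still subject to confidence gates
    "bypass_interlock": "must-not",  # hard prohibition, never executed
}

def allowed(action):
    token = PERMISSIONS.get(action, "must-not")  # unknown actions default to deny
    return token in ("must", "may")

print(allowed("adjust_setpoint"), allowed("bypass_interlock"))
# True False
```

The key design point is that the guard runs before objectives or confidence are even consulted: a "must-not" is a hard wall, not a low score.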

Synthetic memories. Expert-validated memories (with trust factors, triggers, expiry) inject proven playbooks for rare or high-risk scenarios—raising evidence/consistency and making behavior more predictable.
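One way to picture a synthetic memory, with its trust factor, trigger keywords, and expiry; the field names and values are illustrative:

```python
# Synthetic memory sketch: an expert-validated playbook entry with a trust
# factor, trigger keywords, and an expiry. Schema and values are assumptions.
from datetime import datetime, timedelta, timezone

def make_synthetic_memory(playbook, trigger_keywords, trust, ttl_days):
    return {"playbook": playbook, "triggers": trigger_keywords, "trust": trust,
            "expires": datetime.now(timezone.utc) + timedelta(days=ttl_days)}

def applicable(memory, observation_text, now=None):
    now = now or datetime.now(timezone.utc)
    if now >= memory["expires"]:
        return False  # expired playbooks are never applied
    return any(kw in observation_text.lower() for kw in memory["triggers"])

m = make_synthetic_memory("isolate pump, notify shift lead",
                          ["cavitation", "seal failure"], trust=0.95, ttl_days=90)
print(applicable(m, "Suspected cavitation on feed pump"))
# True
```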


Architecture vs. Popular Frameworks

  • MAGS – Autonomous cognitive agents, long-term memory with significance/decay, PDDL plans, consensus, industrial connectors, production telemetry, on-prem first.

  • CrewAI – Task-oriented crews for knowledge work; good for role-based workflows, less about persistent autonomy.

  • LangChain – Developer toolkit; powerful but you implement agents, memory, planning, and governance yourself.

If you need continuous, governed autonomy over industrial systems, MAGS fills that gap.


On-Prem? Absolutely.

MAGS runs fully on-prem with local LLMs (Ollama/vLLM/TGI), local vector/graph/SQL stores, and local brokers (MQTT/RabbitMQ/Kafka). It’s designed for air-gapped networks, low latency, and data sovereignty. Observability uses OpenTelemetry, Prometheus/Grafana, or your existing APM. XMPro DataStreams also runs on-prem, so you can keep plant data inside your perimeter.


RAG & GraphRAG Behind the Scenes

RAG everywhere. Agents use a vector store for retrieval in conversations, content analysis, reflection, and planning. Collections can be general or agent-specific, with token budgets and citations managed centrally.

GraphRAG for structure. We pair vectors with a graph database (e.g., Neo4j) so retrieval respects relationships (assets → processes → people → locations). That enables multi-hop reasoning, causal chains, temporal patterns, explainable evidence paths, and better cross-agent knowledge sharing. It’s especially powerful for asset hierarchies, workflows, failure modes, and impact analysis.
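The multi-hop idea can be shown with a toy adjacency map standing in for the graph database. This is not a Neo4j query, just the expansion logic: a vector hit seeds the search, and graph hops pull in structurally related evidence.

```python
# GraphRAG sketch: vector hits are expanded along graph relationships so
# retrieval respects structure (asset -> process -> team/location).
# The toy graph and the hop expansion are illustrative.
GRAPH = {
    "pump-P101":       ["process-cooling"],
    "process-cooling": ["line-3", "team-maintenance"],
    "line-3":          ["site-west"],
}

def expand(seed_nodes, hops):
    frontier, seen = set(seed_nodes), set(seed_nodes)
    for _ in range(hops):
        frontier = {n for node in frontier for n in GRAPH.get(node, [])} - seen
        seen |= frontier
    return sorted(seen)

# Pretend the vector store matched "pump-P101" for a cavitation query;
# two hops add the process, the line, and the responsible team as evidence.
print(expand(["pump-P101"], hops=2))
```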


Arbitration When Agents Disagree (e.g., Predictive vs Throughput)

When objectives diverge under an anomaly, ConsensusManager triggers collaborative iteration:

  1. Detect conflicts (resource, objective, timeline, dependency).

  2. Issue conflict reports; agents adjust plans.

  3. Iterate to a hybrid, time-sliced, or resource-partitioned resolution—or escalate:

    • Voting (if enabled),

    • Human intervention, or

    • Best-plan selection by objective score and confidence.

All of it is audited: traces, logs, plan diffs, outcomes, and post-hoc performance to improve future arbitration.
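The last-resort path, best-plan selection, reduces to a ranking. The combining rule used here (objective score multiplied by confidence) is an illustrative assumption:

```python
# Best-plan selection sketch: rank candidate plans by objective score
# weighted by confidence. Plan data and the product rule are illustrative.
def select_best(plans):
    return max(plans, key=lambda p: p["objective_score"] * p["confidence"])

plans = [
    {"agent": "predictive", "objective_score": 0.82, "confidence": 0.90},
    {"agent": "throughput", "objective_score": 0.88, "confidence": 0.70},
    {"agent": "hybrid",     "objective_score": 0.80, "confidence": 0.95},
]
print(select_best(plans)["agent"])
```

Notice that the throughput plan scores highest on the objective alone but loses on confidence: a well-scored plan the system is unsure about should not win arbitration.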


Model Strategy & Prompting

MAGS is multi-provider (OpenAI/Azure, Anthropic, Google, AWS Bedrock, Cohere, Meta, Mistral, xAI, Hugging Face, OpenRouter, Ollama, more). Each agent’s profile specifies provider, model, token limits, and sampling params. Prompts live in the database with versioning and caching—spanning conversation, reflection, plan generation (PDDL), tool use, content analysis, and more. Context assembly blends RAG, conversation, agent/team memories, and objective functions so prompts are role- and situation-aware. We monitor token usage, latency, and quality; support fallbacks; and tune for domain constraints (safety/compliance terminology, asset IDs, metrics).
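A per-agent model profile with ordered fallbacks might be sketched like this; the schema, field names, and provider/model strings are all assumptions, not XMPro's configuration format:

```python
# Per-agent model profile with an ordered fallback list. The schema and the
# provider/model names are illustrative assumptions.
from dataclasses import dataclass, field

@dataclass
class ModelProfile:
    provider: str
    model: str
    max_tokens: int = 2048
    temperature: float = 0.2
    fallbacks: list = field(default_factory=list)  # ordered (provider, model) pairs

def resolve(profile, available_providers):
    candidates = [(profile.provider, profile.model)] + profile.fallbacks
    for provider, model in candidates:
        if provider in available_providers:
            return provider, model
    raise RuntimeError("no available provider for agent profile")

p = ModelProfile("azure-openai", "gpt-4o",
                 fallbacks=[("anthropic", "claude-sonnet"), ("ollama", "llama3")])
print(resolve(p, available_providers={"ollama", "anthropic"}))
```

In an air-gapped deployment the same profile would simply resolve to the local Ollama entry, which is how one configuration can span cloud and on-prem.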


Practical Spotlight: ERP, Quality & Takt Time

With XMPro DataStreams, agents connect to SAP/Oracle/Dynamics (and others) to:

  • Track material movements and batch lifecycle in real time,

  • Calculate rejection rates (batch/time/material/work-center),

  • Generate production summaries (planned vs actual, downtime, OEE proxies),

  • Report takt time (theoretical vs actual) with efficiency deltas.

Recommendations carry confidence scores. High-confidence changes can auto-apply with an audit trail; medium confidence queues for review, improving via human feedback.
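The takt-time report above reduces to simple arithmetic; all figures in this sketch are illustrative:

```python
# Takt time sketch: theoretical takt vs. actual cycle time and the
# resulting efficiency delta. All numbers are illustrative.
def takt_time(available_minutes, customer_demand_units):
    return available_minutes / customer_demand_units  # minutes per unit

theoretical = takt_time(available_minutes=450, customer_demand_units=300)
actual_cycle = 1.8  # observed minutes per unit, e.g. from DataStreams
efficiency = theoretical / actual_cycle  # < 100% means running behind takt
print(f"takt={theoretical:.2f} min/unit, efficiency={efficiency:.1%}")
```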


Why This Matters to Industrial Leaders

You need more than a chatbot. You need governed autonomy: explicit objectives, enforceable permissions, measurable confidence, consensus with audit, safe rollback, and full observability—running on your infrastructure.

How to start:

  1. Choose one or two high-volume, bounded decisions (quality triage, material exceptions, schedule nudges).

  2. Run high oversight, measure outcomes, and tune thresholds.

  3. Expand to more decisions as confidence grows.


Watch the Recording & Read the FAQ

Bring your toughest questions. We’ll answer from the codebase—not from slides.