
Context Graphs in Industrial Operations: Different Stakes, Different Architecture


Pieter Van Schalkwyk

CEO at XMPRO

This article originally appeared on the XMPro CEO’s LinkedIn blog, The Digital Engineer.

In a follow-up to all the responses to “AI’s trillion-dollar opportunity: Context graphs”, Jaya Gupta and Animesh Koratana published “How do you build a context graph?” They described the infrastructure challenge well. But their examples are enterprise software: CRM systems, code deployments, support tickets. The stakes there are margin and efficiency.

Industrial operations is different. Get the reasoning wrong on a maintenance deferral and you lose a $5 million compressor. Miss a pattern in process conditions and you have an environmental release. The cost of failure isn’t a bad quarter. It’s equipment destruction, safety incidents, regulatory action.

This changes what a context graph must do.

We call this Agentic Operations: autonomous agents operating industrial assets, with humans providing oversight rather than making every decision. The DecisionGraph is the infrastructure that makes Agentic Operations trustworthy enough to deploy.

Jaya and Animesh argue that context graphs become “world models” that enable simulation. They’re right. But for industrial operations, simulation isn’t enough. The simulation must be trustworthy enough that organizations will let agents act on it. That’s a different problem.

The central insight from our production deployments: the same infrastructure that makes agents intelligent must make their reasoning transparent to humans. Not as separate capabilities. As the same system serving both purposes. This dual requirement shapes every architectural choice.

The Trust Problem Enterprise Doesn’t Face

Animesh describes a flywheel: agents solve valuable problems, their trajectories become decision traces, traces accumulate into context graphs, better context makes agents more capable. Deploy more agents, generate more traces, compound intelligence.

This flywheel assumes you can deploy agents. In industrial operations, that’s the hard part.

I see it constantly in customer conversations. Operations teams watch an AI demo, see impressive diagnostics, then ask: “How do I know it won’t make things worse?” They’ve spent careers learning that complex systems fail in unexpected ways. They don’t trust black boxes. Shouldn’t trust black boxes.

Enterprise software can iterate quickly. Deploy an agent, see what happens, roll back if needed. Industrial operations can’t. You don’t “roll back” a compressor failure. You don’t iterate on a safety incident.

For industrial operations, the flywheel runs in reverse. Build the DecisionGraph first. Earn trust through transparency. Then deploy agents. Generate more traces. Strengthen trust. Expand agent autonomy.

[Figure: Enterprise vs Industrial “Flywheels”]

The question isn’t “how do you build a context graph?” The question is “how do you build a context graph that earns enough trust to let agents operate?”

What Trust Requires: The DecisionGraph

I initially called our implementation the BrainGraph, but given the decision traces approach, I think DecisionGraph is better. The name matters. It’s not a knowledge graph (static relationships). It’s not a context graph (accumulated context). It’s a graph of decisions with complete reasoning chains.

Consider a bearing replacement decision. The DecisionGraph captures:

  • Vibration patterns that triggered evaluation (not just “high vibration” but the specific signature, rate of change, correlation with temperature)
  • Similar failures on related equipment in the past 18 months
  • Current production schedule constraints (planned shutdown in 10 days vs. unplanned now)
  • Parts availability (bearing in stock, contractor available Thursday)
  • The agent’s reasoning for recommending early replacement, validated by the maintenance supervisor
  • The outcome: failure prevented, inspection confirmed bearing was 3 days from failure

Six months later, similar vibration pattern appears on a different pump. The agent detects it, queries the DecisionGraph, retrieves the precedent with full context, and recommends early replacement. This is work that would take a human hours of investigation: pulling historical records, finding similar cases, checking parts availability, reviewing production schedules. The agent handles it. The supervisor can review the reasoning and override if needed, but they don’t have to do the investigation themselves. And the agent is doing this continuously across hundreds of assets, not waiting for someone to notice a problem.

This structure serves both masters:

Agent Intelligence: The agent retrieves precedents automatically, weighs them against current conditions, and makes informed decisions.

Human Oversight: Operators can trace any decision back through its complete reasoning chain, validate patterns, and override when needed.

The same data. The same queries. Different purposes.

This dual-purpose design isn’t elegant architecture for its own sake. It’s the only way to get industrial organizations to let agents operate autonomously. They need to see the reasoning. They need to validate the patterns. They need to explain decisions to regulators. But they don’t need to make every decision themselves.

How Trust Reshapes the Two Clocks Problem

Jaya and Animesh identify the two clocks problem: we’ve built infrastructure for state (what’s true now) but almost nothing for reasoning (why it became true).

Their solution: capture reasoning by being in the execution path where decisions happen.

For industrial operations, this is necessary but not sufficient. You also need the reasoning to be queryable by humans when they need to oversee, audit, or improve agent behavior.

Consider what industrial systems actually capture today:

[Figure: What we do today]

Historian: Records that setpoint changed from 340 to 355 at 14:32:07. Missing: who approved it, why, what would trigger reverting.

CMMS: Records bearing replaced, 4 labor hours, parts list. Missing: diagnostic reasoning, alternatives considered, tradeoffs accepted.

Control System: Records sequence of operator actions. Missing: why those actions, what alternatives were rejected.

The decision trace layer must sit where agent reasoning actually occurs. But it must also produce traces that humans can query when they need to understand, audit, or improve.
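One way to picture the gap: a historian stores the state change, while the trace layer wraps the same event with the who, why, and revert condition. A minimal sketch, with hypothetical names:

```python
from datetime import datetime, timezone

# Illustrative sketch: pairing a historian-style state record with the
# reasoning a historian never captures. Names and tags are hypothetical.
decision_log: list[dict] = []

def record_setpoint_change(tag: str, old: float, new: float,
                           approved_by: str, rationale: str,
                           revert_if: str) -> dict:
    entry = {
        # what the historian already stores
        "tag": tag, "old": old, "new": new,
        "at": datetime.now(timezone.utc).isoformat(),
        # what the decision trace layer adds
        "approved_by": approved_by,
        "rationale": rationale,
        "revert_if": revert_if,
    }
    decision_log.append(entry)
    return entry

entry = record_setpoint_change(
    "FIC-101.SP", 340, 355,
    approved_by="shift_supervisor",
    rationale="Compensate for feed composition drift",
    revert_if="outlet temperature exceeds 410 C",
)
```

Both views are then queryable from the same record: the agent retrieves the rationale as a precedent, and an auditor retrieves the approval chain.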

How Trust Reshapes the Informed Walker Problem

Animesh’s insight about agents as “informed walkers” is right: you can’t predefine organizational ontology. It emerges from agent trajectories through problem-solving.

But for industrial operations, emergence isn’t enough. The emerged structure must make physical sense.

When an agent investigates an equipment issue, its trajectory reveals what matters:

  • Which data sources get queried together
  • Which entities co-occur in decision chains
  • What relationships exist between equipment, processes, and people

Accumulate thousands of trajectories and the ontology emerges. Entities that appear repeatedly are entities that matter. Relationships traversed frequently are relationships that are real.

Here’s what Animesh’s enterprise framing misses: in industrial operations, humans must be able to validate that the emerged ontology respects physics.

Agents might learn correlations that aren’t causal. They might find patterns that worked historically but violate engineering principles. They might miss constraints that experienced operators carry in their heads.

The DecisionGraph must support validation queries:

  • “Show me why these two entities are linked” (with the decision traces that established the relationship)
  • “What evidence supports this pattern?” (with the outcomes that validated it)
  • “Has this relationship ever led to a bad outcome?” (with the failures that might invalidate it)

Multi-agent consensus helps here. When maintenance, production, and safety agents coordinate on a decision, they negotiate shared understanding within a shared decision space, and where their models conflict, they surface the disagreement. The consensus process itself becomes a decision trace, capturing how different perspectives were reconciled.

Engineers can review emerged patterns against their domain knowledge. When the DecisionGraph learns something that contradicts physics or experience, humans can flag it, investigate it, correct it. Not because humans make better decisions in the moment, but because human oversight improves the system over time.

Schema as output works. But for industrial operations, schema as output must include human validation loops.

How Trust Reshapes the World Model Problem

Jaya and Animesh’s strongest claim: “Simulation is the test of understanding. If your context graph can’t answer ‘what if,’ it’s just a search index.”

They’re right about simulation. Wrong about the test.

Simulation is essential. But for industrial operations, the test goes further. It’s not whether you can simulate. It’s whether the simulation is trustworthy enough to let agents act on it.

Animesh’s PlayerZero simulates code deployments: “Given this change, will it break something?” If the simulation is wrong, you have a production incident. You roll back. You fix it.

Industrial operations simulates maintenance decisions: “If we defer this repair, what’s the failure probability?” If the simulation is wrong, you might not get a chance to fix it.

This changes what the world model must provide:

[Figure: World Model Capabilities]

Precedent Search: Enterprise needs to find similar situations. Industrial needs to find similar situations with outcome validation.

Pattern Recognition: Enterprise needs to identify what worked. Industrial needs to identify what worked and why it worked (causal reasoning).

Counterfactual Reasoning: Enterprise asks “What if we take this action?” Industrial asks “What if we take this action, given these physical constraints and failure modes?”

Confidence Bounds: Optional for enterprise. Required for industrial (agents and humans both need to know uncertainty).

Return to the bearing example. When similar vibration appears on a different pump, the agent:

  • Retrieves precedent: “What happened last time we saw this pattern?” (The March failure on Unit 3, the early replacement that prevented failure)
  • Evaluates counterfactual: “What would happen if we deferred action for 48 hours?” (Based on progression rate in similar cases, estimated 60% probability of failure within that window)
  • Assesses uncertainty: “How confident is this prediction?” (12 similar cases in the DecisionGraph, 10 progressed to failure within 5 days, 2 stabilized, confidence is moderate)

This is the investigation work. Pulling records, finding analogous cases, assessing risk. The agent does it continuously across the asset base. Humans couldn’t monitor everything this way. They’d only investigate after something failed or an alarm fired.
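The counterfactual step above is, at its core, an empirical estimate over matched precedents. A minimal sketch with invented numbers chosen to roughly match the example (about 60% failure probability within 48 hours across 12 cases):

```python
# Illustrative sketch of the precedent-based counterfactual: estimate
# failure probability within a deferral window from matched historical
# cases. Data and thresholds are hypothetical.
precedents = [
    # hours from first detection to failure; None = case stabilized
    6, 12, 20, 28, 35, 40, 47, 70, 96, 110, None, None,
]

def failure_probability(cases, window_hours: float) -> tuple[float, int]:
    failed_in_window = sum(
        1 for h in cases if h is not None and h <= window_hours)
    return failed_in_window / len(cases), len(cases)

p, n = failure_probability(precedents, 48)
# crude sample-size heuristic standing in for real confidence bounds
confidence = "moderate" if n < 30 else "high"
# with these illustrative numbers, p = 7/12, roughly the 60% in the example
```

A production system would use proper survival analysis rather than a raw frequency, but the shape is the same: the prediction, the sample size, and the uncertainty all come from the DecisionGraph and are all reportable to a human.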

The DecisionGraph also surfaces uncertainty. Where the precedent base is thin, it says so. Where outcomes varied despite similar reasoning, it shows the variance. Where physical constraints might invalidate historical patterns, it flags the risk.

Agents don’t need to be always confident. They need to know what they don’t know, and communicate that clearly so humans can provide appropriate oversight.

Architectural Choices Driven by Trust

Three architectural choices follow from the trust requirement. (These are part of a broader safety architecture we’re developing called the Industrial Agent Manifesto, but that’s a topic for another article.)

1. Separation of Control

Agents can observe, reflect, plan, and decide. But execution goes through a separate control layer.

The DecisionGraph captures agent reasoning. A separate control layer (in our case, XMPro DataStreams) determines what actually happens. This separation isn’t about limiting AI capability. It’s about creating structural trust.

When an operator asks “how do I know the agent won’t make things worse?”, the answer is architectural: the agent decides, but the control system enforces constraints. Dangerous actions are blocked regardless of what the agent recommends.

This separation also means every action has an audit trail. The DecisionGraph shows the reasoning. The control system shows the constraints that were applied. Together, they provide complete traceability.
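The separation can be sketched in a few lines: the agent only produces a recommendation, and a distinct control layer checks hard constraints before anything executes. Limits and names here are illustrative, not the XMPro DataStreams API:

```python
# Sketch of separation of control: the agent recommends; a separate
# control layer enforces constraints regardless of the recommendation.
# Constraint values and action shapes are hypothetical.
SAFE_LIMITS = {"setpoint_min": 300.0, "setpoint_max": 400.0}

def control_layer(action: dict) -> dict:
    """Enforce the safe envelope no matter what the agent decided."""
    value = action["value"]
    if not SAFE_LIMITS["setpoint_min"] <= value <= SAFE_LIMITS["setpoint_max"]:
        return {"executed": False,
                "reason": "blocked: outside safe envelope",
                "agent_action": action}
    return {"executed": True, "agent_action": action}

# a dangerous recommendation is blocked structurally, not by agent goodwill
blocked = control_layer({"kind": "setpoint", "tag": "FIC-101.SP", "value": 455.0})
allowed = control_layer({"kind": "setpoint", "tag": "FIC-101.SP", "value": 355.0})
```

Note that the blocked action is still returned with its reason, so the refusal itself can be written back into the DecisionGraph as part of the audit trail.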

2. Standards-Based Provenance

Trust requires that others can verify your reasoning. Not just your team. Regulators, auditors, new employees, acquiring companies. If they can’t query decision traces without your help, they can’t independently verify that the system is trustworthy.

This is why the DecisionGraph must be architected around open standards from the start, not retrofitted later. Industrial operations have specific requirements that make this non-negotiable:

Regulatory scrutiny: When a regulator investigates an incident, they need to trace the reasoning chain independently. Proprietary formats that require vendor tools to interpret won’t pass muster.

Asset lifecycle: Industrial equipment operates for 20-40 years. Decision traces from 2025 need to be queryable in 2055. Open standards (like W3C’s PROV-O for provenance, SHACL for policies, OWL for relationships) are designed to outlast any vendor.

Workforce turnover: The engineer who reviews a decision trace in five years won’t be the one who built the system. Standards-based formats are documented, teachable, and don’t require tribal knowledge to interpret.

M&A portability: When facilities are acquired or divested, decision intelligence needs to transfer with them. Proprietary formats create lock-in that destroys value.

The choice of standards is an architectural decision that’s hard to change later. Build for auditability from day one.
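As a rough sketch of what a standards-based trace could look like, here is a decision trace expressed with PROV-O vocabulary as JSON-LD. The identifiers are invented for illustration; only the `prov:` terms come from the W3C standard:

```python
import json

# Sketch of a decision trace serialized with W3C PROV-O terms as JSON-LD,
# so a standards-aware tool can interpret it decades from now without
# vendor software. All URNs are illustrative.
trace = {
    "@context": {"prov": "http://www.w3.org/ns/prov#"},
    "@id": "urn:trace:bearing-2025-09-14",
    "@type": "prov:Activity",
    "prov:used": {"@id": "urn:data:vibration-pump-07"},
    "prov:wasAssociatedWith": {"@id": "urn:agent:maintenance-agent-3"},
    "prov:generated": {"@id": "urn:decision:replace-early"},
    "prov:startedAtTime": "2025-09-14T14:32:07Z",
}
serialized = json.dumps(trace, indent=2)
```

The point is not this particular serialization but that the activity, the agent, the inputs, and the generated decision are all expressed in a vocabulary that outlives any one vendor.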

3. Progressive Autonomy with Continuous Oversight

Enterprise context graphs optimize for full autonomous operation in the future. Industrial context graphs optimize for progressive autonomy where humans improve the system over time.

The DecisionGraph supports multiple operating modes:

  • Advisory: Agent recommends, human approves, decision trace captures both
  • Supervised: Agent acts within bounds, human monitors, decision trace captures oversight
  • Autonomous: Agent acts, human reviews periodically, decision trace enables audit

Each mode generates decision traces. Each trace is queryable. The system learns from human overrides (what did they see that the agent missed?) and from human approvals (what reasoning patterns are working?).

Over time, the boundary shifts. Decisions that once required human approval become fully autonomous as the DecisionGraph accumulates evidence that the reasoning works. The goal is more autonomy, not less. But observability never goes away. Humans can always query how any decision was made, and that oversight continuously improves agent performance.
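The promotion logic can be sketched as a simple policy over accumulated traces: a decision class moves up a mode when the evidence is strong, and back down when overrides pile up. Thresholds here are invented for illustration:

```python
# Sketch of progressive autonomy: a decision class is promoted to the
# next operating mode once accumulated traces show the reasoning holds.
# Thresholds are illustrative, not a recommended policy.
MODES = ["advisory", "supervised", "autonomous"]

def next_mode(current: str, approvals: int, overrides: int) -> str:
    """Promote on strong evidence; demote on repeated overrides."""
    total = approvals + overrides
    if total < 20:
        return current                      # not enough evidence yet
    approval_rate = approvals / total
    idx = MODES.index(current)
    if approval_rate >= 0.95 and idx < len(MODES) - 1:
        return MODES[idx + 1]               # earn more autonomy
    if approval_rate < 0.80 and idx > 0:
        return MODES[idx - 1]               # fall back to tighter oversight
    return current

mode = next_mode("advisory", approvals=48, overrides=2)  # promoted
```

Because every approval and override is itself a decision trace, the promotion decision is auditable the same way operating decisions are.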

What This Means for Industrial Organizations

If you’re evaluating AI for industrial operations, ask different questions than you would for enterprise software.

Don’t ask: “Can it learn and improve?” Ask: “Can we see how it learned and validate that the learning makes physical sense?”

Don’t ask: “Can it simulate outcomes?” Ask: “Can we trace the simulation back to validated precedents and understand the uncertainty?”

Don’t ask: “How autonomous can it be?” Ask: “How does it earn the right to more autonomy, and what oversight do we have at each stage?”

Don’t ask: “What’s the ROI?” Ask: “What’s the path to trust, and how quickly can we expand agent autonomy as trust is established?”

The companies that get this right won’t just have better AI agents. They’ll have organizational intelligence that compounds over time, with full observability into how that intelligence developed, and progressive autonomy that lets agents do more as they prove themselves.

The context graph isn’t the exhaust from agent work. It’s the foundation that makes agent autonomy trustworthy enough to deploy.


Pieter van Schalkwyk is the CEO of XMPro, specializing in industrial AI agent orchestration and governance. XMPro MAGS with APEX provides cognitive architecture and DecisionGraph capabilities for agent networks operating on existing industrial systems.

Our GitHub repo has more technical information. You can also contact me or Gavin Green for more information.

Read more on MAGS at The Digital Engineer