
Scaling Data-Intensive Pages: Why page capacity collapses as data sources increase, and how distributed execution changes the curve


Part 3 of a series on scaling data-intensive applications

Coming in 2026: The distributed connector architecture described in this series is planned for an upcoming XMPro release. This series shares our engineering findings and the architectural approach we’re taking.


Adding a second data grid to a page shouldn’t halve your capacity. But under certain architectures, it does worse than that.

In Parts 1 and 2, we examined the distributed connector pattern and measured its impact on single-data-source pages. The results showed a ~5× improvement in concurrent user capacity. Those numbers, however, represented the best case for in-process connectors—a page with just one data source.

Real-world pages are rarely so simple. Dashboards display multiple data grids. Detail views combine forms with related data lists. Monitoring screens aggregate data from several sources. Each additional data source multiplies the connector loading overhead we identified earlier.

This post explores what happens as data source count increases—and why the distributed pattern’s advantages compound rather than diminish.

The Compounding Effect

Recall the per-request overhead in the in-process model:

  1. Fetch connector assembly from database
  2. Load assembly into memory
  3. Instantiate connector
  4. Execute query
  5. Return results

For a page with one data source, this cycle runs once. For a page with three data sources, it runs three times—often in parallel, competing for the same database connections, memory allocation, and CPU cycles.
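The repetition above can be sketched as a toy model. This is not XMPro code, and the step costs below are invented placeholder numbers purely for illustration; the point is structural: the full cycle runs once per data source, so the overhead floor grows at least linearly with data source count (contention, covered later, makes the real curve worse).

```python
# Toy model (not XMPro code): the in-process architecture repeats the full
# connector cycle for every data source on a page. Step costs are invented
# placeholders for illustration only.

IN_PROCESS_CYCLE_MS = {
    "fetch_assembly_from_db": 40,
    "load_assembly": 25,
    "instantiate_connector": 10,
    "execute_query": 15,
    "return_results": 5,
}

def page_overhead_ms(data_sources: int) -> int:
    """Connector overhead for one page load: the cycle runs once per data source."""
    return data_sources * sum(IN_PROCESS_CYCLE_MS.values())

for n in (1, 2, 3):
    print(f"{n} data source(s): {page_overhead_ms(n)} ms of connector overhead")
```

Even this optimistic linear model, which ignores contention entirely, triples the overhead for a three-grid dashboard.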

The question we wanted to answer: does the overhead scale linearly, or does it compound?

Test Design: Varying Data Source Count

We created four page variants:

Page Type | Data Sources | Description
----------|--------------|------------
Baseline  | 0            | Static page, no connector calls
Light     | 1            | Single data grid
Medium    | 2            | Two data grids
Heavy     | 3            | Three data grids

Each page was tested under the same conditions: staged ramp-up, threshold detection, identical infrastructure. The only variable was how many data sources the page contained.
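The staged ramp-up with threshold detection can be sketched as a simple loop. This is a hypothetical harness, not the actual test rig: the stage sizes, threshold values, and the `load_page` callable are all assumptions for illustration.

```python
# Hypothetical sketch of a staged ramp-up with threshold detection.
# Stage sizes, thresholds, and the load_page callable are assumptions,
# not the actual test harness used in this series.

def run_ramp_up(load_page, stages=(10, 25, 50, 100, 200),
                max_p95_ms=2000, min_success_rate=0.99):
    """Ramp concurrency stage by stage; return the last stage that passed."""
    capacity = 0
    for users in stages:
        p95_ms, success_rate = load_page(concurrent_users=users)
        if p95_ms > max_p95_ms or success_rate < min_success_rate:
            break  # threshold breached; capacity is the previous stage
        capacity = users
    return capacity

# Toy workload: latency grows once concurrency passes a knee point.
def fake_load_page(concurrent_users):
    p95_ms = 100 + max(0, concurrent_users - 50) * 30
    return p95_ms, 1.0

print(f"Capacity: {run_ramp_up(fake_load_page)} concurrent users")
```

The same loop runs against every page variant, so the only variable between runs is the page under test.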

Test Context: We deliberately used constrained infrastructure to surface architectural differences. The improvement ratios are consistent across infrastructure sizes, but absolute capacity depends on your specific deployment.

Baseline: Proving the Control

Before measuring connector impact, we needed to establish that both architectures perform equivalently when connectors aren’t involved.

0 Data Sources (Static Page):

Both architectures performed virtually identically: high throughput, a 100% success rate, and comparable response times. Both handled the full ramp-up without a threshold breach.

This confirms an important point: the application server and page rendering pipeline are not the bottleneck. When connector loading is removed from the equation, both architectures perform the same. The differences we observe with data sources are purely attributable to connector architecture.

One Data Source: The Gap Emerges

With a single data source, the differences emerge:

Metric        | Distributed Advantage
--------------|----------------------
Page loads    | ~175× more
Response time | ~2× faster
Success rate  | Higher reliability

The in-process model breached the threshold almost immediately, completing relatively few page loads before the test ended. The distributed model maintained consistent throughput.

Two Data Sources: The Gap Widens

Adding a second data source:

Metric        | Distributed Advantage
--------------|----------------------
Page loads    | ~600× more
Response time | Faster
Success rate  | Higher reliability

The in-process architecture completed very few page loads before threshold breach. The distributed architecture maintained near-perfect reliability with hundreds of times more throughput.

Three Data Sources: Orders of Magnitude

The three-data-source page represents a typical dashboard: multiple data grids showing related information.

Metric        | Distributed Advantage
--------------|----------------------
Page loads    | ~1000×+ more
Response time | Significantly faster
Success rate  | Higher reliability

The in-process model struggled significantly before threshold breach. The distributed architecture maintained consistent throughput with responsive page load times. The contrast couldn’t be starker.

Visualising the Scaling Curve

Plotting distributed advantage against data source count reveals the divergence:

Data Sources | Distributed Advantage
-------------|----------------------
0 (baseline) | ~1× (equal)
1            | ~175×
2            | ~600×
3            | ~1000×+

The in-process curve collapses as data sources increase. This isn’t linear degradation; it’s superlinear. Each additional data source doesn’t just add overhead, it multiplies existing bottlenecks. Connection pool contention, memory pressure, and CPU scheduling conflicts compound.

The distributed curve degrades gracefully. Each additional data source adds load to the stream host collection, but the application server’s work remains constant: publish messages, await responses. The bottleneck doesn’t compound because the application server isn’t doing the heavy lifting.

Why the Compounding Occurs

In-Process: Resource Contention

When multiple connectors load simultaneously in-process:

  1. Database connections contend — Multiple threads attempt to fetch connector binaries from the same database connection pool
  2. Memory allocation conflicts — Each connector assembly requires memory; parallel loading creates allocation spikes
  3. CPU scheduling degrades — Assembly loading and instantiation are CPU-intensive; multiple concurrent loads fight for processor time
  4. Cascading delays — A slow connector load delays page rendering, which delays request completion, which delays connection release, which delays the next connector load

Each data source adds not just its own overhead, but interference with other data sources’ overhead.
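The contention in step 1 can be shown with a toy model. This is an assumption-laden sketch, not the real connector loader: the shared connection pool is modelled as a one-slot semaphore, and the load time is an invented constant. It shows how parallel connector loads queue behind a shared resource, so the page's wait grows with every extra data source even though the loads start concurrently.

```python
# Toy model of in-process contention: parallel connector loads queue behind a
# shared DB connection pool (modelled here as a one-slot semaphore).
# LOAD_TIME_S is an invented placeholder, not a measured value.
import threading
import time

POOL = threading.Semaphore(1)   # shared connection pool with one connection
LOAD_TIME_S = 0.05              # time to fetch and load one connector assembly

def load_connector(results, idx):
    start = time.perf_counter()
    with POOL:                  # contend for the shared connection
        time.sleep(LOAD_TIME_S) # fetch assembly + load into memory
    results[idx] = time.perf_counter() - start

def page_load(data_sources: int) -> float:
    """Start all connector loads in parallel; the page waits for the slowest."""
    results = [0.0] * data_sources
    threads = [threading.Thread(target=load_connector, args=(results, i))
               for i in range(data_sources)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return max(results)

for n in (1, 2, 3):
    print(f"{n} data source(s) -> page waits ~{page_load(n):.2f}s")
```

Even though the three loads begin at the same time, the shared pool serialises them, so the third data source waits behind the first two. Real systems add memory and CPU contention on top of this queueing.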

Distributed: Isolation

The distributed model avoids compounding:

  1. Application server work is minimal — Publishing multiple MQTT messages takes marginally longer than publishing one
  2. Stream hosts process independently — Each connector query runs on potentially different hosts, avoiding local contention
  3. No per-request assembly loading — Data stream agents are pre-loaded on stream hosts, eliminating the per-request connector instantiation overhead
  4. Parallel execution — Multiple data sources can resolve truly in parallel across the stream host collection

The architectural separation transforms multiplicative overhead into additive overhead.
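The additive behaviour can be sketched with concurrent awaits. The function and topic names here are illustrative assumptions, not the XMPro API: the `asyncio.sleep` stands in for publishing an MQTT request and awaiting the stream host's response. Because every data source resolves in parallel, page latency tracks the slowest query rather than the sum of all queries.

```python
# Sketch (assumed names, not the XMPro API): the application server publishes
# one request per data source and awaits all responses concurrently.
import asyncio
import time

async def query_via_stream_host(source, query_time_s):
    # Stands in for: publish an MQTT request, a stream host runs its
    # pre-loaded agent, and the response arrives on a reply topic.
    await asyncio.sleep(query_time_s)
    return f"{source}: ok"

async def load_page(sources):
    # All data sources resolve in parallel across the stream host collection.
    return await asyncio.gather(
        *(query_via_stream_host(name, t) for name, t in sources.items())
    )

start = time.perf_counter()
results = asyncio.run(load_page({"grid_a": 0.05, "grid_b": 0.05, "grid_c": 0.05}))
elapsed = time.perf_counter() - start
print(results, f"in ~{elapsed:.2f}s")  # roughly 0.05s total, not 0.15s
```

Three 50 ms queries complete in roughly 50 ms of wall time: adding a data source adds a message, not a blocking cycle.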

Real-World Implications

Dashboard Design Freedom

Many organisations restrict dashboard complexity to maintain performance. “No more than two data sources per page” becomes an architectural constraint disguised as a design guideline.

The distributed model removes this constraint. Designers can build the dashboards users need, not the dashboards the architecture permits.

Capacity Projection

Consider planning for substantial concurrent users loading a multi-data-source dashboard:

In-Process:

  • Limited capacity before threshold breach with complex pages
  • Supporting high concurrency would require significant infrastructure scaling
  • Each scale unit adds its own connector loading overhead

Distributed:

  • High throughput achieved on modest infrastructure
  • Substantial concurrent users well within capacity
  • Scaling focuses on stream hosts if query volume increases

The distributed model makes capacity planning predictable. The in-process model makes it a negotiation between page complexity and infrastructure cost.

Query Complexity Independence

We held query complexity constant across tests—simple queries returning small payloads. But the principle extends: complex queries that take longer to execute don’t change the connector loading overhead in the in-process model. A slow query still incurs the same assembly fetch and instantiation overhead.

In the distributed model, slow queries affect only the stream host collection. The application server’s timeout behaviour is independent of query complexity—it simply awaits message responses.
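That timeout independence can be sketched as follows. The names and the timeout value are assumptions for illustration, not the actual platform behaviour: the application server awaits each response with a fixed budget, and a slow query affects only its own data source.

```python
# Illustrative sketch: the application server applies the same response
# timeout to every data source, regardless of query complexity.
# RESPONSE_TIMEOUT_S and the function name are assumptions.
import asyncio

RESPONSE_TIMEOUT_S = 0.2  # one fixed budget for every awaited response

async def await_response(query_time_s):
    try:
        # asyncio.sleep stands in for awaiting the stream host's reply message.
        await asyncio.wait_for(asyncio.sleep(query_time_s), RESPONSE_TIMEOUT_S)
        return "ok"
    except asyncio.TimeoutError:
        return "timed out"  # only this data source is affected

print(asyncio.run(await_response(0.05)))  # fast query -> ok
print(asyncio.run(await_response(0.50)))  # slow query -> timed out
```

The fast query and the slow query share nothing but the timeout constant; the slow one cannot hold a connection, a thread, or memory on the application server while it runs.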

The Architectural Lesson

The data source scaling tests illuminate a general principle: bottlenecks in shared-nothing architectures grow linearly; bottlenecks in shared-resource architectures grow superlinearly.

The in-process model shares everything: database connections, memory, CPU, thread pools. Adding load to any component affects all components. The distributed model shares little: the application server does minimal work; the stream host collection handles heavy lifting; the message broker mediates without blocking.

This principle applies beyond connector architecture. Anywhere you observe compounding degradation under load, look for shared resources. Architectural solutions often involve separating concerns not just logically, but physically—giving each workload type its own resource pool.

Next in This Series

So far, we’ve examined run mode—pages serving end users. But what about the developers building those pages? Design mode has its own scaling characteristics, and its own surprising results. Part 4 explores the developer experience at scale.


This is part 3 of a series on distributed architecture patterns. The series draws on load testing conducted on XMPro Application Designer, comparing in-process SQL connectors with distributed stream-based connectors. The distributed connector capability is planned for an upcoming 2026 release.