AI Services monitoring in ColdFusion Performance Monitoring Toolset

Last update:
May 18, 2026
Monitor the performance of AI-powered features in your ColdFusion application, including agents, LLMs, vector stores, RAG pipelines, and MCP connections.
The AI Services area of the Performance Monitoring Toolset gives you visibility into how your application uses AI infrastructure. It organizes monitoring data across six tabs, each focused on a different part of the AI stack. Use it to spot slow operations, compare service performance, and trace individual requests from entry point to model call.
To get started, see monitor AI services. For column definitions and widget details, see the tab and widget reference.

What each tab shows

The AI Services page has six tabs. Each tab focuses on a layer of the AI stack:
  • Agents: Agent-level traffic: response time, throughput, slowest requests, most frequent agents, URL performance, and status and operation distributions.
  • LLMs: Model-level behavior: latency and throughput per model, provider and model distributions, token usage by provider, and slowest or most frequent model calls.
  • Vector Stores: Vector query and embedding performance: response time and throughput per vector store, slowest queries, embedding call latency, and provider and status distributions. The tab has two on-screen sections: Vector Queries and Embeddings.
  • RAG: Retrieval-augmented generation quality and throughput: retrieval pipeline latency, retrieval quality scores, ingestion pipeline phase durations, completed and in-progress ingest operations, and ingest error breakdowns. The RAG tab does not honor the global Select AI Services dropdown; it always aggregates across all RAG-enabled services.
  • MCP Clients: Outbound MCP usage: response time and throughput per server, slowest and most frequent operations, error distribution by tool name, and operation type distribution.
  • MCP Servers: Inbound MCP usage: server-side latency and throughput, slowest and most frequent operations handled, error and status distributions. Trace links are not available from this tab; use MCP Clients to trace a call.

How the global controls work

Two controls at the top of the page scope every chart, table, and pie chart on the current tab. The one exception is the RAG tab, which ignores the Select AI Services dropdown and always aggregates across all RAG-enabled services. The controls are:
  • Time-range picker: Preset buttons (1M = 1 minute, 1H = 1 hour, 1D = 1 day, 1W = 1 week, 1Mo = 1 month) or a custom range via the calendar icon next to 1Mo. The selected range scopes all time-series charts and most summary tables.
  • Select AI Services (max 5): A multi-select dropdown that filters data to up to five services. The dropdown has no "All" option. Your selection is saved in the browser and reused the next time you open the page.
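To make the semantics of the two controls concrete, here is a minimal Python sketch. It is not part of the Performance Monitoring Toolset and does not call any toolset API. The names time_window and select_services are hypothetical, and the 30-day length used for 1Mo is an assumption, since the page does not say how a month is measured.

```python
from datetime import datetime, timedelta

# Preset labels as they appear in the UI. Treating 1Mo as 30 days is an
# assumption made for this sketch only.
PRESETS = {
    "1M": timedelta(minutes=1),
    "1H": timedelta(hours=1),
    "1D": timedelta(days=1),
    "1W": timedelta(weeks=1),
    "1Mo": timedelta(days=30),
}

MAX_SERVICES = 5  # the dropdown allows at most five services


def time_window(preset, now):
    """Return the (start, end) window a preset covers, ending at `now`."""
    return now - PRESETS[preset], now


def select_services(names):
    """Drop duplicate names (keeping order) and enforce the five-service cap."""
    unique = list(dict.fromkeys(names))
    if len(unique) > MAX_SERVICES:
        raise ValueError(f"select at most {MAX_SERVICES} services")
    return unique


# Example with hypothetical service names.
start, end = time_window("1H", datetime(2026, 5, 18, 12, 0))
print(start, end)
print(select_services(["support-agent", "search-llm", "support-agent"]))
```

Because there is no "All" option, comparing more than five services means running the comparison in batches.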

How Trace Details works

When you select View Trace from a table row on the Agents, LLMs, Vector Stores, or RAG tabs, a Trace Details modal overlay opens on top of the AI Services page. It shows where time is spent within a single request.
The modal has three tabs:
  • Timeline: A horizontal execution timeline that shows spans as bars on a time axis. Each bar is labeled with a type (Agent, LLM, Embedding, Vector, Tool, Guardrail, RAG Phase, or MCP Server), a detail name, and the duration in milliseconds. Hover or click a bar to see span-specific details in a tooltip.
  • Basic Info: Read-only fields for the trace: Operation, Duration, Total Tokens, Retriever Configured, Has Tools, and Status. Two additional fields appear conditionally: Memory Type (when configured) and Error (when the status is an error). Non-agent traces show a different field set: Provider, Duration, Model, Template Path, Line No, Total Tokens, Status, Error.
  • All Spans: A sortable table of every span in the trace with columns: Type, Service / Provider, Operation, Name, Duration (ms), and Status.
Close the modal to return to the tab you were on. For complete column definitions, see the tab and widget reference.
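If you copy span data out of the All Spans table for offline analysis, the columns described above map naturally onto a small record type. The following Python sketch is illustrative only: the Span class, its field names, and the sample values are hypothetical and do not represent an export format or API of the toolset.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Span:
    """One row of the All Spans table (field names are hypothetical)."""
    type: str          # e.g. Agent, LLM, Embedding, Vector, Tool
    name: str
    duration_ms: float
    status: str


def slowest_first(spans):
    """Order spans by duration, longest first, like sorting the Duration column."""
    return sorted(spans, key=lambda s: s.duration_ms, reverse=True)


def duration_by_type(spans):
    """Sum durations per span type.

    Spans can nest (an Agent span may contain LLM and Vector spans), so
    these totals can exceed the overall trace duration.
    """
    totals = defaultdict(float)
    for span in spans:
        totals[span.type] += span.duration_ms
    return dict(totals)


# Made-up sample data for illustration.
spans = [
    Span("Vector", "product-search", 150.0, "OK"),
    Span("Agent", "support-agent", 1200.0, "OK"),
    Span("LLM", "chat-completion", 900.0, "OK"),
    Span("Embedding", "embed-query", 100.0, "OK"),
]

for span in slowest_first(spans):
    print(f"{span.type:<10} {span.name:<16} {span.duration_ms:>8.1f} ms")
print(duration_by_type(spans))
```

Sorting by duration mirrors what the Timeline tab shows visually: the longest bars are the first places to look for time spent inside a request.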

Best practices

  • Compare the Average Response Time chart with the Throughput chart at the same timestamps. Spikes in both typically indicate load; latency-only spikes suggest slow dependencies or outliers.
  • Sort grid tables by Duration descending to find the worst offenders first.
  • Use legend entries to toggle individual series on or off within a chart. The toggle affects only that chart, not other widgets on the page.
  • Check the Status Distribution pie to confirm whether errors are driving slowness or whether successful requests are simply slow.
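The first practice above, reading latency and throughput together, can be expressed as a simple rule. The Python sketch below assumes you have two aligned series of readings, one value per timestamp. The function name classify_spikes, the median baseline, and the 2x spike threshold are all assumptions chosen for illustration, not behavior of the toolset.

```python
from statistics import median


def classify_spikes(latency_ms, throughput, factor=2.0):
    """Label each timestamp by comparing both series with their medians.

    A reading counts as a spike when it exceeds `factor` times the median
    of its series. Both the baseline and the factor are assumptions.
    """
    if len(latency_ms) != len(throughput):
        raise ValueError("series must be aligned to the same timestamps")
    latency_limit = factor * median(latency_ms)
    throughput_limit = factor * median(throughput)

    labels = []
    for latency, rate in zip(latency_ms, throughput):
        latency_spike = latency > latency_limit
        throughput_spike = rate > throughput_limit
        if latency_spike and throughput_spike:
            labels.append("load")  # both rose together
        elif latency_spike:
            labels.append("slow dependency or outlier")  # latency only
        else:
            labels.append("normal")  # includes throughput-only rises
    return labels


# Made-up readings for six consecutive timestamps.
latency = [100, 110, 105, 400, 95, 420]
requests_per_min = [10, 11, 10, 30, 9, 10]
print(classify_spikes(latency, requests_per_min))
```

A fixed multiple of the median is a crude baseline. For noisy services, a rolling window or a percentile-based threshold may separate real spikes from normal variation more reliably.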
