Simple RAG

Last update: May 18, 2026
ColdFusion simpleRAG() reference. Covers ask() vs chat(), messageWindowChatMemory, getStatistics(), ingestAsync() with Java Futures, chunkSize, chunkOverlap, splitterType, minScore, maxResults, and default pipeline behavior.
Simple RAG is ColdFusion's high-level API for building retrieval-augmented generation applications. It is designed to be operational with the absolute minimum configuration. You need two things: where your documents are (source) and which AI model to use (model). The platform handles document loading, splitting, embedding generation, storage, and retrieval automatically.

Your first RAG application

Create a chat model and pass it to simpleRAG() together with the location of your documents. The following example creates a simpleRAG service, ingests documents from a local folder, and answers a question:
<cfscript>
    chatModel = chatModel({
        provider: "openai",
        modelName: "gpt-4o-mini",
        apiKey: application.apikey,
        temperature: 0.4
    });

    docsDir = expandPath("./docs/");
       
    // Create RAG service with the absolute docs path
    ragBot = simpleRAG(docsDir, chatModel,
    {
        minScore: 0.7,
        maxResults: 4
    });

    // Ingest documents from the docs folder
    ragBot.ingest();

    // Ask a question
    answer = ragBot.ask("How to extend Adobe subscription using prepaid card?");
    writeDump(answer.message);
</cfscript>
Output
To extend your Adobe subscription using a prepaid card, follow these steps: 1. On the prepaid card, locate your redemption code beneath the scratch-off foil on the back of the card. 2. Select the appropriate link based on your product type, such as Creative Cloud, Acrobat, or any specific version you have. 3. Sign in using your Adobe ID and password. 4. Follow the onscreen instructions to complete the redemption process. This will add the subscription period to your existing account.
The source parameter accepts any of the following:
  • A single file path: expandPath('./docs/handbook.pdf')
  • A folder path (all supported files are indexed recursively): expandPath('./knowledgebase/')
  • A URL: 'https://example.com/product-manual.html'
  • An array of any combination: [expandPath('./docs/'), 'https://example.com/faq.html']
The model parameter accepts either a ChatModel object (from ChatModel()) or an inline struct with type, apiKey, and modelName keys.
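As a minimal sketch of the array form of source, the snippet below indexes a local folder and a web page through one service. The folder path and URL are placeholders, and it assumes application.apiKey already holds a valid key. No options struct is passed, so the default splitting and retrieval settings apply.

```cfml
<cfscript>
    // Assumption: application.apiKey is set elsewhere (for example in Application.cfc)
    chatModel = chatModel({
        provider: "openai",
        modelName: "gpt-4o-mini",
        apiKey: application.apiKey,
        temperature: 0.4
    });

    // Mixed sources: a local folder plus a URL (both are placeholder values)
    sources = [
        expandPath("./docs/"),
        "https://example.com/faq.html"
    ];

    // No options struct, so the default pipeline settings are used
    ragBot = simpleRAG(sources, chatModel);
    ragBot.ingest();

    answer = ragBot.ask("How to extend Adobe subscription using prepaid card?");

    // Encode the model output before writing it into an HTML page
    writeOutput(encodeForHTML(answer.message));
</cfscript>
```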

SimpleRAG with parameters

The example in this section passes the following values to the chat model, the vector store, the embedding model, and simpleRAG().
Parameters
  • ChatModel
    • provider: LLM vendor (e.g. OpenAI).
    • modelName: Which chat model generates answers.
    • apiKey: API authentication.
    • temperature: How random vs focused replies are.
  • expandPath("./docs/"): Resolves the docs folder to a real server path; this value is the first argument to simpleRAG (the document source).
  • VectorStore
    • provider: Where vectors are stored (e.g. inMemory = in process, not durable across restarts).
  • embeddingModel (struct in the third simpleRAG argument)
    • provider: Embedding API vendor.
    • modelName: Which embedding model creates vectors from text.
    • apiKey: Authentication for the embedding API.
  • simpleRAG(source, model, options)
    • source: Path(s) or URL(s) of content to index.
    • model: Chat model used to produce the final answer after retrieval.
    • options: Extra configuration, including vectorStore and embeddingModel.
  • ingest(): Starts indexing from the configured source (no arguments).
  • ask("…"): Accepts one string—the user’s question—for a single-shot answer.
Example
<cfscript>
    chatModel = chatModel({
        provider: "openai",
        modelName: "gpt-4o-mini",
        apiKey: application.apiKey,
        temperature: 0.7
    });

    docsDir = expandPath("./docs/");

    // Custom vector store (in-memory here, so not durable across restarts)
    vectorStore = vectorStore({
        provider: "inMemory"
    });

    // Embedding model used to turn text chunks into vectors
    embeddingModel = {
        "provider": "openai",
        "modelName": "text-embedding-3-small",
        "apiKey": application.apiKey
    };

    // Create the RAG service: source path, chat model, and options
    ragBot = simpleRAG(docsDir, chatModel, {
        "vectorStore": vectorStore,
        "embeddingModel": embeddingModel
    });

    // Ingest documents from the docs folder
    ragBot.ingest();

    // Ask a question
    answer = ragBot.ask("How to extend Adobe subscription using prepaid card?");
    writeDump(answer.message);
</cfscript>
Output
To extend your Adobe subscription using a prepaid card, follow these steps: 1. Find the redemption code beneath the scratch-off foil on the back of the prepaid card. 2. Select the appropriate link based on your product type (for example, Creative Cloud Pro, Acrobat, etc.). 3. Sign in using your Adobe ID and password. 4. Follow the onscreen instructions to complete the redemption process.

Single-turn queries with ask()

ask() is for single-turn queries: one question, one answer, no conversation memory. Use it when each query is independent and does not rely on prior context.
<cfscript>
    chatModel = chatModel({
        provider: "openai",
        modelName: "gpt-4o-mini",
        apiKey: application.apikey,
        temperature: 0.4
    });

    docsDir = expandPath("./docs/");
       
    // Create RAG service with the absolute docs path
    ragBot = simpleRAG(docsDir, chatModel,
    {
        minScore: 0.7,
        maxResults: 4
    });

    // Ingest documents from the docs folder
    ragBot.ingest();

    // Ask a question
    answer = ragBot.ask("How to extend Adobe subscription?");
    writeDump(answer.message);
</cfscript>

Multi-turn conversations with chat()

chat() maintains conversation context across multiple turns. Use it when follow-up questions depend on earlier answers.
Example scenario
You’re building a RAG bot: documents under ./docs/ are ingested into a vector store, and the app answers questions using retrieval + an LLM. chat() is used so later questions can refer to earlier turns (e.g. “What about Enterprise?” after a first answer). CHATMEMORY keeps a sliding window of recent chat so the model sees that history.
ChatModel({ ... })
This is your completion model (the thing that writes the final answer). You pass provider, model name, API key, and temperature (how random/creative vs deterministic). Same idea as configuring any other AI chat model in CF, just stored in chatModel for the RAG API.
expandPath("./docs/")
Resolves ./docs/ relative to the current template to a real server path. That folder is the corpus: whatever files RAG is allowed to read and index.
VectorStore({ ... })
The embedding store: chunks of text become vectors and get stored here for similarity search. INMEMORY means it lives in process memory (simple, but not durable across restarts).
embeddingModel nested inside is the model used to turn text into vectors. It must be consistent between ingestion (indexing) and query (searching), or dimensions/providers won’t line up.
simpleRAG(docsDir, chatModel, { ... })
simpleRAG is the high-level API: “here’s where the docs are, here’s the chat model, here are the options.”
  • First arg: source path(s) to ingest from.
  • Second arg: the LLM used to generate answers after retrieval.
  • Third arg: extras, in this case vectorStore (where to put embeddings) and CHATMEMORY.
CHATMEMORY: { type: "messageWindowChatMemory", maxMessages: 20 }
This is what makes chat() feel like a conversation: messageWindowChatMemory keeps the last N back-and-forth messages (user + assistant), up to maxMessages.
Without something like this, each chat() call might behave more like an isolated turn (depending on implementation), so vague follow-ups (“summarize that”) wouldn’t have context.
ragBot.ingest()
Walks the source path, parses files, splits into chunks, embeds them, and writes vectors into vectorStore. You typically do this after deploy or when docs change. Until ingest has run (or completed), there’s nothing useful to retrieve.
ragBot.chat("...") three times
Each call is one user turn. Internally, something like: optional query rewrite → embed question → vector search → build prompt with retrieved chunks + recent chat → LLM → answer.
  • r1: first question (e.g. how to open a support case).
  • r2: follow-up that assumes r1 was understood (“what do I need before I start?”).
  • r3: meta follow-up (“summarize the steps you just described”).
<cfscript>
    chatModel = ChatModel({
        provider: "openai",
        modelName: "gpt-4o-mini",
        apiKey: application.apiKey,
        temperature: 0.7
    });

    docsDir = expandPath("./docs/");

    vectorStore = VectorStore({
        provider: "INMEMORY",
        embeddingModel: {
            provider: "openai",
            modelName: "text-embedding-3-small",
            apiKey: application.apiKey
        }
    });

    // Optional: keep last N turns so follow-ups ("What about Enterprise?") resolve in context
    ragBot = simpleRAG(docsDir, chatModel, {
        vectorStore: vectorStore,
        CHATMEMORY: {
            type: "messageWindowChatMemory",
            maxMessages: 20
        }
    });

    ragBot.ingest();

    // Multi-turn: same service, sequential chat() calls
    r1 = ragBot.chat("How do I open an Adobe support case?");
    r2 = ragBot.chat("What information do I need before I start?");
    r3 = ragBot.chat("Summarize the steps you just described in three bullets.");

    writeOutput(r3.message);
</cfscript>
Output
{ "aiMessage": { "text": [ "Sign in to your Adobe account.", "Navigate to the Support history page and select the case you want to modify.", "Update the case by leaving a message or uploading files up to 10 MB." ].join("\n"), "thinking": "", "toolExecutionRequests": [], "attributes": {} }, "metadata": { "id": "4", "modelName": "ChatGPT", "tokenUsage": { "inputTokenCount": 27, "outputTokenCount": 130, "totalTokenCount": 157 }, "finishReason": "STOP" } }

Chat memory (messageWindowChatMemory)

Chat memory is what makes chat() feel like a conversation. Without it, each chat() call behaves as an isolated turn and vague follow-ups such as "summarize that" would have no context.
messageWindowChatMemory keeps the last N back-and-forth messages (user + assistant) up to maxMessages. Configure it in the third argument to simpleRAG():
ragBot = simpleRAG(docsDir, chatModel, {
  vectorStore: vectorStore,
  CHATMEMORY: {
    type: "messageWindowChatMemory",
    maxMessages: 20
  }
});
| Parameter | Description |
|---|---|
| type | Set to "messageWindowChatMemory" to enable sliding-window memory. |
| maxMessages | Maximum number of recent messages (user + assistant turns) to keep in context. |

Check index status with getStatistics()

getStatistics() is a method on the RAG service object returned from simpleRAG(). It returns a struct describing the last ingestion run, so you can query the status of the index at any time.
You normally call it after ingest() (or after an async ingest finishes) when you want counts, timing, and status in one place.
<cfscript>
    chatModel = ChatModel({
        provider: "openai",
        modelName: "gpt-4o-mini",
        apiKey: application.apiKey,
        temperature: 0.2
    });

    docsDir = expandPath("./docs/");

    vectorStore = VectorStore({
        provider: "INMEMORY",
        embeddingModel: {
            provider: "openai",
            modelName: "text-embedding-3-small",
            apiKey: application.apiKey
        }
    });

    ragBot = simpleRAG(docsDir, chatModel, {
        vectorStore: vectorStore
    });

    // Index documents, then inspect the store or pipeline stats
    ragBot.ingest();

    stats = ragBot.getStatistics();
    writeDump(stats);

    // Example: log the stats as one line (adjust key names to match your getStatistics() output)
    // writeLog(type="Information", file="rag", text="RAG stats: " & serializeJSON(stats));
</cfscript>
The returned struct contains the following fields:
| Field | Description |
|---|---|
| avgSegmentsPerDocument | Average number of text segments (chunks) produced per loaded document for this ingestion run. |
| documentsLoaded | Count of source documents that were loaded and processed in this run. |
| initialized | Whether the RAG service (or its core components) finished initialization successfully. |
| pipelineBuilt | Whether the configured ingestion or retrieval pipeline was built without a fatal configuration error. |
| segmentsCreated | Total text segments created by splitting loaded documents before or independent of final store success. |
| segmentsFailed | Segments that could not be fully processed through the pipeline (e.g. embed or persist failures). |
| segmentsIngested | Segments successfully written into the embedding or vector store path for this run. |
| status | Overall outcome of the ingestion operation (e.g. "completed" when the job finished successfully). |
| timestamp | Time associated with these statistics, often epoch milliseconds when the stats snapshot was recorded. |
| totalDurationMs | Elapsed time for the operation in milliseconds. |
| totalTimeMs | Another elapsed time in milliseconds; may mirror totalDurationMs or reflect a slightly different measurement boundary. |
| totalTimeSec | Same or equivalent duration expressed in seconds for readability. |
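One way to use these fields is as a readiness check before answering questions. The following is a minimal sketch, assuming the status, segmentsIngested, and segmentsFailed keys are present as listed above; each lookup is guarded with structKeyExists() in case a build omits a key. The log file name "rag" and the fallback message are illustrative choices.

```cfml
<cfscript>
    chatModel = chatModel({
        provider: "openai",
        modelName: "gpt-4o-mini",
        apiKey: application.apiKey,
        temperature: 0.2
    });

    ragBot = simpleRAG(expandPath("./docs/"), chatModel);
    ragBot.ingest();

    stats = ragBot.getStatistics();

    // Guard each lookup: the exact set of keys may differ between builds
    status   = structKeyExists(stats, "status") ? stats.status : "unknown";
    ingested = structKeyExists(stats, "segmentsIngested") ? stats.segmentsIngested : 0;
    failed   = structKeyExists(stats, "segmentsFailed") ? stats.segmentsFailed : 0;

    if (status == "completed" && ingested > 0 && failed == 0) {
        // Index looks healthy, so it is reasonable to query it
        answer = ragBot.ask("How to extend Adobe subscription?");
        writeOutput(encodeForHTML(answer.message));
    } else {
        // Record the full stats struct for later inspection
        writeLog(type="Warning", file="rag", text="Ingestion incomplete: " & serializeJSON(stats));
        writeOutput("The knowledge base is not ready yet.");
    }
</cfscript>
```

Treating any failed segment as "not ready" is a strict policy chosen for this sketch; an application that tolerates partial indexes could relax that condition.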

Async ingestion with ingestAsync() and Futures

Ingestion reads your files, splits them into chunks, creates embeddings, and stores them in a vector store so questions can be answered from your content.
Synchronous ingestion (ingest()) keeps the request busy until indexing finishes. Asynchronous ingestion (ingestAsync()) starts that work and returns a Future object right away. You can wait for completion when you are ready, or structure your code so other logic runs while ingestion proceeds.
Use async when:
  • Ingestion may take noticeable time (large folders, many files).
  • You want a clear place in code to wait (get()) or to check progress.
  • You are building flows where you must not assume ingest finished until you handle the Future.
Basic pattern
  1. Create your chat model, vector store, and RAG service (simpleRAG() or agent()), same as for synchronous RAG.
  2. Call ingestAsync() on that service (the name may vary slightly by release; check the function reference for your version). It returns a Future.
  3. When you need the result, call get() on the Future. That waits until ingestion completes and returns the ingest result struct (fields such as documents loaded or segments ingested depend on your build).
  4. Only then call ask(), chat(), or rely on retrieval, if you need a fully indexed store.
Important: Until get() completes successfully, you should treat the knowledge base as not ready for queries that depend on the new ingest.
Example: ingest asynchronously, then query
The following example shows the logical flow. Adjust struct keys (such as vectorStore and embeddingModel) to match your ColdFusion RAG API.
<cfscript>
    chatModel = ChatModel({
        provider: "openai",
        modelName: "gpt-4o-mini",
        apiKey: application.apiKey,
        temperature: 0.2
    });

    docsDir = expandPath("./docs/");
    vectorStore = VectorStore({
        provider: "INMEMORY",
        embeddingModel: {
            provider: "openai",
            modelName: "text-embedding-3-small",
            apiKey: application.apiKey
        }
    });

    ragBot = simpleRAG(docsDir, chatModel, { vectorStore: vectorStore });

    // Non-blocking: returns immediately with a Future
    future = ragBot.ingestAsync();

    // Option A: block until indexing finishes
    result = future.get();

    // Option B: poll while doing other work (if your Future supports it)
    /*
    while (!future.isDone()) {
        // small unit of other work, or sleep
    }
    result = future.get();
    */

    // Optional: inspect statistics after ingest
    stats = ragBot.getStatistics();
    writeDump(stats);

    // Safe to query after get() returns successfully
    answer = ragBot.ask("How to upgrade Adobe plan?");
    writeOutput(answer.message);
</cfscript>
Output
To upgrade your Adobe plan to the Creative Cloud Pro plan, follow these steps: Sign in to the Creative Cloud Pro page, select 'Buy now', and check all the apps and services included in the plan. If you are currently using a single app plan, you can select 'Switch plans' to proceed with the upgrade. Follow the onscreen instructions to complete the upgrade. Once done, you will have successfully upgraded to Creative Cloud Pro, giving you access to over 20 creative apps, including Photoshop, Illustrator, and Premiere Pro, along with additional features like Generative AI capabilities and cloud storage.
The example above uses simpleRAG(). If your code uses agent() instead, the same idea applies: call ingestAsync() on the returned service if your build exposes it, then get() on the Future before ask() or chat().
Understanding the Future
  • ingestAsync() returns quickly with a Future handle.
  • future.get() blocks the current request until ingestion finishes. Your page or request thread still waits at that line; it does not return HTML to the browser before that unless you structure the page differently.
  • For true background jobs that survive the HTTP response (user navigates away while indexing continues), you typically need application design beyond this pattern (scheduled tasks, message queues, or long-lived workers). The Future pattern is ideal when you want non-blocking composition in code or clear completion before the next step in the same request.
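The points above can be combined into a bounded wait with error handling. This is a sketch under assumptions: it relies on the Future exposing isDone() as in the polling option shown earlier, the 60-second limit and 500 ms interval are arbitrary, and it assumes a failed ingestion surfaces as an exception from get().

```cfml
<cfscript>
    chatModel = chatModel({
        provider: "openai",
        modelName: "gpt-4o-mini",
        apiKey: application.apiKey,
        temperature: 0.2
    });

    ragBot = simpleRAG(expandPath("./docs/"), chatModel);
    future = ragBot.ingestAsync();

    // Poll for up to 60 seconds (arbitrary limit chosen for this sketch)
    deadline = getTickCount() + 60000;
    while (!future.isDone() && getTickCount() < deadline) {
        sleep(500); // pause 500 ms between checks
    }

    ready = false;
    if (future.isDone()) {
        try {
            // The work has finished, so get() should not block for long here
            result = future.get();
            ready = true;
        } catch (any e) {
            // Assumption: an ingestion failure is raised when get() is called
            writeLog(type="Error", file="rag", text="Ingestion failed: " & e.message);
        }
    }

    if (ready) {
        answer = ragBot.ask("How to upgrade Adobe plan?");
        writeOutput(encodeForHTML(answer.message));
    } else {
        writeOutput("Indexing is still running or has failed. Try again later.");
    }
</cfscript>
```

The request still occupies its thread while polling; the deadline only limits how long it waits before giving up.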

Simple RAG configuration options

Pass a third argument to simpleRAG() to tune how documents are split and how retrieval works. The following options are supported:
| Option | Default | Description |
|---|---|---|
| chunkSize | 1000 | Target maximum size of each chunk in characters. Larger chunks preserve more context but reduce retrieval precision; smaller chunks improve granularity but increase vector count and cost. |
| chunkOverlap | 200 | Number of characters shared between adjacent chunks. Helps avoid cutting sentences or facts in half at boundaries. |
| splitterType | recursive | Algorithm used to split text into chunks. Options: recursive, sentence, paragraph, line, word, character, regex. Pass recursive: false alongside splitterType to disable recursive cascade. |
| minScore | | Minimum similarity score (0–1). Chunks below this threshold are excluded from retrieval results. |
| maxResults | 5 | Maximum number of chunks returned per query (top-K). Covers most question types without exceeding context limits. |
| vectorStore | inMemory | The vector store to use. Defaults to an in-memory store (lost on restart). Pass a VectorStore object for persistent production storage. |
| embeddingModel | OpenAI text-embedding-ada-002 | The embedding model used to convert text to vectors. Must be consistent between ingestion and query. |
| CHATMEMORY | | Chat memory configuration for multi-turn conversations. Set type to messageWindowChatMemory and provide maxMessages. |
Example: configuring chunking options
<cfscript>
  chatModel = ChatModel({
    provider: "openai",
    modelName: "gpt-4o-mini",
    apiKey: application.apiKey,
    temperature: 0.7
  });
  vectorStore = VectorStore({
    provider: "INMEMORY",
    embeddingModel: {
      provider: "openai",
      modelName: "text-embedding-3-small",
      apiKey: application.apiKey
    }
  });
  docsDir = expandPath("./docs/");
  ragService = simpleRAG(
    docsDir,
    chatModel,
    {
      vectorStore: vectorStore,
      chunkSize: 500,
      chunkOverlap: 50,
      splitterType: "character",
      recursive: false
    }
  );
  ragService.ingest();
  answer = ragService.ask("How to renew Adobe subscription?");
  writeOutput(answer.message);
</cfscript>
Output
To renew or extend your Adobe subscription, it will automatically renew if your payment information is up to date, ensuring uninterrupted access to your creative tools. You can check and update your payment information on the Plans page in your Adobe account. If you received a payment failure notification when trying to renew, make sure you've authenticated your card details and complied with the bank's requirements. If your subscription fails to renew due to a payment failure, you may manually reactivate your inactive subscription by following the necessary steps in your Adobe account.
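To get a feel for how chunkSize and chunkOverlap affect vector count, the arithmetic can be sketched for a plain fixed-size sliding window: each new chunk advances by chunkSize minus chunkOverlap characters. The helper estimateChunkCount() below is hypothetical and not part of the simpleRAG API, and the real splitters break on natural boundaries, so actual counts will differ.

```cfml
<cfscript>
    // Hypothetical helper for rough planning only; not part of the simpleRAG API
    function estimateChunkCount(
        required numeric docLength,
        numeric chunkSize = 1000,
        numeric chunkOverlap = 200
    ) {
        if (chunkOverlap >= chunkSize) {
            throw(message="chunkOverlap must be smaller than chunkSize");
        }
        // A document that fits in one chunk needs no sliding window
        if (docLength <= chunkSize) {
            return docLength > 0 ? 1 : 0;
        }
        // Each chunk after the first adds (chunkSize - chunkOverlap) new characters
        var stride = chunkSize - chunkOverlap;
        return ceiling((docLength - chunkOverlap) / stride);
    }

    // 10,000-character document with the default 1000/200 settings: 13 chunks
    writeOutput(estimateChunkCount(10000) & "<br>");

    // Same document with chunkSize 500 and chunkOverlap 50: 23 chunks
    writeOutput(estimateChunkCount(10000, 500, 50) & "<br>");
</cfscript>
```

Halving chunkSize roughly doubles the number of vectors to embed and store, which is the cost side of the granularity trade-off described in the options.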

Default behavior and intelligent defaults

When you call simpleRAG() with no options struct, ColdFusion makes the following choices automatically. These defaults are designed to work well for most prose documents and development use cases.
| Component | Default choice | Why this default |
|---|---|---|
| Document splitter | RecursiveCharacterTextSplitter, 1000 chars, 200 overlap | Balances chunk granularity with context retention. Works well for most prose documents. |
| Embedding model | OpenAI text-embedding-ada-002 | Cost-effective, high quality, and compatible with most vector stores. |
| Vector store | In-memory store | Zero configuration needed. Data is lost when the application restarts, so it is suitable for development only. |
| Content retriever | EmbeddingStoreContentRetriever, maxResults: 5 | Returns the 5 most relevant chunks per query. Covers most question types without exceeding context limits. |
Warning:
The default in-memory vector store is not suitable for production. Documents must be re-indexed on every application restart. For production, configure a persistent vector store (Milvus, Qdrant, Chroma, or Pinecone).
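During development with the in-memory store, one way to avoid re-indexing on every request is to build the service once and keep it in the application scope. The sketch below uses standard CFML locking. It assumes the service object can safely be shared across concurrent requests, which this page does not confirm, so verify that for your build. The lock name and timeout are illustrative.

```cfml
<cfscript>
    // Build and index once per application lifetime, then reuse across requests
    if (!structKeyExists(application, "ragBot")) {
        lock name="ragBotInit" type="exclusive" timeout="300" {
            // Re-check inside the lock so only one request performs the ingest
            if (!structKeyExists(application, "ragBot")) {
                chatModel = chatModel({
                    provider: "openai",
                    modelName: "gpt-4o-mini",
                    apiKey: application.apiKey,
                    temperature: 0.4
                });

                bot = simpleRAG(expandPath("./docs/"), chatModel);
                bot.ingest();

                // Publish the service only after ingestion has finished
                application.ragBot = bot;
            }
        }
    }

    // ask() keeps no conversation memory, so sharing it does not mix user context
    answer = application.ragBot.ask("How to renew Adobe subscription?");
    writeOutput(encodeForHTML(answer.message));
</cfscript>
```

The index is still lost when the application restarts; this only avoids repeating the ingest within one application lifetime.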
These intelligent defaults follow a zero-configuration philosophy: sensible decisions are made by the system unless the developer chooses to override them. This means a developer can move from idea to working implementation with minimal setup, without needing to understand low-level concepts such as embeddings, vector databases, and orchestration pipelines.
