Documents can be loaded, split, and ingested inside a higher-level API (agent(), simpleRAG()) or outside it using documentService(). The standalone service is useful when you need explicit control over each step: load once, split with different strategies, transform with your own UDFs, then ingest to Milvus, Qdrant, or another store, without coupling everything to a single ingestion block.

documentService() returns an object you call like a small SDK: load, split, transform, transformSegments, ingest, plus loadAsync, ingestAsync, transformAsync, transformSegmentsAsync, and close. Parameters are passed per method; the factory itself does not take a global config struct.

agent({ ingestion: { ... } }) bundles document loading, splitting, and vector ingest into one lifecycle: you configure source, documentSplitter, and vectorStoreIngestor, then call ingest() on the agent. documentService() exposes the same underlying stages as discrete calls so you can compose custom workflows (for example, load from disk, split, enrich metadata in CFML, then ingest), integrate with batch jobs, or unit-test splitting and transforms without standing up a full agent.

Use agent() ingestion when you want a declarative RAG setup; use documentService() when you need procedural control or reuse of intermediate arrays (documents, segments).

Requirement | agent() ingestion | documentService() |
|---|---|---|
Declarative RAG setup, minimal code | Preferred | Works but verbose |
Procedural control over each step | No | Discrete method calls |
Reuse intermediate arrays (documents, segments) | No | Each method returns an array |
Custom UDF transforms at document or segment level | No | transform() / transformSegments() |
Unit-test splitting and transforms in isolation | No | No agent required |
Batch jobs or offline export (no vector store required) | No | Stop after split |
Query retrieval from same service | chat() / ask() | No — hand segments to agent() |
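
The table notes that transforms can be unit-tested without an agent. A minimal sketch of that idea follows, assuming a hand-built struct with text and metadata keys stands in for a loaded document. The names addWordCount and fixture are hypothetical and exist only for this illustration; they are not part of the documentService() API.

```cfml
<cfscript>
// Hypothetical UDF under test: records a rough, space-delimited word count.
function addWordCount(required struct document) {
    document.metadata.wordCount = listLen(document.text, " ");
    return document;
}

// Hand-built fixture (an assumption of this sketch, not output from load()).
fixture = [ { text: "alpha beta gamma", metadata: {} } ];

// Call the UDF directly; no agent or vector store is involved.
result = addWordCount(fixture[1]);
writeOutput(result.metadata.wordCount); // outputs 3
</cfscript>
```

Once the UDF behaves as expected on fixtures, the same function can be passed to transform().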

Create an instance by calling documentService() with no arguments. Each call returns a new instance; tests confirm that multiple instances operate independently.

<cfscript>
// No factory-level config: options are passed to each method
docService = documentService();
documents = docService.load({
    path: expandPath("./Documents/"),
    pattern: "*.txt"
});

// Multiple independent instances
service1 = documentService();
service2 = documentService();
docs1 = service1.load({ path: expandPath("./Documents/"), pattern: "*.txt" });
docs2 = service2.load({ path: expandPath("./Documents/"), pattern: "*.txt" });
</cfscript>

A typical pipeline runs load → split → transformSegments → ingest. A lighter pattern stops after load → split when you only need chunks for offline export, scoring, or non-vector workflows.

load() reads from the filesystem and from URLs. It returns a ColdFusion array of structs; each document struct includes at least text and metadata. You can pass a struct with path and optional pattern, recursive, and metadata (merged into each document's metadata), or pass a string path directly.

<cfscript>
docService = documentService();
documents = docService.load({
    path: expandPath("./docs/")
});
if (isArray(documents) && arrayLen(documents) > 0) {
    doc = documents[1];
    // doc.text, doc.metadata
    writeDump(doc); // dump inside the guard; doc is undefined when nothing was loaded
}
</cfscript>

<cfscript>
docService = documentService();
documents = docService.load({
    path: expandPath("./docs/"),
    pattern: "*.pdf",
    recursive: false,
    metadata: { category: "test" }
});
if (isArray(documents) && arrayLen(documents) > 0) {
    doc = documents[1];
    // doc.text, doc.metadata
    writeDump(doc); // dump inside the guard; doc is undefined when nothing was loaded
}
</cfscript>

split() takes the array of documents from load() and returns an array of segment structs (each with text and metadata). You can rely on the defaults (chunkSize: 1000, chunkOverlap: 100) or pass chunkSize, chunkOverlap, and splitterType. For splitterType: "regex", also supply regexPattern.

<cfscript>
// Default chunking
docService = documentService();
documents = docService.load({ path: expandPath("./docs/"), pattern: "*.txt" });
segments = docService.split(documents);
writeDump(segments);
</cfscript>

<cfscript>
// Custom chunk size and overlap
docService = documentService();
documents = docService.load({ path: expandPath("./docs/"), pattern: "*.txt" });
segments = docService.split(documents, {
    chunkSize: 500,
    chunkOverlap: 50
});
writeDump(segments);
</cfscript>

<cfscript>
// Recursive splitter
docService = documentService();
documents = docService.load({ path: expandPath("./docs/"), pattern: "*.txt" });
segments = docService.split(documents, {
    chunkSize: 500,
    chunkOverlap: 50,
    splitterType: "recursive"
});
writeDump(segments);
</cfscript>

<cfscript>
// Regex splitter: regexPattern is required with splitterType "regex"
docService = documentService();
documents = docService.load({ path: expandPath("./docs/"), pattern: "*.txt" });
segments = docService.split(documents, {
    chunkSize: 500,
    chunkOverlap: 50,
    splitterType: "regex",
    regexPattern: "\n\n"
});
writeDump(segments);
</cfscript>

Transforms let you modify documents after load() or segments after split() (normalizing text, enriching metadata, or tagging pipeline stages) before ingest. transform(documents, udf) applies a function to each document. transformSegments(segments, udf) passes both the source document and the segment to the UDF so you can correlate chunk-level data with file-level context.

<cfscript>
docService = documentService();
documents = docService.load({ path: expandPath("./docs/"), pattern: "*.txt" });
function myTransformer(required struct document) {
    document.metadata.transformed = true;
    document.metadata.wordCount = listLen(document.text, " ");
    return document;
}
transformed = docService.transform(documents, myTransformer);
writeDump(transformed);
</cfscript>

<cfscript>
docService = documentService();
documents = docService.load({ path: expandPath("./docs/"), pattern: "*.txt" });
segments = docService.split(documents, { chunkSize: 500, chunkOverlap: 50 });
function segmentEnricher(struct document, required struct segment) {
    segment.metadata.enhanced = true;
    segment.metadata.charCount = len(segment.text);
    return segment;
}
transformed = docService.transformSegments(segments, segmentEnricher);
writeDump(transformed);
</cfscript>

ingest() takes an array of segments and a vector store client, embeds segment text according to the store's embeddingModel, and writes vectors. You may call ingest(segments, store) with defaults, or pass a third struct with batchSize and continueOnError. The return value is a struct of statistics.

<cfscript>
docService = documentService();
vectorStore = VectorStore({
    provider: "INMEMORY",
    embeddingModel: {
        provider: "openai",
        modelName: "text-embedding-3-small",
        apiKey: application.apiKey
    }
});
documents = docService.load({ path: expandPath("./docs/"), pattern: "*.txt" });
segments = docService.split(documents, { chunkSize: 500, chunkOverlap: 50 });
result = docService.ingest(segments, vectorStore, {
    batchSize: 50,
    continueOnError: true
});
writeDump(result);
</cfscript>

To ingest into an external store such as Qdrant, build the store client first and pass it to ingest() in the same way. This snippet reuses docService and segments from the previous example:

<cfscript>
vectorStoreClient = VectorStore({
    provider: "qdrant",
    url: application.vectorDB.qdrant.grpcUrl,
    apiKey: application.vectorDB.qdrant.apiKey,
    collectionName: "dps_ingest_qdrant_test",
    metricType: "COSINE",
    dimension: 384,
    embeddingModel: {
        provider: "ollama",
        modelName: "all-minilm",
        baseUrl: application.ollamaBaseUrl
    }
});
result = docService.ingest(segments, vectorStoreClient, {
    batchSize: 50,
    continueOnError: true
});
</cfscript>

Stop after load → split when you only need chunks for offline export, scoring, or non-vector workflows:

<cfscript>
docService = documentService();
documents = docService.load({ path: expandPath("./docs/"), pattern: "*.txt" });
segments = docService.split(documents, {
    chunkSize: 1000,
    chunkOverlap: 100
});
</cfscript>

The full pipeline runs load → split → transformSegments → ingest:

<cfscript>
docService = documentService();
documents = docService.load({ path: expandPath("./docs/"), pattern: "*.txt" });
segments = docService.split(documents, { chunkSize: 500, chunkOverlap: 50 });
function enrichSegment(struct document, required struct segment) {
    segment.metadata.pipeline = "full";
    segment.metadata.processedAt = now();
    return segment;
}
enrichedSegments = docService.transformSegments(segments, enrichSegment);
vectorStore = VectorStore({
    provider: "INMEMORY",
    embeddingModel: {
        provider: "openai",
        modelName: "text-embedding-3-small",
        apiKey: application.apiKey
    }
});
result = docService.ingest(enrichedSegments, vectorStore);
writeDump(result);
</cfscript>

Each async method returns a future; call .get() to block until the operation completes. loadAsync, transformAsync, transformSegmentsAsync, and ingestAsync are intended for work that can overlap with other request processing (subject to server threading and safety limits).

Async method | Sync equivalent | Returns (after .get()) |
|---|---|---|
loadAsync(options) | load(options) | Array of document structs |
transformAsync(docs, udf) | transform(docs, udf) | Array of transformed document structs |
transformSegmentsAsync(segs, udf) | transformSegments(segs, udf) | Array of transformed segment structs |
ingestAsync(segs, store) | ingest(segs, store) | Statistics struct |
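
The async methods in the table pay off when something else can run between starting the operation and calling .get(). The sketch below starts a load, builds the vector store while the load is in flight, and blocks only when the documents are needed. It uses only calls shown on this page; the variable name loadFuture is illustrative, and how much actually overlaps depends on the server's threading limits.

```cfml
<cfscript>
docService = documentService();

// Start the load; the call returns a future instead of the documents.
loadFuture = docService.loadAsync({
    path: expandPath("./docs/"),
    pattern: "*.txt"
});

// Set up the vector store while the documents are loading.
vectorStore = VectorStore({
    provider: "INMEMORY",
    embeddingModel: {
        provider: "openai",
        modelName: "text-embedding-3-small",
        apiKey: application.apiKey
    }
});

// Block only at the point where the documents are required.
documents = loadFuture.get();
segments = docService.split(documents, { chunkSize: 500, chunkOverlap: 50 });

// ingestAsync() followed directly by get() behaves like the sync call.
result = docService.ingestAsync(segments, vectorStore).get();
docService.close();
</cfscript>
```

Calling .get() immediately after an async method, as in the last step, gives no overlap; it is shown only to illustrate the return value.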

<cfscript>
future = docService.loadAsync({
    path: expandPath("./Documents/"),
    pattern: "*.txt"
});
documents = future.get();
</cfscript>

<cfscript>
function asyncTransformer(required struct document) {
    document.metadata.asyncDone = true;
    return document;
}
future = docService.transformAsync(documents, asyncTransformer);
transformed = future.get();
</cfscript>

<cfscript>
future = docService.transformSegmentsAsync(segments, function(struct document, struct segment) {
    segment.metadata.asyncProcessed = true;
    return segment;
});
transformed = future.get();
</cfscript>

<cfscript>
future = docService.ingestAsync(segments, vectorStoreClient);
result = future.get();
</cfscript>

future.get() blocks the current request thread until the operation finishes. For true background jobs that survive the HTTP response, use scheduled tasks, message queues, or long-lived workers.

documentService() is the standalone equivalent of the ingestion stages inside agent(). You can compose them: use documentService() to load, split, and transform documents with full procedural control, then ingest the resulting segments into a VectorStore that is also referenced by an agent() for retrieval. Both operate on the same underlying index.

Create a single shared VectorStore instance. Use documentService() to populate it, then pass the same store to agent() for querying:

<cfscript>
// 1. Create a shared vector store
sharedStore = VectorStore({
    provider: "INMEMORY",
    embeddingModel: {
        provider: "openai",
        modelName: "text-embedding-3-small",
        apiKey: application.apiKey
    }
});

// 2. Use documentService() to load, split, enrich, and ingest
docService = documentService();
documents = docService.load({ path: expandPath("./docs/"), pattern: "*.txt" });
segments = docService.split(documents, { chunkSize: 500, chunkOverlap: 50 });
function enrichSegment(struct document, required struct segment) {
    segment.metadata.source = "documentService";
    return segment;
}
enriched = docService.transformSegments(segments, enrichSegment);
docService.ingest(enriched, sharedStore);
docService.close();

// 3. Use agent() for retrieval against the same store
chatModel = ChatModel({
    provider: "openai",
    modelName: "gpt-4o-mini",
    apiKey: application.apiKey,
    temperature: 0.7
});
queryService = agent({
    CHATMODEL: chatModel,
    retrievalAugmentor: {
        queryRouter: {
            contentRetrievers: [{
                vectorStore: sharedStore,
                maxResults: 5,
                minScore: 0.3,
                description: "Knowledge base"
            }]
        }
    }
});
answer = queryService.chat("How to upgrade Adobe plan?");
writeOutput(answer.message);
</cfscript>

Call close() when you are done to release any resources held by the instance.

<cfscript>
docService = documentService();
documents = docService.load({
    path: expandPath("./Documents/"),
    pattern: "*.txt"
});
docService.close();
</cfscript>

Call close() after the final ingest() call on a given instance, or in a finally block if the pipeline can throw. Multiple instances each require their own close() call; closing one instance does not affect others.

Topic | Detail |
|---|---|
close() | Releases resources held by this documentService() instance. Safe to call after load(), split(), transform(), or ingest(). |
Scope | Each instance must be closed independently. Closing service1 does not affect service2. |
Timing | Call after the final ingest() on that instance, or in a finally block to guarantee cleanup even if an earlier step throws. |
documentService() itself has no persistent state between calls. Only the instance object returned by a given call holds resources. Each new call to documentService() creates a fresh, independent instance.
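
The timing guidance above can be sketched as a try/catch/finally wrapper so that close() runs even when an earlier step throws. This is a minimal sketch using only the documentService() calls shown on this page; the log message text is illustrative, and the chunking options are the same example values used earlier.

```cfml
<cfscript>
docService = documentService();
try {
    documents = docService.load({ path: expandPath("./docs/"), pattern: "*.txt" });
    segments = docService.split(documents, { chunkSize: 500, chunkOverlap: 50 });
    // transformSegments() and ingest() would follow here.
} catch (any e) {
    // Record the failure, then let it propagate to the caller.
    writeLog(text = "Document pipeline failed: " & e.message, type = "error");
    rethrow;
} finally {
    // Runs on both the success and the failure path.
    docService.close();
}
</cfscript>
```

When several instances are in use, each one needs its own wrapper (or its own close() call in a shared finally block), since closing one instance does not affect the others.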