Requirement | Minimum | Notes |
ColdFusion version | CF2025.0.08 | RAG is a new feature in CF2025.0.08; not available in earlier releases |
Java / JVM | JDK 11+ | Libraries require Java 11 or later |
Operating System | Windows, Linux, macOS | All CF-supported platforms are supported |
AI provider API key | Required | At least one chat model key (OpenAI, Anthropic, Azure OpenAI, etc.) |
Field | Description |
Vector store | Required. The target vector store profile where ingested chunks (and embeddings) are stored. Choose a store that matches your RAG query path and embedding dimension. If the dropdown is empty, create and save a vector store configuration first. |
Field | Description |
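Source type | The kind of source to ingest. The options referenced on this page are single file, directory, and URL. |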
File path (or path / URL field) | Path or address for the selected source type. For single file or directory, use an absolute path the server can read. Use Browse server when available to reduce path typos. For URL, enter a full URL per your integration’s requirements. |
Supported formats | Typical support includes PDF, Word, Excel, PowerPoint, HTML, CSV, JSON, XML, plain text, Markdown, and related formats. Unsupported files may be skipped or may fail the job, depending on the Continue on error setting. |
Field | Description |
Parser type | Format-specific parser (for example PDF, HTML, plain text). Choose the parser that matches the dominant file type in this run, or the type your product uses when a single parser is selected for a batch. |
Character encoding | Text encoding for parsers that read byte streams (for example UTF-8). Use the encoding that matches your files to avoid mojibake or parse failures. |
Max file size (bytes) | Upper bound on file size for ingestion. A value of 0 often means no limit or the product default. Non-zero values cause oversized files to be rejected or skipped early. |
Control | Description |
Run ingestion | Starts the ingestion job with the current settings. Ensure vector store and paths are correct before running; large directories can take a long time. |
Field | Description |
Splitter type | How text is split into chunks before embedding. Recursive (when labeled recommended) usually splits on paragraphs and headings first, then sentences, for more coherent chunks. Other types may split on fixed characters or delimiters only. |
Chunk size (characters) | Target maximum size of each chunk in characters (not tokens). Larger chunks preserve more context but can reduce retrieval precision; smaller chunks improve granularity but increase vector count and cost. Default 1000 is a common starting point. |
Chunk overlap (characters) | Number of characters shared between adjacent chunks. Overlap helps avoid cutting sentences or facts in half at boundaries. Default 200 is typical with 1000-character chunks; adjust if answers miss context at edges. |
Custom separators (optional) | Extra delimiter strings (if your product supports them) that force splits—for example specific headings or markers. Leave empty to use the splitter’s built-in rules. |
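
To make the interaction between chunk size and chunk overlap concrete, here is a minimal sketch of fixed-size character chunking in CFML. It is not the product's splitter: `chunkText` is a hypothetical helper written for illustration, and it ignores paragraph and sentence boundaries, which a recursive splitter would respect. It only shows how the two numbers determine where each chunk starts and how many chunks result.

```cfml
<cfscript>
// Hypothetical helper for illustration only; not part of the ColdFusion RAG API.
// Splits text into chunks of at most chunkSize characters, where each chunk
// repeats the last `overlap` characters of the previous one.
array function chunkText(required string text, numeric chunkSize = 1000, numeric overlap = 200) {
    if (chunkSize <= 0 || overlap < 0 || overlap >= chunkSize) {
        throw(type = "InvalidArgument", message = "Require chunkSize > 0 and 0 <= overlap < chunkSize.");
    }
    var chunks = [];
    var step = chunkSize - overlap;   // distance between the starts of adjacent chunks
    var total = len(text);
    var start = 1;                    // CFML strings are 1-indexed
    while (start <= total) {
        // mid() returns fewer characters when the end of the string is reached
        arrayAppend(chunks, mid(text, start, chunkSize));
        if (start + chunkSize > total) {
            break;                    // this chunk already covers the end of the text
        }
        start += step;
    }
    return chunks;
}

// A 2,500-character input with the defaults (1000 / 200) yields three chunks
// starting at characters 1, 801, and 1601.
sample = repeatString("a", 2500);
chunks = chunkText(sample, 1000, 200);
writeOutput(arrayLen(chunks));        // 3
</cfscript>
```

With these settings each new chunk begins 800 characters after the previous one, so raising the overlap without raising the chunk size increases the number of vectors stored for the same text.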
Field | Description |
Batch size | How many chunks or documents to process per internal batch (for example 100). Higher values can improve throughput but increase memory use. |
Continue on error | When enabled, ingestion skips or logs failed files or chunks and continues with the rest. When disabled, the job may stop on the first error, which is better for strict validation but worse for large mixed folders. |
component {
    this.name = "chatmodelapp";

    // Placeholder values. Load real keys from a secure source instead of hard-coding them.
    this.apikey = "api-key";
    this.pineconeApiKey = "api-key-pinecone";
    this.anthropicKey = "api-key-anthropic";

    this.mappings["/tool"] = expandPath("./tool");

    boolean function onApplicationStart() {
        // Copy the keys into the application scope so that pages can read them.
        application.apiKey = this.apikey;
        application.pineconeApiKey = this.pineconeApiKey;
        application.anthropicKey = this.anthropicKey;
        writeLog(
            text = "Application started. API keys initialized.",
            file = "application"
        );
        return true;
    }

    void function onRequestStart(string targetPage) {
        /* ---- Application Re-initialization ---- */
        if (
            structKeyExists(url, "reinit")
            && url.reinit eq 1
            /* add protection as needed */
        ) {
            writeLog(
                text = "Application reinitialization triggered.",
                file = "application"
            );
            // Stops the application; the next request runs onApplicationStart again.
            applicationStop();
        }
    }
}

{
    "comments": "Paths should be semicolon-separated. To Allow a file: {path-of-file}; To Allow a directory & files in it: {path-to-directory}/*; To Allow a directory & sub-directories: {path-to-directory}/**; To Block a file: !{path-of-file}; To Block a directory & sub-directories: !{path-to-directory}/**; Precedence decreases from left to right. Suppose directory A has directory B & C inside it. To Allow B & Block C: !A/C/*;A/**;",
    "bytecodeexecutionpaths": "",
    "documentaccesspaths": "C:/**;E:/**;",
    "schedulerexecutionpaths": "",
    "car": {
        "deploypath": "",
        "associatedfiles": ""
    }
}

Extensions | Parser used | Notes |
.txt, .text | Text Parser | Plain UTF-8 text |
.md, .markdown | Markdown Parser | Strips Markdown syntax; headings become metadata |
.pdf | PDF Parser | Extracts text layer; scanned-only PDFs may return empty content |
.doc, .docx, .xls, .xlsx, .ppt, .pptx | Apache POI Parser | Full Office format support including embedded text in tables |
.odt, .ods, .odp, .rtf, .html, .htm, .xml, .eml, .msg, .epub | Apache Tika Parser | Broad format coverage via Tika |
.csv | CSV Parser | Each row becomes a document chunk |
.json | JSON Parser | Extracts string values; nested objects are flattened |
.xml | XML Parser | Text content of elements; attributes are included as metadata |
.atom, .rss | Feed Parser | Each feed item becomes a document |
.properties, .props | Properties Parser | Key-value pairs |
.log | Log Parser | Each log line or entry becomes a chunk |
.zip, .jar, .war, .tar, .gz and variants | ZIP Document Parser | Recursively unpacks and parses contained files |
Any other type | Custom Parser (UDF) | Provide your own parsing logic via a UDF — see section 5.3 |
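
The table above amounts to a lookup from file extension to parser, with a fallback for anything unlisted. The following CFML sketch illustrates that idea for a handful of rows. `parserByExtension` and `parserFor` are hypothetical names introduced here; the product performs this selection internally, and this is not its API.

```cfml
<cfscript>
// Hypothetical lookup built from a few rows of the table above; not a product API.
parserByExtension = {
    "txt": "Text Parser",
    "md": "Markdown Parser",
    "pdf": "PDF Parser",
    "docx": "Apache POI Parser",
    "csv": "CSV Parser",
    "json": "JSON Parser",
    "log": "Log Parser",
    "zip": "ZIP Document Parser"
};

// Returns the parser name for a file, falling back to the custom (UDF) parser
// when the extension is not in the lookup.
string function parserFor(required string fileName, required struct lookup) {
    // listLast() with "." as the delimiter returns the text after the final dot
    var ext = lCase(listLast(fileName, "."));
    return structKeyExists(lookup, ext) ? lookup[ext] : "Custom Parser (UDF)";
}

writeOutput(parserFor("report.PDF", parserByExtension));   // PDF Parser
writeOutput(parserFor("notes.xyz", parserByExtension));    // Custom Parser (UDF)
</cfscript>
```

The extension is lower-cased before the lookup so that `report.PDF` and `report.pdf` resolve to the same parser.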